Research

Information-Theoretic Vector Search Is Having Its Moment — And That's a Good Thing

May 22, 2026•12 min read•Dr. Majid Fekri

A lot of people have reached out this past couple of weeks asking about Google Research's TurboQuant paper and what it means for information-theoretic approaches to vector quantization and search. I want to address it directly, because the conversation matters — both for where the field is heading and for what we've been building at Moorcheh for the last two years.

First, the honest reaction: I'm genuinely thrilled to see TurboQuant going viral. When a Google Research team publishes a paper arguing that information-theoretic vector quantization can match or beat the dominant product-quantization stack, it's a validating moment for everyone who has been making this argument. The thesis we've been advancing — that Shannon's source coding framework is the right foundation for the next generation of vector search, not heuristic k-means codebooks bolted onto HNSW graphs — is now reaching mainstream awareness. That's a win for the field.

So let me say what I think TurboQuant gets right, where credit is due upstream of it, and what the broader research community is still catching up to.

What TurboQuant Actually Shows

TurboQuant is a careful, well-executed paper. Its central contributions are worth stating precisely:

A data-oblivious, online MSE-optimal quantizer built by randomly rotating input vectors (inducing a Beta distribution per coordinate), then applying optimal Lloyd–Max scalar quantization to each coordinate.
A two-stage construction for unbiased inner-product estimation: MSE-optimal quantization at bit-width b − 1, followed by a 1-bit Quantized JL transform on the residual.
Formal information-theoretic lower bounds via Shannon's lower bound and Yao's minimax principle, showing the method lands within a factor of √3π/2 ≈ 2.7 of optimal across all bit-widths.

That last point is the real theoretical contribution. Getting provable, near-optimal distortion rates with a data-oblivious algorithm that runs well on accelerators is genuinely useful — particularly for KV-cache quantization, which is the application the paper leans into hardest.

Paper: TurboQuant (Google Research)

Credit Where It's Due: RaBitQ Got There First on the Hard Parts

Before anyone declares TurboQuant a breakthrough, it's worth pointing out that RaBitQ (Gao & Long, SIGMOD 2024) preceded it on most of the foundational theoretical machinery. RaBitQ established:

The use of randomly rotated bi-valued vectors as a quantization codebook with explicit geometric interpretation.
A sharp, asymptotically optimal O(1/√D) error bound on inner-product estimation — provably tight per Alon & Klartag's lower bound.
An unbiased estimator with a closed-form confidence interval that can drive error-bounded re-ranking without parameter tuning.
Efficient bitwise distance computation (Hamming-style) on the compact codes.

TurboQuant extends some of this in useful directions — particularly the explicit MSE-versus-inner-product decomposition and the QJL residual stage — but the architectural shift toward randomized binarization with theoretical guarantees was largely RaBitQ's. The research community should give that paper its due, and so should we.

Where Compression Stops and Architecture Begins

Here is the part I want to be direct about: TurboQuant and RaBitQ both address compression. That is one piece of the puzzle. It is not the whole puzzle.

If you stop at "compress vectors well and search them quickly," you can interpret the result in an infinite number of ways at the implementation layer. You can bolt it onto HNSW. You can pair it with whatever reranker you already use. You can deploy it on the same in-memory ANN infrastructure that defines today's vector database market. The compression gets better; the architecture stays the same; the operational cost model stays the same.

That is not what we built.

Moorcheh's core methodology and IP combines three components that operate as an integrated whole:

Maximally Informative Binarization (MIB) — our information-theoretic compression layer, conceptually related to what TurboQuant and RaBitQ explore, but tuned to preserve semantic content rather than purely minimize geometric distortion.
Efficient Distance Metric (EDM) — bitwise operations on the binarized codes, exploiting native CPU instructions to replace the floating-point arithmetic that dominates cosine similarity.
Information-Theoretic Score (ITS) — and this is the part the academic papers don't touch. ITS is a ranking mechanism that replaces geometric proximity as the primary relevance signal. It also functions as an integrated reranker, which is why our end-to-end latency on the MAIR benchmark beats two-stage architectures (vector search + external reranker) by roughly 6.6× on average.

In our own benchmarking across 14 MAIR datasets and 10,038 queries, MIB + EDM + ITS achieved NDCG@10 quality comparable to full-precision floating-point systems while maintaining constant throughput across all precision levels — something HNSW-based systems structurally cannot do, because their throughput-precision tradeoff is baked into the graph traversal model.

Compression alone cannot get you there. Compression plus the right scoring function plus the right execution model can.

Paper: Moorcheh: Information-Theoretic Semantic Search

The Architectural Dividend

The reason this matters operationally — and the reason we've spent two years building the surrounding system rather than just publishing a compression paper — is that information-theoretic binarization unlocks a deployment model that is structurally impossible for HNSW-based architectures.

Moorcheh is a four-layer serverless architecture that replaces the entire enterprise AI retrieval stack: vector database clusters, reranking APIs, observability middleware, and the engineering overhead to maintain them. Every layer deploys into your VPC as coordinated cloud-native services. Every layer scales independently. Every layer costs $0 when idle.

Sovereign deployment — your VPC, on AWS, GCP, or Azure.
Information-theoretic engine — MIB compression and ITS scoring, the science behind the cost reduction.
Ingestion and scale — multimodal, 10M+ documents, async pipelines.
Agentic memory and tooling — which brings me to our latest release.

Memanto: Persistent Memory for Autonomous Agents

Two weeks ago we launched Memanto, our companion memory agent that helps other AI agents manage their memories — across sessions, across tools, across time. Memanto is built on top of the Moorcheh engine and inherits its deterministic retrieval guarantees, which turns out to matter enormously for agent stability.

ANN-based retrieval introduces non-determinism that compounds across multi-turn agent interactions; deterministic retrieval eliminates that source of instability entirely. If you're building autonomous agents and have been frustrated by retrieval-induced behavioral drift, Memanto is worth a look.

Closing Thought

I expect we'll see more papers in this direction. TurboQuant won't be the last. The research community is converging on what we identified two years ago: information-theoretic compression is not a trick, it's the foundation. Each new paper that arrives at this conclusion strengthens the case for the architectural shift we've already shipped.

We'll keep innovating. We'll keep publishing. We'll keep building the four-layer system that turns the theory into deployable infrastructure. And we'll keep welcoming the rest of the field as it catches up — because a rising tide on information-theoretic search lifts every serious team working in this space, ours included.

For more on Moorcheh and Memanto, visit moorcheh.ai and memanto.ai.

Build this architecture today.

Get your API key and start building agentic memory in under 5 minutes.

Get API Key