Making multi-vector retrieval as fast as single-vector search

June 25, 2025

Neural embedding models have become a cornerstone of modern information retrieval (IR). Given a query from a user (e.g., “How tall is Mt Everest?”), the goal of IR is to find information relevant to the query from a very large collection of data (e.g., the billions of documents, images, or videos on the Web). Embedding models transform each datapoint into a single-vector “embedding”, such that semantically similar datapoints are transformed into mathematically similar vectors. The embeddings are generally compared via the inner-product similarity, enabling efficient retrieval through optimized maximum inner product search (MIPS) algorithms. However, recent advances, particularly the introduction of multi-vector models like ColBERT, have demonstrated significantly improved performance in IR tasks.

Unlike single-vector embeddings, multi-vector models represent each datapoint with a set of embeddings and use more sophisticated similarity functions that can capture richer relationships between datapoints. For example, the popular Chamfer similarity measure used in state-of-the-art multi-vector models captures when the information in one multi-vector embedding is contained within another multi-vector embedding. While this multi-vector approach boosts accuracy and enables retrieving more relevant documents, it introduces substantial computational challenges. In particular, the increased number of embeddings and the complexity of multi-vector similarity scoring make retrieval significantly more expensive.
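To make the Chamfer similarity concrete, here is a minimal sketch in plain Python. It assumes the standard definition, in which each query vector is matched to its best-scoring document vector and the resulting inner products are summed; the function names and the toy vectors are illustrative only.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def chamfer_similarity(query_vecs, doc_vecs):
    """Sum, over query vectors, of the best inner product with any document vector."""
    return sum(max(dot(q, p) for p in doc_vecs) for q in query_vecs)


# Toy multi-vector embeddings (made up for illustration).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # has a close match for both query vectors
doc_b = [[1.0, 0.0], [-1.0, 0.0]]             # has a close match for the first only

score_a = chamfer_similarity(query, doc_a)
score_b = chamfer_similarity(query, doc_b)
print(score_a, score_b)
```

The score is asymmetric: it rewards a document for covering every query vector but does not penalize extra document vectors. The cost is also visible here, since each query-document pair requires an inner product for every pair of vectors.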

In “MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings”, we introduce a novel multi-vector retrieval algorithm designed to bridge the efficiency gap between single- and multi-vector retrieval. We transform multi-vector retrieval into a simpler problem by constructing fixed dimensional encodings (FDEs) of queries and documents. FDEs are single vectors whose inner product approximates multi-vector similarity, which reduces complex multi-vector retrieval back to single-vector MIPS. This new approach allows us to leverage highly optimized MIPS algorithms to retrieve an initial set of candidates, which are then re-ranked with the exact multi-vector similarity, thereby enabling efficient multi-vector retrieval without sacrificing accuracy. We have provided an open-source implementation of our FDE construction algorithm on GitHub.
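The retrieve-then-re-rank flow described above can be sketched as follows. This is a simplified illustration, not the FDE construction from the paper or the open-source implementation. The encoding is an assumption made for the example: it partitions space with a few random hyperplanes, sums query vectors per bucket, and averages document vectors per bucket. All names (`make_planes`, `encode`, `search`) and the toy data are hypothetical.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def chamfer(query_vecs, doc_vecs):
    """Exact multi-vector score: best document match per query vector, summed."""
    return sum(max(dot(q, p) for p in doc_vecs) for q in query_vecs)


def make_planes(num_planes, dim, seed=0):
    """Random Gaussian hyperplanes used to partition the space (assumed scheme)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]


def bucket_of(vec, planes):
    """Bucket id built from one sign bit per hyperplane."""
    bucket = 0
    for plane in planes:
        bucket = (bucket << 1) | (1 if dot(vec, plane) > 0 else 0)
    return bucket


def encode(vecs, planes, average):
    """Collapse a set of vectors into one vector of length 2**len(planes) * dim."""
    dim = len(vecs[0])
    num_buckets = 2 ** len(planes)
    sums = [[0.0] * dim for _ in range(num_buckets)]
    counts = [0] * num_buckets
    for v in vecs:
        b = bucket_of(v, planes)
        counts[b] += 1
        for i, x in enumerate(v):
            sums[b][i] += x
    encoding = []
    for b in range(num_buckets):
        # Documents average within a bucket, queries sum; empty buckets stay zero.
        scale = 1.0 / counts[b] if average and counts[b] else 1.0
        encoding.extend(x * scale for x in sums[b])
    return encoding


def search(query_vecs, docs, planes, num_candidates, top_k):
    q_enc = encode(query_vecs, planes, average=False)
    # Stage 1: brute-force MIPS over single-vector encodings. A real system
    # would precompute document encodings and use an optimized MIPS index.
    by_approx = sorted(
        docs,
        key=lambda name: dot(q_enc, encode(docs[name], planes, average=True)),
        reverse=True,
    )
    candidates = by_approx[:num_candidates]
    # Stage 2: re-rank the shortlist with the exact multi-vector similarity.
    reranked = sorted(candidates, key=lambda name: chamfer(query_vecs, docs[name]), reverse=True)
    return reranked[:top_k]


# Toy corpus: document name -> list of embedding vectors (made up).
docs = {
    "everest": [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]],
    "k2": [[0.9, 0.0, 0.1], [0.0, 0.1, 0.9]],
    "recipes": [[-0.8, 0.0, 0.2], [0.0, -0.9, 0.1]],
}
query = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
planes = make_planes(num_planes=2, dim=3)
print(search(query, docs, planes, num_candidates=2, top_k=1))
```

The key design point is that only the shortlist from the first stage is scored with the expensive exact similarity, so the quality of the final result depends on the approximate stage keeping the right documents among its candidates.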