Simple Vector Stores - Maximum Marginal Relevance Retrieval#

This notebook explores the use of MMR retrieval [1]. By using maximum marginal relevance, one can iteratively find documents that are dissimilar to previous results. It has been shown to improve performance for LLM retrievals [2].

The maximum marginal relevance algorithm is as follows: $$ \text{{MMR}} = \arg\max_{d_i \in D \setminus R} [ \lambda \cdot Sim_1(d_i, q) - (1 - \lambda) \cdot \max_{d_j \in R} Sim_2(d_i, d_j) ] $$

Here, D is the set of all candidate documents, R is the set of already selected documents, q is the query, $Sim_1$ is the similarity function between a document and the query, and $Sim_2$ is the similarity function between two documents. $d_i$ and $d_j$ are documents in D and R respectively.

The parameter λ (mmr_threshold) controls the trade-off between relevance (the first term) and diversity (the second term). If mmr_threshold is close to 1, more emphasis is put on relevance, while a mmr_threshold close to 0 puts more emphasis on diversity.

Download Data

!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

from llama_index import VectorStoreIndex, SimpleDirectoryReader

# llama_index/docs/examples/data/paul_graham
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
index = VectorStoreIndex.from_documents(documents)

# To use mmr, set it as a vector_store_query_mode
query_engine = index.as_query_engine(vector_store_query_mode="mmr")
response = query_engine.query("What did the author do growing up?")
print(response)

The author grew up writing essays on topics they had stacked up, exploring other things they could work on, and learning Italian. They lived in Florence, Italy and experienced the city at street level in all conditions. They also studied art and painting, and became familiar with the signature style seekers at RISD. They later moved to Cambridge, Massachusetts and got an apartment that was rent-stabilized. They worked on software, including a code editor and an online store builder, and wrote essays about their experiences. They also founded Y Combinator, a startup accelerator, and created the Summer Founders Program to give undergrads an alternative to working at tech companies.

from llama_index import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
index = VectorStoreIndex.from_documents(documents)

# To set the threshold, set it in vector_store_kwargs
query_engine_with_threshold = index.as_query_engine(
    vector_store_query_mode="mmr", vector_store_kwargs={"mmr_threshold": 0.2}
)

response = query_engine_with_threshold.query(
    "What did the author do growing up?"
)
print(response)

The author grew up writing essays on topics they had stacked up, exploring other things they could work on, and learning Italian. They lived in Florence, Italy and experienced the city at street level in all conditions. They also studied art and painting, and became familiar with the signature style seekers at RISD. They later moved to Cambridge, Massachusetts and got an apartment that was rent-stabilized. They worked on software, including a code editor and an online store builder, and wrote essays about their experiences. They also founded Y Combinator, a startup accelerator, and developed the batch model of funding startups.

Note that the node score will be scaled with the threshold and will additionally be penalized for the similarity to previous nodes. As the threshold goes to 1, the scores will become equal and similarity to previous nodes will be ignored, turning off the impact of MMR. By lowering the threshold, the algorithm will prefer more diverse documents.

index1 = VectorStoreIndex.from_documents(documents)
query_engine_no_mrr = index1.as_query_engine()
response_no_mmr = query_engine_no_mrr.query(
    "What did the author do growing up?"
)

index2 = VectorStoreIndex.from_documents(documents)
query_engine_with_high_threshold = index2.as_query_engine(
    vector_store_query_mode="mmr", vector_store_kwargs={"mmr_threshold": 0.8}
)
response_low_threshold = query_engine_with_low_threshold.query(
    "What did the author do growing up?"
)

index3 = VectorStoreIndex.from_documents(documents)
query_engine_with_low_threshold = index3.as_query_engine(
    vector_store_query_mode="mmr", vector_store_kwargs={"mmr_threshold": 0.2}
)
response_high_threshold = query_engine_with_high_threshold.query(
    "What did the author do growing up?"
)

print(
    "Scores without MMR ",
    [node.score for node in response_no_mmr.source_nodes],
)
print(
    "Scores with MMR and a threshold of 0.8 ",
    [node.score for node in response_high_threshold.source_nodes],
)
print(
    "Scores with MMR and a threshold of 0.2 ",
    [node.score for node in response_low_threshold.source_nodes],
)

Scores without MMR  [0.8139363671956625, 0.8110763805571549]
Scores with MMR and a threshold of 0.8  [0.6511610127407832, 0.4716293734403398]
Scores with MMR and a threshold of 0.2  [0.16278861260228436, -0.4745776806511904]

Retrieval-Only Demonstration#

By setting a small chunk size and adjusting the “mmr_threshold” parameter, we can see how the retrieved results change from very diverse (and less relevant) to less diverse (and more relevant/redundant).

We try the following values: 0.1, 0.5, 0.8, 1.0

from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    ServiceContext,
)
from llama_index.response.notebook_utils import display_source_node
from llama_index.llms import OpenAI

llm = OpenAI(temperature=0, model="gpt-3.5-turbo")
service_context = ServiceContext.from_defaults(llm=llm, chunk_size_limit=64)

# llama_index/docs/examples/data/paul_graham
documents = SimpleDirectoryReader("../data/paul_graham/").load_data()
index = VectorStoreIndex.from_documents(
    documents, service_context=service_context
)

retriever = index.as_retriever(
    vector_store_query_mode="mmr",
    similarity_top_k=3,
    vector_store_kwargs={"mmr_threshold": 0.1},
)
nodes = retriever.retrieve(
    "What did the author do during his time in Y Combinator?"
)

for n in nodes:
    display_source_node(n, source_length=1000)

Document ID: 40d925c0-67fb-47eb-84f7-51728b224a6d
Similarity: 0.08476292699394482
Text: initial set of customers almost entirely from among their batchmates.

I had not originally intended YC to be a full-time job. I was going to do three things: hack, write essays, and work on YC. As YC grew, and I grew more excited…

Document ID: 72651e88-62cc-4d99-baf8-222c05b5e129
Similarity: -0.5616228896922558
Text: and because I painted them on leftover scraps of canvas, which was all I could afford at the time. Painting still lives is different from painting people, because the subject, as its name suggests, can’t move. People can’t sit for more than about 15 minutes at…

Document ID: 0328e711-c8f7-4a91-a0c1-a372068e3f1c
Similarity: -0.5230344987656315
Text: alternative to the Turing machine. If you want to write an interpreter for a language in itself, what’s the minimum set of predefined operators you need? The Lisp that John McCarthy invented, or more accurately discovered, is an answer to that question….

retriever = index.as_retriever(
    vector_store_query_mode="mmr",
    similarity_top_k=3,
    vector_store_kwargs={"mmr_threshold": 0.5},
)
nodes = retriever.retrieve(
    "What did the author do during his time in Y Combinator?"
)

for n in nodes:
    display_source_node(n, source_length=1000)

Document ID: 40d925c0-67fb-47eb-84f7-51728b224a6d
Similarity: 0.42381204797542626
Text: initial set of customers almost entirely from among their batchmates.

I had not originally intended YC to be a full-time job. I was going to do three things: hack, write essays, and work on YC. As YC grew, and I grew more excited…

Document ID: 0328e711-c8f7-4a91-a0c1-a372068e3f1c
Similarity: 0.018193356482163803
Text: alternative to the Turing machine. If you want to write an interpreter for a language in itself, what’s the minimum set of predefined operators you need? The Lisp that John McCarthy invented, or more accurately discovered, is an answer to that question….

Document ID: fbefd791-308a-4438-b6ec-353c2f05867b
Similarity: 0.05669398537137432
Text: and partly because I was focused on my mother, whose cancer had returned.

She died on January 15, 2014. We knew this was coming, but it was still hard when it did.

I kept working on YC till March, to help get that batch of startups through…

retriever = index.as_retriever(
    vector_store_query_mode="mmr",
    similarity_top_k=3,
    vector_store_kwargs={"mmr_threshold": 0.8},
)
nodes = retriever.retrieve(
    "What did the author do during his time in Y Combinator?"
)

for n in nodes:
    display_source_node(n, source_length=1000)

Document ID: 40d925c0-67fb-47eb-84f7-51728b224a6d
Similarity: 0.6781190611335854
Text: initial set of customers almost entirely from among their batchmates.

I had not originally intended YC to be a full-time job. I was going to do three things: hack, write essays, and work on YC. As YC grew, and I grew more excited…

Document ID: 7a8189bc-ccb6-402d-8ce5-49587b13878e
Similarity: 0.49504062407907184
Text: next several years I wrote lots of essays about all kinds of different topics. O’Reilly reprinted a collection of them as a book, called Hackers & Painters after one of the essays in it. I also worked on spam filters, and did some more painting….

Document ID: 3ed4c422-a297-40b9-9510-68cc8f18e2c9
Similarity: 0.5017248860360811
Text: Y Combinator was not the original name. At first we were called Cambridge Seed. But we didn’t want a regional name, in case someone copied us in Silicon Valley, so we renamed ourselves after one of the coolest tricks in the lambda calculus, the Y…

retriever = index.as_retriever(
    vector_store_query_mode="mmr",
    similarity_top_k=3,
    vector_store_kwargs={"mmr_threshold": 1.0},
)
nodes = retriever.retrieve(
    "What did the author do during his time in Y Combinator?"
)

for n in nodes:
    display_source_node(n, source_length=1000)

Document ID: 40d925c0-67fb-47eb-84f7-51728b224a6d
Similarity: 0.8476240959508525
Text: initial set of customers almost entirely from among their batchmates.

I had not originally intended YC to be a full-time job. I was going to do three things: hack, write essays, and work on YC. As YC grew, and I grew more excited…

Document ID: 1a8b0250-9b62-418c-a1df-6af4454a77e7
Similarity: 0.8252174449518838
Text: already helped write the RSS spec and would a few years later become a martyr for open access, and Sam Altman, who would later become the second president of YC. I don’t think it was entirely luck that the first batch was so good. You had to be pretty bold…

Document ID: 7d571ed4-0f23-41cd-a2fd-8a590c9e8f11
Similarity: 0.8227484107217059
Text: announcement on my site, inviting undergrads to apply. I had never imagined that writing essays would be a way to get “deal flow,” as investors call it, but it turned out to be the perfect source. [15] We got 225 applications for the Summer Founders…