Colbert Rerank#
If you’re opening this Notebook on colab, you will probably need to install LlamaIndex 🦙.
Colbert: ColBERT is a fast and accurate retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds.
This example shows how we use Colbert-V2 model as a reranker.
!pip install llama-index
!pip install llama-index-core
!pip install --quiet transformers torch
!pip install llama-index-embeddings-openai
!pip install llama-index-llms-openai
!pip install llama-index-postprocessor-colbert-rerank
from llama_index.core import (
VectorStoreIndex,
SimpleDirectoryReader,
)
Download Data
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
import os
os.environ["OPENAI_API_KEY"] = "sk-"
# load documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
# build index
index = VectorStoreIndex.from_documents(documents=documents)
Retrieve top 10 most relevant nodes, then filter with Colbert Rerank#
from llama_index.postprocessor.colbert_rerank import ColbertRerank
colbert_reranker = ColbertRerank(
top_n=5,
model="colbert-ir/colbertv2.0",
tokenizer="colbert-ir/colbertv2.0",
keep_retrieval_score=True,
)
query_engine = index.as_query_engine(
similarity_top_k=10,
node_postprocessors=[colbert_reranker],
)
response = query_engine.query(
"What did Sam Altman do in this essay?",
)
for node in response.source_nodes:
print(node.id_)
print(node.node.get_content()[:120])
print("reranking score: ", node.score)
print("retrieval score: ", node.node.metadata["retrieval_score"])
print("**********")
50157136-f221-4468-83e1-44e289f44cd5
When I was dealing with some urgent problem during YC, there was about a 60% chance it had to do with HN, and a 40% chan
reranking score: 0.6470144987106323
retrieval score: 0.8309200279065135
**********
87f0d691-b631-4b21-8123-8f71d383046b
Now that I could write essays again, I wrote a bunch about topics I'd had stacked up. I kept writing essays through 2020
reranking score: 0.6377773284912109
retrieval score: 0.8053000783543145
**********
10234ad9-46b1-4be5-8034-92392ac242ed
It's not that unprestigious types of work are good per se. But when you find yourself drawn to some kind of work despite
reranking score: 0.6301894187927246
retrieval score: 0.7975032272825491
**********
bc269bc4-49c7-4804-8575-cd6db47d70b8
It was as weird as it sounds. I resumed all my old patterns, except now there were doors where there hadn't been. Now wh
reranking score: 0.6282549500465393
retrieval score: 0.8026253284729862
**********
ebd7e351-64fc-4627-8ddd-2681d1ac33f8
As Jessica and I were walking home from dinner on March 11, at the corner of Garden and Walker streets, these three thre
reranking score: 0.6245909929275513
retrieval score: 0.7965812262372882
**********
print(response)
Sam Altman became the second president of Y Combinator after Paul Graham decided to step back from running the organization.
response = query_engine.query(
"Which schools did Paul attend?",
)
for node in response.source_nodes:
print(node.id_)
print(node.node.get_content()[:120])
print("reranking score: ", node.score)
print("retrieval score: ", node.node.metadata["retrieval_score"])
print("**********")
6942863e-dfc5-4a99-b642-967b99b71343
I didn't want to drop out of grad school, but how else was I going to get out? I remember when my friend Robert Morris g
reranking score: 0.6333063840866089
retrieval score: 0.7964996889742813
**********
477c5de0-8e05-494e-95cc-e221881fb5c1
What I Worked On
February 2021
Before college the two main things I worked on, outside of school, were writing and pro
reranking score: 0.5930159091949463
retrieval score: 0.7771872700578062
**********
0448df5c-7950-483d-bc63-15e9110da3bc
[15] We got 225 applications for the Summer Founders Program, and we were surprised to find that a lot of them were from
reranking score: 0.5160146951675415
retrieval score: 0.7782554326959897
**********
83af8efd-e992-4fd3-ada4-3c4c6f9971a1
Much to my surprise, the time I spent working on this stuff was not wasted after all. After we started Y Combinator, I w
reranking score: 0.5005874633789062
retrieval score: 0.7800375923908894
**********
bc269bc4-49c7-4804-8575-cd6db47d70b8
It was as weird as it sounds. I resumed all my old patterns, except now there were doors where there hadn't been. Now wh
reranking score: 0.4977223873138428
retrieval score: 0.782688582042514
**********
print(response)
Paul attended Cornell University for his graduate studies and later applied to RISD (Rhode Island School of Design) in the US.