Jina Embeddings¶
If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index-embeddings-jinaai
%pip install llama-index-llms-openai
!pip install llama-index
You may also need other packages that do not come direcly with llama-index
!pip install Pillow
For this example, you will need an API key which you can get from https://jina.ai/embeddings/
# Initilise with your api key
import os
jinaai_api_key = "YOUR_JINAAI_API_KEY"
os.environ["JINAAI_API_KEY"] = jinaai_api_key
Embed text and queries with Jina embedding models through JinaAI API¶
You can encode your text and your queries using the JinaEmbedding class
from llama_index.embeddings.jinaai import JinaEmbedding
embed_model = JinaEmbedding(
api_key=jinaai_api_key,
model="jina-embeddings-v2-base-en",
)
embeddings = embed_model.get_text_embedding("This is the text to embed")
print(len(embeddings))
print(embeddings[:5])
embeddings = embed_model.get_query_embedding("This is the query to embed")
print(len(embeddings))
print(embeddings[:5])
Embed in batches¶
You can also embed text in batches, the batch size can be controlled by setting the embed_batch_size
parameter (the default value will be 10 if not passed, and it should not be larger than 2048)
embed_model = JinaEmbedding(
api_key=jinaai_api_key,
model="jina-embeddings-v2-base-en",
embed_batch_size=16,
)
embeddings = embed_model.get_text_embedding_batch(
["This is the text to embed", "More text can be provided in a batch"]
)
print(len(embeddings))
print(embeddings[0][:5])
Let's build a RAG pipeline using Jina AI Embeddings¶
Download Data¶
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
Imports¶
import logging
import sys
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
from llama_index.core.response.notebook_utils import display_source_node
from IPython.display import Markdown, display
Load Data¶
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
Build index¶
your_openai_key = "YOUR_OPENAI_KEY"
llm = OpenAI(api_key=your_openai_key)
embed_model = JinaEmbedding(
api_key=jinaai_api_key,
model="jina-embeddings-v2-base-en",
embed_batch_size=16,
)
index = VectorStoreIndex.from_documents(
documents=documents, embed_model=embed_model
)
Build retriever¶
search_query_retriever = index.as_retriever()
search_query_retrieved_nodes = search_query_retriever.retrieve(
"What happened after the thesis?"
)
for n in search_query_retrieved_nodes:
display_source_node(n, source_length=2000)
Node ID: d8db9dfe-ab7a-4709-9863-877b68d2210d
Similarity: 0.7698250992788241
Text: There were some surplus Xerox Dandelions floating around the computer lab at one point. Anyone who wanted one to play around with could have one. I was briefly tempted, but they were so slow by present standards; what was the point? No one else wanted one either, so off they went. That was what happened to systems work.
I wanted not just to build things, but to build things that would last.
In this dissatisfied state I went in 1988 to visit Rich Draves at CMU, where he was in grad school. One day I went to visit the Carnegie Institute, where I'd spent a lot of time as a kid. While looking at a painting there I realized something that might seem obvious, but was a big surprise to me. There, right on the wall, was something you could make that would last. Paintings didn't become obsolete. Some of the best ones were hundreds of years old.
And moreover this was something you could make a living doing. Not as easily as you could by writing software, of course, but I thought if you were really industrious and lived really cheaply, it had to be possible to make enough to survive. And as an artist you could be truly independent. You wouldn't have a boss, or even need to get research funding.
I had always liked looking at paintings. Could I make them? I had no idea. I'd never imagined it was even possible. I knew intellectually that people made art — that it didn't just appear spontaneously — but it was as if the people who made it were a different species. They either lived long ago or were mysterious geniuses doing strange things in profiles in Life magazine. The idea of actually being able to make art, to put that verb before that noun, seemed almost miraculous.
That fall I started taking art classes at Harvard. Grad students could take classes in any department, and my advisor, Tom Cheatham, was very easy going. If he even knew about the strange classes I was taking, he never said anything.
So now I was in a PhD program in computer science, yet planning to be an...
Node ID: 848bb0f8-629c-4491-b539-85f3ff5b77a2
Similarity: 0.7679106002573213
Text: Our teacher, professor Ulivi, was a nice guy. He could see I worked hard, and gave me a good grade, which he wrote down in a sort of passport each student had. But the Accademia wasn't teaching me anything except Italian, and my money was running out, so at the end of the first year I went back to the US.
I wanted to go back to RISD, but I was now broke and RISD was very expensive, so I decided to get a job for a year and then return to RISD the next fall. I got one at a company called Interleaf, which made software for creating documents. You mean like Microsoft Word? Exactly. That was how I learned that low end software tends to eat high end software. But Interleaf still had a few years to live yet. [5]
Interleaf had done something pretty bold. Inspired by Emacs, they'd added a scripting language, and even made the scripting language a dialect of Lisp. Now they wanted a Lisp hacker to write things in it. This was the closest thing I've had to a normal job, and I hereby apologize to my boss and coworkers, because I was a bad employee. Their Lisp was the thinnest icing on a giant C cake, and since I didn't know C and didn't want to learn it, I never understood most of the software. Plus I was terribly irresponsible. This was back when a programming job meant showing up every day during certain working hours. That seemed unnatural to me, and on this point the rest of the world is coming around to my way of thinking, but at the time it caused a lot of friction. Toward the end of the year I spent much of my time surreptitiously working on On Lisp, which I had by this time gotten a contract to publish.
The good part was that I got paid huge amounts of money, especially by art student standards. In Florence, after paying my part of the rent, my budget for everything else had been $7 a day. Now I was getting paid more than 4 times that every hour, even when I was just sitting in a meeting. By living cheaply I not only managed to save enough to go back to RISD, but a...