RAG Bootcamp ◦ February 2024 ◦ Vector Institute
In [ ]:
##################################################################
# Venue: RAG Bootcamp - Vector Institute Canada
# Talk: RAG Bootcamp: Intro to RAG with the LlamaIndex Framework
# Speaker: Andrei Fajardo
##################################################################
Notebook Setup & Dependency Installation
In [ ]:
%pip install llama-index llama-index-vector-stores-qdrant -q
In [ ]:
# allow nested event loops inside the notebook environment
import nest_asyncio

nest_asyncio.apply()
In [ ]:
!mkdir -p data
!wget "https://arxiv.org/pdf/2402.09353.pdf" -O "./data/dorav1.pdf"
Motivation
In [ ]:
# query an LLM and ask it about DoRA
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4")
response = llm.complete("What is DoRA?")
In [ ]:
print(response.text)
Without specific context, it's hard to determine what DoRA refers to as it could mean different things in different fields. However, in general, DoRA could refer to:

1. Division of Research and Analysis: In some organizations, this is a department responsible for conducting research and analyzing data.
2. Department of Regulatory Agencies: In some U.S. states, this is a government agency responsible for consumer protection and regulation of businesses.
3. Declaration of Research Assessment: In academia, this could refer to a statement or policy regarding how research is evaluated.
4. Digital On-Ramp's Assessment: In the field of digital technology, this could refer to an assessment tool used by the Digital On-Ramps program.

Please provide more context for a more accurate definition.
Basic RAG in 3 Steps
- Build external knowledge (i.e., updated data sources)
- Retrieve
- Augment and Generate
1. Build External Knowledge
In [ ]:
"""Load the data.

With llama-index, before any transformations are applied,
data is loaded in the `Document` abstraction, which is
a container that holds the text of the document.
"""
from llama_index.core import SimpleDirectoryReader

loader = SimpleDirectoryReader(input_dir="./data")
documents = loader.load_data()
In [ ]:
# if you want to see what the text looks like
# documents[0].text
In [ ]:
"""Chunk, Encode, and Store into a Vector Store.

To streamline the process, we can make use of the IngestionPipeline
class that will apply your specified transformations to the
Documents.
"""
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.qdrant import QdrantVectorStore
import qdrant_client

client = qdrant_client.QdrantClient(location=":memory:")
vector_store = QdrantVectorStore(client=client, collection_name="test_store")

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(),
        OpenAIEmbedding(),
    ],
    vector_store=vector_store,
)
_nodes = pipeline.run(documents=documents, num_workers=4)
In [ ]:
# if you want to see the nodes
# len(_nodes)
# _nodes[0].text
In [ ]:
"""Create a llama-index... wait for it... Index.

After uploading your encoded documents into your vector
store of choice, you can connect to it with a VectorStoreIndex
which then gives you access to all of the llama-index functionality.
"""
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
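If the documents haven't been ingested ahead of time, the index can also be built straight from them, letting llama-index handle chunking, embedding, and storage in one call. A minimal sketch of that alternative path, assuming the same vector_store and documents from above:

# alternative: build the index directly from the Documents,
# embedding and storing them in one step
from llama_index.core import StorageContext, VectorStoreIndex

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)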
2. Retrieve Against A Query
In [ ]:
"""Retrieve relevant documents against a query.

With our Index ready, we can now query it to
retrieve the most relevant document chunks.
"""
retriever = index.as_retriever(similarity_top_k=2)
retrieved_nodes = retriever.retrieve("What is DoRA?")
In [ ]:
# to view the retrieved node
# print(retrieved_nodes[0].text)
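Each retrieved item carries both the chunk text and a similarity score, which is useful for sanity-checking retrieval quality. A minimal sketch:

# inspect each retrieved chunk and its similarity score
for node_with_score in retrieved_nodes:
    print(node_with_score.score, node_with_score.text[:100])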
3. Generate Final Response
In [ ]:
"""Context-Augmented Generation.

With our Index ready, we can create a QueryEngine
that handles the retrieval and context augmentation
in order to get the final response.
"""
query_engine = index.as_query_engine()
In [ ]:
# to inspect the default prompt being used
print(
    query_engine.get_prompts()[
        "response_synthesizer:text_qa_template"
    ].default_template.template
)
Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {query_str}
Answer:
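The default template can also be swapped out. A minimal sketch of overriding the QA prompt via update_prompts (the wording of custom_qa_template below is just an illustrative choice):

# define a custom QA prompt and attach it to the query engine
from llama_index.core import PromptTemplate

custom_qa_template = PromptTemplate(
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Using only the context above, answer the query concisely.\n"
    "Query: {query_str}\n"
    "Answer: "
)
query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": custom_qa_template}
)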
In [ ]:
response = query_engine.query("What is DoRA?")
print(response)
DoRA is a method that introduces incremental directional updates in a model by replacing them with alternative LoRA variants. It is compatible with other LoRA variants such as VeRA, which suggests freezing a unique pair of random low-rank matrices shared across all layers and employing minimal layer-specific trainable scaling vectors to capture each layer's incremental updates. DoRA effectively reduces the number of trainable parameters significantly while maintaining accuracy, showcasing improvements over other variants like VeRA and LoRA.
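Because the answer is grounded in retrieved context, it's worth checking which chunks actually backed it. A minimal sketch using the response's source nodes:

# inspect the source chunks behind the generated answer
for source_node in response.source_nodes:
    print(source_node.score, source_node.text[:100])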
In Summary
- LLMs, as powerful as they are, don't perform well on knowledge-intensive tasks (domain-specific, updated data, long-tail)
- Context augmentation has been shown (in a few studies) to outperform LLMs without augmentation
- In this notebook, we worked through one such example of that pattern.
LlamaIndex Has More To Offer
- Data infrastructure that enables production-grade, advanced RAG systems
- Agentic solutions
- Newly released: llama-index-networks
- Enterprise offerings (alpha):
  - LlamaParse (proprietary complex PDF parser)
  - LlamaCloud