RAG Bootcamp ◦ February 2024 ◦ Vector Institute
In [ ]:
##################################################################
# Venue: RAG Bootcamp - Vector Institute Canada
# Talk: RAG Bootcamp: Intro to RAG with the LlamaIndex Framework
# Speaker: Andrei Fajardo
##################################################################
Notebook Setup & Dependency Installation
In [ ]:
%pip install llama-index llama-index-vector-stores-qdrant -q
In [ ]:
# allow nested event loops inside the notebook environment
import nest_asyncio

nest_asyncio.apply()
In [ ]:
!mkdir -p data
!wget "https://arxiv.org/pdf/2402.09353.pdf" -O "./data/dorav1.pdf"
Motivation
In [ ]:
# query an LLM and ask it about DoRA
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4")
response = llm.complete("What is DoRA?")
In [ ]:
print(response.text)
Without specific context, it's hard to determine what DoRA refers to as it could mean different things in different fields. However, in general, DoRA could refer to:

1. Division of Research and Analysis: In some organizations, this is a department responsible for conducting research and analyzing data.
2. Department of Regulatory Agencies: In some U.S. states, this is a government agency responsible for consumer protection and regulation of businesses.
3. Declaration of Research Assessment: In academia, this could refer to a statement or policy regarding how research is evaluated.
4. Digital On-Ramp's Assessment: In the field of digital technology, this could refer to an assessment tool used by the Digital On-Ramps program.

Please provide more context for a more accurate definition.
Basic RAG in 3 Steps
- Build external knowledge (i.e., updated data sources)
- Retrieve
- Augment and Generate
1. Build External Knowledge
In [ ]:
"""Load the data.

With llama-index, before any transformations are applied,
data is loaded in the `Document` abstraction, which is
a container that holds the text of the document.
"""
from llama_index.core import SimpleDirectoryReader

loader = SimpleDirectoryReader(input_dir="./data")
documents = loader.load_data()
In [ ]:
# if you want to see what the text looks like
# documents[0].text
In [ ]:
"""Chunk, Encode, and Store into a Vector Store.

To streamline the process, we can make use of the IngestionPipeline
class that will apply your specified transformations to the
Documents.
"""
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.qdrant import QdrantVectorStore
import qdrant_client

client = qdrant_client.QdrantClient(location=":memory:")
vector_store = QdrantVectorStore(client=client, collection_name="test_store")

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(),
        OpenAIEmbedding(),
    ],
    vector_store=vector_store,
)
_nodes = pipeline.run(documents=documents, num_workers=4)
In [ ]:
# if you want to see the nodes
# len(_nodes)
# _nodes[0].text
In [ ]:
"""Create a llama-index... wait for it... Index.

After uploading your encoded documents into your vector
store of choice, you can connect to it with a VectorStoreIndex
which then gives you access to all of the llama-index functionality.
"""
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
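If the documents haven't been ingested ahead of time, the index can also be built straight from them, letting llama-index handle chunking, embedding, and storage in one call. A minimal sketch of that alternative path, assuming the same vector_store and documents from above:

# alternative: build the index directly from the Documents,
# embedding and storing them in one step
from llama_index.core import StorageContext, VectorStoreIndex

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)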
2. Retrieve Against A Query
In [ ]:
"""Retrieve relevant documents against a query.

With our Index ready, we can now query it to
retrieve the most relevant document chunks.
"""
retriever = index.as_retriever(similarity_top_k=2)
retrieved_nodes = retriever.retrieve("What is DoRA?")
In [ ]:
# to view the retrieved node
# print(retrieved_nodes[0].text)
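Each retrieved item carries both the chunk text and a similarity score, which is useful for sanity-checking retrieval quality. A minimal sketch:

# inspect each retrieved chunk and its similarity score
for node_with_score in retrieved_nodes:
    print(node_with_score.score, node_with_score.text[:100])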
3. Generate Final Response
In [ ]:
"""Context-Augmented Generation.

With our Index ready, we can create a QueryEngine
that handles the retrieval and context augmentation
in order to get the final response.
"""
query_engine = index.as_query_engine()
In [ ]:
# to inspect the default prompt being used
print(
    query_engine.get_prompts()[
        "response_synthesizer:text_qa_template"
    ].default_template.template
)
Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {query_str}
Answer:
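The default template can also be swapped out. A minimal sketch of overriding the QA prompt via update_prompts (the wording of custom_qa_template below is just an illustrative choice):

# define a custom QA prompt and attach it to the query engine
from llama_index.core import PromptTemplate

custom_qa_template = PromptTemplate(
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Using only the context above, answer the query concisely.\n"
    "Query: {query_str}\n"
    "Answer: "
)
query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": custom_qa_template}
)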
In [ ]:
response = query_engine.query("What is DoRA?")
print(response)
DoRA is a method that introduces incremental directional updates in a model by replacing them with alternative LoRA variants. It is compatible with other LoRA variants such as VeRA, which suggests freezing a unique pair of random low-rank matrices shared across all layers and employing minimal layer-specific trainable scaling vectors to capture each layer's incremental updates. DoRA effectively reduces the number of trainable parameters significantly while maintaining accuracy, showcasing improvements over other variants like VeRA and LoRA.
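Because the answer is grounded in retrieved context, it's worth checking which chunks actually backed it. A minimal sketch using the response's source nodes:

# inspect the source chunks behind the generated answer
for source_node in response.source_nodes:
    print(source_node.score, source_node.text[:100])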
In Summary
- LLMs, as powerful as they are, don't perform well on knowledge-intensive tasks (domain-specific, updated data, long-tail)
- Context augmentation has been shown (in a few studies) to outperform LLMs without augmentation
- In this notebook, we worked through one such example of that pattern.
LlamaIndex Has More To Offer
- Data infrastructure that enables production-grade, advanced RAG systems
- Agentic solutions
- Newly released: llama-index-networks
- Enterprise offerings (alpha):
  - LlamaParse (proprietary complex PDF parser)
  - LlamaCloud