Usage Pattern
Get Started
Build an index from documents:
from llama_index import VectorStoreIndex
# docs is a list of Document objects loaded beforehand
index = VectorStoreIndex.from_documents(docs)
Tip
To learn how to load documents, see the data connectors documentation.
What is happening under the hood?
Documents are chunked up and parsed into Node objects (lightweight abstractions over text strings that additionally keep track of metadata and relationships).
Additional computation is performed to add each Node to the index data structure. Note: the computation is index-specific.
For a vector store index, this means calling an embedding model (via API or locally) to compute embeddings for the Node objects.
For a document summary index, this means calling an LLM to generate a summary.
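To make the two stages concrete, here is a minimal, library-free sketch of the idea for a vector store index. Everything in it is a hypothetical stand-in and not LlamaIndex's implementation: ToyNode, chunk_document, toy_embedding, and build_toy_vector_index are invented names, chunking is by whitespace-separated words rather than tokens, and the "embedding" is a hash-based placeholder where a real index would call an embedding model.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for a Node: a text chunk plus its metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_document(text, chunk_size):
    """Split text into chunks of at most chunk_size words (an assumption;
    a real parser typically counts tokens and respects sentence boundaries)."""
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


def toy_embedding(text, dim=4):
    """Deterministic placeholder for an embedding model call.
    It is NOT a meaningful embedding; it only produces a fixed-size vector."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def build_toy_vector_index(doc_text, source, chunk_size=5):
    """Stage 1: parse the document into nodes.
    Stage 2: compute one vector per node and store (vector, node) pairs."""
    nodes = [
        ToyNode(text=chunk, metadata={"source": source})
        for chunk in chunk_document(doc_text, chunk_size)
    ]
    return [(toy_embedding(node.text), node) for node in nodes]


toy_index = build_toy_vector_index(
    "one two three four five six seven eight nine ten eleven twelve",
    source="example.txt",
)
for vector, node in toy_index:
    print(node.text, "->", len(vector), "dimensions")
```

For a document summary index, the second stage would call an LLM to produce a summary instead of computing a vector; the first stage stays the same.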
Configuring Document Parsing
The most common configuration you might want to change is how documents are parsed into Node objects.
High-Level API
We can configure the service context to use the desired chunk size, and pass show_progress=True to from_documents to display a progress bar during index construction.
from llama_index import ServiceContext, VectorStoreIndex
service_context = ServiceContext.from_defaults(chunk_size=512)
index = VectorStoreIndex.from_documents(
    docs, service_context=service_context, show_progress=True
)
Note: While the high-level API optimizes for ease of use, it does not expose the full range of configurability.
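As a rough guide to what chunk_size changes, the sketch below estimates how many chunks (and therefore how many Node objects) a document produces at different sizes. The helper estimate_chunk_count is hypothetical, and the 2000-token document length is an illustrative figure; the estimate assumes no chunk overlap and ignores sentence boundaries, so a real parser may produce a somewhat different count.

```python
import math


def estimate_chunk_count(total_tokens, chunk_size):
    """Rough number of chunks for a document, assuming no overlap
    and no sentence-boundary adjustments (both are simplifications)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return math.ceil(total_tokens / chunk_size)


# Illustrative document length of 2000 tokens.
for size in (256, 512, 1024):
    print(f"chunk_size={size}: about {estimate_chunk_count(2000, size)} chunks")
# chunk_size=256: about 8 chunks
# chunk_size=512: about 4 chunks
# chunk_size=1024: about 2 chunks
```

For a vector store index, each chunk becomes a Node that needs its own embedding, so a smaller chunk size means more embeddings to compute and finer-grained retrieval units.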
Low-Level API
You can use the low-level composition API if you need more granular control.
Here is an example where you want to modify the text chunk size, disable injecting metadata, and disable creating Node relationships.
The steps are:
Configure a node parser
from llama_index.node_parser import SentenceSplitter
parser = SentenceSplitter(
    chunk_size=512,
    include_metadata=False,
    include_prev_next_rel=False,
)
Parse documents into Node objects
nodes = parser.get_nodes_from_documents(docs)
Build an index from Node objects
index = VectorStoreIndex(nodes)
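To illustrate what disabling metadata injection and Node relationships changes in the parser's output, here is a small library-free sketch. The function toy_parse and its flags keep_metadata and link_neighbours are hypothetical names chosen for this illustration, and nodes are plain dictionaries chunked by words; it does not reproduce the real node parser.

```python
def toy_parse(text, doc_metadata, chunk_size=4,
              keep_metadata=True, link_neighbours=True):
    """Split text into word chunks and wrap each one in a dict 'node'.

    keep_metadata: copy the document's metadata onto every node.
    link_neighbours: record the ids of the previous and next node.
    """
    words = text.split()
    chunks = [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]
    nodes = []
    for i, chunk in enumerate(chunks):
        node = {"id": i, "text": chunk, "metadata": {}, "relationships": {}}
        if keep_metadata:
            node["metadata"] = dict(doc_metadata)
        if link_neighbours:
            if i > 0:
                node["relationships"]["previous"] = i - 1
            if i < len(chunks) - 1:
                node["relationships"]["next"] = i + 1
        nodes.append(node)
    return nodes


text = "a b c d e f g h i j"
full = toy_parse(text, {"file_name": "notes.txt"})
bare = toy_parse(text, {"file_name": "notes.txt"},
                 keep_metadata=False, link_neighbours=False)

print(full[1])  # carries metadata and previous/next links
print(bare[1])  # same text, but empty metadata and relationships
```

The text of each node is identical in both cases; only the bookkeeping attached to it differs, which is the part the low-level API lets you control.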
Handling Document Update
To deal with data sources that change over time, read more about index insertion, deletion, update, and refresh operations.
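As a rough mental model of a refresh operation, the sketch below re-processes only the documents whose content has changed since they were last indexed. It is one plausible approach under stated assumptions, not the library's implementation: toy_refresh and content_hash are hypothetical helpers, documents are (id, text) pairs, and the "index" is just a dictionary of content hashes.

```python
import hashlib


def content_hash(text):
    """Fingerprint of a document's text, used to detect changes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def toy_refresh(stored_hashes, incoming_docs):
    """Update stored_hashes in place.

    Returns one boolean per incoming document: True if the document was
    new or changed (and would be inserted or updated), False if unchanged.
    """
    changed = []
    for doc_id, text in incoming_docs:
        new_hash = content_hash(text)
        if stored_hashes.get(doc_id) != new_hash:
            stored_hashes[doc_id] = new_hash  # insert or update
            changed.append(True)
        else:
            changed.append(False)  # unchanged, nothing to recompute
    return changed


stored = {}
first = toy_refresh(stored, [("a", "hello"), ("b", "world")])
second = toy_refresh(
    stored, [("a", "hello"), ("b", "world, edited"), ("c", "new")]
)
print(first)   # [True, True]
print(second)  # [False, True, True]
```

Skipping unchanged documents matters because, for a vector store index, every re-inserted document means recomputing embeddings for its nodes.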