Storing
Concept
LlamaIndex provides a high-level interface for ingesting, indexing, and querying your external data.
Under the hood, LlamaIndex also supports swappable storage components that allow you to customize:
Document stores: where ingested documents (i.e., Node objects) are stored.
Index stores: where index metadata are stored.
Vector stores: where embedding vectors are stored.
Graph stores: where knowledge graphs are stored (i.e., for KnowledgeGraphIndex).
The Document/Index stores rely on a common Key-Value store abstraction, which is also detailed below.
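To make the shared key-value abstraction concrete, here is a minimal conceptual sketch in plain Python. It is not LlamaIndex's implementation: `InMemoryKVStore`, `TinyDocumentStore`, and `TinyIndexStore` are hypothetical names chosen for illustration, and the collection names are assumptions. It only shows how a document store and an index store can both be thin layers over one key-value interface, so that changing the backend means swapping a single object.

```python
from typing import Dict, List, Optional


class InMemoryKVStore:
    """Hypothetical key-value store: maps (collection, key) to a dict value."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, dict]] = {}

    def put(self, key: str, value: dict, collection: str = "default") -> None:
        # Store a copy so later mutation by the caller does not leak in.
        self._data.setdefault(collection, {})[key] = dict(value)

    def get(self, key: str, collection: str = "default") -> Optional[dict]:
        value = self._data.get(collection, {}).get(key)
        return dict(value) if value is not None else None


class TinyDocumentStore:
    """Hypothetical document store layered on the key-value store."""

    COLLECTION = "docstore/data"  # assumed collection name

    def __init__(self, kvstore: InMemoryKVStore) -> None:
        self._kv = kvstore

    def add_node(self, node_id: str, text: str) -> None:
        self._kv.put(node_id, {"text": text}, collection=self.COLLECTION)

    def get_text(self, node_id: str) -> Optional[str]:
        record = self._kv.get(node_id, collection=self.COLLECTION)
        return record["text"] if record is not None else None


class TinyIndexStore:
    """Hypothetical index store layered on the same key-value store."""

    COLLECTION = "index_store/data"  # assumed collection name

    def __init__(self, kvstore: InMemoryKVStore) -> None:
        self._kv = kvstore

    def add_index(self, index_id: str, node_ids: List[str]) -> None:
        self._kv.put(index_id, {"node_ids": list(node_ids)}, collection=self.COLLECTION)

    def get_node_ids(self, index_id: str) -> Optional[List[str]]:
        record = self._kv.get(index_id, collection=self.COLLECTION)
        return record["node_ids"] if record is not None else None


# Both stores share one backend; each writes to its own collection.
kv = InMemoryKVStore()
docstore = TinyDocumentStore(kv)
index_store = TinyIndexStore(kv)
docstore.add_node("node-1", "hello storage")
index_store.add_index("index-1", ["node-1"])
```

Because both wrappers depend only on `put` and `get`, replacing `InMemoryKVStore` with a class backed by a database or a file would not require changes to either store in this sketch.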
LlamaIndex supports persisting data to any storage backend supported by fsspec. We have confirmed support for the following storage backends:
Local filesystem
AWS S3
Cloudflare R2
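The persist-and-reload idea can be sketched with the standard library alone. This is an illustration under stated assumptions, not LlamaIndex's code: `persist_store` and `load_store` are hypothetical helpers, the file name `docstore.json` is chosen for illustration, and the sketch writes to the local filesystem only, whereas LlamaIndex goes through fsspec so that the same round trip can target remote backends.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def persist_store(
    store: Dict[str, dict], persist_dir: str, fname: str = "docstore.json"
) -> Path:
    """Serialize a dict-backed store to JSON under persist_dir; return the file path."""
    directory = Path(persist_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / fname
    path.write_text(json.dumps(store), encoding="utf-8")
    return path


def load_store(persist_dir: str, fname: str = "docstore.json") -> Dict[str, dict]:
    """Read a previously persisted store back into memory."""
    path = Path(persist_dir) / fname
    return json.loads(path.read_text(encoding="utf-8"))


# Round trip through a temporary directory standing in for the storage backend.
original = {"node-1": {"text": "hello storage"}}
with tempfile.TemporaryDirectory() as tmp_dir:
    saved_path = persist_store(original, tmp_dir)
    restored = load_store(tmp_dir)
```

The design point is that the store only needs a directory-like location and a way to read and write files there, which is why a filesystem abstraction can make the backend interchangeable.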
Usage Pattern
Many vector stores (FAISS is an exception) store both the data and the index (embeddings). This means that you do not need a separate document store or index store, and you do not need to persist this data explicitly, because that happens automatically. The following shows how to build a new index and how to reload an existing one.
## build a new index
from llama_index import VectorStoreIndex, StorageContext
from llama_index.vector_stores import DeepLakeVectorStore
# construct vector store and customize storage context
vector_store = DeepLakeVectorStore(dataset_path="<dataset_path>")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# build the index; `documents` is assumed to be a list of documents loaded earlier
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
## reload an existing one
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
See our Vector Store Module Guide below for more details.
Note that, in general, to use the storage abstractions you need to define a StorageContext object:
from llama_index.storage.docstore import SimpleDocumentStore
from llama_index.storage.index_store import SimpleIndexStore
from llama_index.vector_stores import SimpleVectorStore
from llama_index.storage import StorageContext
# create storage context using default stores
storage_context = StorageContext.from_defaults(
    docstore=SimpleDocumentStore(),
    vector_store=SimpleVectorStore(),
    index_store=SimpleIndexStore(),
)
More details on customization/persistence can be found in the guides below.
Modules
We offer in-depth guides on the different storage components.