Storing#
Concept#
LlamaIndex provides a high-level interface for ingesting, indexing, and querying your external data.
Under the hood, LlamaIndex also supports swappable storage components that allows you to customize:
- Document stores: where ingested documents (i.e.,
Node
objects) are stored, - Index stores: where index metadata are stored,
- Vector stores: where embedding vectors are stored.
- Graph stores: where knowledge graphs are stored (i.e. for
KnowledgeGraphIndex
). - Chat Stores: where chat messages are stored and organized.
The Document/Index stores rely on a common Key-Value store abstraction, which is also detailed below.
LlamaIndex supports persisting data to any storage backend supported by fsspec. We have confirmed support for the following storage backends:
- Local filesystem
- AWS S3
- Cloudflare R2
Usage Pattern#
Many vector stores (except FAISS) will store both the data as well as the index (embeddings). This means that you will not need to use a separate document store or index store. This also means that you will not need to explicitly persist this data - this happens automatically. Usage would look something like the following to build a new index / reload an existing one.
## build a new index
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.deeplake import DeepLakeVectorStore
# construct vector store and customize storage context
vector_store = DeepLakeVectorStore(dataset_path="<dataset_path>")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# Load documents and build index
index = VectorStoreIndex.from_documents(
documents, storage_context=storage_context
)
## reload an existing one
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
See our Vector Store Module Guide below for more details.
Note that in general to use storage abstractions, you need to define a StorageContext
object:
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.core.storage.index_store import SimpleIndexStore
from llama_index.core.vector_stores import SimpleVectorStore
from llama_index.core import StorageContext
# create storage context using default stores
storage_context = StorageContext.from_defaults(
docstore=SimpleDocumentStore(),
vector_store=SimpleVectorStore(),
index_store=SimpleIndexStore(),
)
More details on customization/persistence can be found in the guides below.
Modules#
We offer in-depth guides on the different storage components.