Document Stores
Document stores contain ingested document chunks, which we call Node objects.
See the API Reference for more details.
Simple Document Store
By default, the SimpleDocumentStore stores Node objects in memory.
They can be persisted to disk by calling docstore.persist() and loaded back with SimpleDocumentStore.from_persist_path(...).
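The persist-and-reload round trip described above can be sketched with the standard library alone. This is a conceptual illustration, not LlamaIndex's implementation: the class name TinyDocStore, its add method, and the JSON layout are all hypothetical; only the names persist and from_persist_path mirror the calls mentioned in the text.

```python
import json
import tempfile
from pathlib import Path


class TinyDocStore:
    """Hypothetical stand-in for an in-memory document store (illustration only)."""

    def __init__(self, docs=None):
        # node_id -> node payload, held only in process memory
        self.docs = dict(docs or {})

    def add(self, node_id, text):
        self.docs[node_id] = {"text": text}

    def persist(self, persist_path):
        # Nothing is written to disk until this is called explicitly.
        Path(persist_path).write_text(json.dumps(self.docs))

    @classmethod
    def from_persist_path(cls, persist_path):
        # Rebuild a fresh store from the file written by persist().
        return cls(json.loads(Path(persist_path).read_text()))


store = TinyDocStore()
store.add("node-1", "first chunk")

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "docstore.json"
    store.persist(path)
    restored = TinyDocStore.from_persist_path(path)
```

The point of the sketch is that an in-memory store loses its contents when the process exits unless persist is called, which is the main difference from the database-backed stores below.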
A more complete example can be found here
MongoDB Document Store
We support MongoDB as an alternative document store backend that persists data as Node objects are ingested.
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.storage.docstore.mongodb import MongoDocumentStore

# `documents` is assumed to be a list of Document objects loaded earlier

# create parser and parse documents into nodes
parser = SentenceSplitter()
nodes = parser.get_nodes_from_documents(documents)

# create (or load) docstore and add nodes
docstore = MongoDocumentStore.from_uri(uri="<mongodb+srv://...>")
docstore.add_documents(nodes)

# create storage context
storage_context = StorageContext.from_defaults(docstore=docstore)

# build index
index = VectorStoreIndex(nodes, storage_context=storage_context)
Under the hood, MongoDocumentStore connects to a fixed MongoDB database and initializes new collections (or loads existing collections) for your nodes.
Note: You can configure the db_name and namespace when instantiating MongoDocumentStore; otherwise they default to db_name="db_docstore" and namespace="docstore".
Note that it's not necessary to call storage_context.persist() (or docstore.persist()) when using a MongoDocumentStore, since data is persisted by default.
You can easily reconnect to your MongoDB collection and reload the index by re-initializing a MongoDocumentStore with an existing db_name and namespace.
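To make the "persisted by default" behavior concrete, here is a minimal standard-library sketch of a write-through store. It does not talk to MongoDB and is not the MongoDocumentStore implementation: BACKEND is a plain dict standing in for the database server, and WriteThroughStore with its add and get methods is hypothetical. Only the default values db_name="db_docstore" and namespace="docstore" are taken from the text above.

```python
# Simulated remote backend: (db_name, namespace) -> {node_id: text}.
# In a real deployment this state lives on the database server,
# not in the client process.
BACKEND = {}


class WriteThroughStore:
    """Hypothetical client handle (illustration only)."""

    def __init__(self, db_name="db_docstore", namespace="docstore"):
        # Load the existing collection if present, otherwise create it.
        self._collection = BACKEND.setdefault((db_name, namespace), {})

    def add(self, node_id, text):
        # Each write goes straight to the backend, so no persist() step exists.
        self._collection[node_id] = text

    def get(self, node_id):
        return self._collection.get(node_id)


first = WriteThroughStore()
first.add("node-1", "first chunk")
del first  # discard the client handle; the backend keeps the data

# Re-initializing with the same db_name and namespace finds the stored node.
same = WriteThroughStore(db_name="db_docstore", namespace="docstore")

# A different namespace points at a separate, empty collection.
other = WriteThroughStore(namespace="another_namespace")
```

Because every add lands in the backend immediately, reconnecting only requires the same identifying parameters, which is why no explicit persist call is needed.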
A more complete example can be found here
Redis Document Store
We support Redis as an alternative document store backend that persists data as Node objects are ingested.
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.storage.docstore.redis import RedisDocumentStore

# `documents` is assumed to be a list of Document objects loaded earlier

# create parser and parse documents into nodes
parser = SentenceSplitter()
nodes = parser.get_nodes_from_documents(documents)

# create (or load) docstore and add nodes
docstore = RedisDocumentStore.from_host_and_port(
    host="127.0.0.1", port=6379, namespace="llama_index"
)
docstore.add_documents(nodes)

# create storage context
storage_context = StorageContext.from_defaults(docstore=docstore)

# build index
index = VectorStoreIndex(nodes, storage_context=storage_context)
Under the hood, RedisDocumentStore connects to a Redis database and adds your nodes to a namespace stored under {namespace}/docs.
Note: You can configure the namespace when instantiating RedisDocumentStore; otherwise it defaults to namespace="docstore".
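The {namespace}/docs key pattern can be illustrated without a Redis server. In the sketch below, fake_redis is a dict standing in for the server, and the helpers docs_key and put_node are hypothetical names. Modeling each key as a hash of node IDs is an assumption made for illustration; the text above only specifies the key pattern and the default namespace.

```python
from collections import defaultdict

# Stand-in for a Redis server: key -> hash (field -> value).
# Storing nodes as hash fields is an assumption, not a documented detail.
fake_redis = defaultdict(dict)


def docs_key(namespace="docstore"):
    # Key pattern quoted in the text: {namespace}/docs
    return f"{namespace}/docs"


def put_node(namespace, node_id, text):
    # Nodes with the same ID in different namespaces do not collide.
    fake_redis[docs_key(namespace)][node_id] = text


put_node("llama_index", "node-1", "chunk from app A")
put_node("docstore", "node-1", "chunk from app B")
```

Choosing a distinct namespace per application or index keeps their nodes apart even when they share one Redis instance.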
You can easily reconnect to your Redis client and reload the index by re-initializing a RedisDocumentStore with an existing host, port, and namespace.
A more complete example can be found here
Firestore Document Store
We support Firestore as an alternative document store backend that persists data as Node objects are ingested.
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.storage.docstore.firestore import FirestoreDocumentStore

# `documents` is assumed to be a list of Document objects loaded earlier

# create parser and parse documents into nodes
parser = SentenceSplitter()
nodes = parser.get_nodes_from_documents(documents)

# create (or load) docstore and add nodes
docstore = FirestoreDocumentStore.from_database(
    project="project-id",
    database="(default)",
)
docstore.add_documents(nodes)

# create storage context
storage_context = StorageContext.from_defaults(docstore=docstore)

# build index
index = VectorStoreIndex(nodes, storage_context=storage_context)
Under the hood, FirestoreDocumentStore connects to a Firestore database in Google Cloud and adds your nodes to a namespace stored under {namespace}/docs.
Note: You can configure the namespace when instantiating FirestoreDocumentStore; otherwise it defaults to namespace="docstore".
You can easily reconnect to your Firestore database and reload the index by re-initializing a FirestoreDocumentStore with an existing project, database, and namespace.
A more complete example can be found here