Document Stores

Document stores contain ingested document chunks, which we call Node objects.

See the API Reference for more details.

Simple Document Store

By default, the SimpleDocumentStore stores Node objects in memory. They can be persisted to disk by calling docstore.persist() and loaded back from disk with SimpleDocumentStore.from_persist_path(...).
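The persist-and-reload round trip can be pictured with a small stand-in built only on the standard library. This is a conceptual sketch, not the library's implementation: the TinyDocStore class, its JSON file layout, and the bodies of its persist and from_persist_path methods are assumptions made for illustration.

```python
import json
import os
import tempfile


class TinyDocStore:
    """Hypothetical stand-in for an in-memory document store (not the library class)."""

    def __init__(self, docs=None):
        # node_id -> node text, held only in memory until persist() is called
        self.docs = dict(docs or {})

    def add_documents(self, nodes):
        for node_id, text in nodes:
            self.docs[node_id] = text

    def persist(self, persist_path):
        # Write the whole in-memory mapping to disk as JSON
        with open(persist_path, "w", encoding="utf-8") as f:
            json.dump(self.docs, f)

    @classmethod
    def from_persist_path(cls, persist_path):
        # Rebuild a store from a file written by persist()
        with open(persist_path, encoding="utf-8") as f:
            return cls(json.load(f))


path = os.path.join(tempfile.mkdtemp(), "docstore.json")
store = TinyDocStore()
store.add_documents([("node-1", "first chunk"), ("node-2", "second chunk")])
store.persist(path)

# A fresh object loaded from disk holds the same nodes
reloaded = TinyDocStore.from_persist_path(path)
```

The point of the sketch is that nothing reaches disk until persist() is called, which is the behavior the in-memory default implies.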

A more complete example can be found here

MongoDB Document Store

We support MongoDB as an alternative document store backend that persists data as Node objects are ingested.

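from llama_index import StorageContext, VectorStoreIndex
from llama_index.storage.docstore import MongoDocumentStore
from llama_index.node_parser import SentenceSplitter

# create parser and parse document into nodes
# (`documents` is assumed to be a list of Document objects loaded earlier)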
parser = SentenceSplitter()
nodes = parser.get_nodes_from_documents(documents)

# create (or load) docstore and add nodes
docstore = MongoDocumentStore.from_uri(uri="<mongodb+srv://...>")
docstore.add_documents(nodes)

# create storage context
storage_context = StorageContext.from_defaults(docstore=docstore)

# build index
index = VectorStoreIndex(nodes, storage_context=storage_context)

Under the hood, MongoDocumentStore connects to a fixed MongoDB database and initializes new collections (or loads existing collections) for your nodes.

Note: You can configure the db_name and namespace when instantiating MongoDocumentStore, otherwise they default to db_name="db_docstore" and namespace="docstore".

Note that it’s not necessary to call storage_context.persist() (or docstore.persist()) when using a MongoDocumentStore, since data is persisted by default.

You can reconnect to your MongoDB collection and reload the index by re-initializing a MongoDocumentStore with an existing db_name and namespace.
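To illustrate why no persist() call is needed and why reconnecting with the same names recovers the data, here is a minimal write-through sketch. It does not use MongoDB: SQLite from the standard library stands in for the remote database, and the WriteThroughStore class, its table layout, and its method bodies are hypothetical.

```python
import os
import sqlite3
import tempfile


class WriteThroughStore:
    """Hypothetical write-through store; SQLite stands in for a remote database."""

    def __init__(self, db_path, namespace="docstore"):
        self.namespace = namespace
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS docs ("
            "namespace TEXT, node_id TEXT, text TEXT, "
            "PRIMARY KEY (namespace, node_id))"
        )
        self.conn.commit()

    def add_documents(self, nodes):
        # Each call is committed immediately, so there is no separate persist() step
        self.conn.executemany(
            "INSERT OR REPLACE INTO docs VALUES (?, ?, ?)",
            [(self.namespace, node_id, text) for node_id, text in nodes],
        )
        self.conn.commit()

    def get_all(self):
        # Return only the nodes stored under this store's namespace
        rows = self.conn.execute(
            "SELECT node_id, text FROM docs WHERE namespace = ?",
            (self.namespace,),
        )
        return dict(rows.fetchall())

    def close(self):
        self.conn.close()


db_path = os.path.join(tempfile.mkdtemp(), "store.db")

first = WriteThroughStore(db_path, namespace="docstore")
first.add_documents([("node-1", "first chunk")])
first.close()

# "Reconnecting": a new object pointed at the same database and namespace
second = WriteThroughStore(db_path, namespace="docstore")
reloaded = second.get_all()

# A different namespace in the same database sees none of those nodes
other = WriteThroughStore(db_path, namespace="other").get_all()
```

The same pattern applies to any backend that writes on ingestion: the database location and namespace identify the data, so a new store object created with the same values sees the nodes already written.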

A more complete example can be found here

Redis Document Store

We support Redis as an alternative document store backend that persists data as Node objects are ingested.

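from llama_index import StorageContext, VectorStoreIndex
from llama_index.storage.docstore import RedisDocumentStore
from llama_index.node_parser import SentenceSplitter

# create parser and parse document into nodes
# (`documents` is assumed to be a list of Document objects loaded earlier)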
parser = SentenceSplitter()
nodes = parser.get_nodes_from_documents(documents)

# create (or load) docstore and add nodes
docstore = RedisDocumentStore.from_host_and_port(
    host="127.0.0.1", port=6379, namespace="llama_index"
)
docstore.add_documents(nodes)

# create storage context
storage_context = StorageContext.from_defaults(docstore=docstore)

# build index
index = VectorStoreIndex(nodes, storage_context=storage_context)

Under the hood, RedisDocumentStore connects to a Redis database and adds your nodes to a namespace stored under {namespace}/docs.

Note: You can configure the namespace when instantiating RedisDocumentStore; otherwise it defaults to namespace="docstore".
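The effect of the namespace can be sketched without a Redis server. In the snippet below a plain dictionary stands in for Redis, and docs_key and put_node are hypothetical helpers; only the {namespace}/docs key pattern comes from the description above.

```python
def docs_key(namespace="docstore"):
    # Key pattern taken from the text above: {namespace}/docs
    return f"{namespace}/docs"


# A plain dict of dicts stands in for the Redis server (hypothetical stand-in)
fake_redis = {}


def put_node(namespace, node_id, text):
    # Nodes are grouped under the key derived from their namespace
    fake_redis.setdefault(docs_key(namespace), {})[node_id] = text


# Two applications use the same node id but different namespaces
put_node("llama_index", "node-1", "chunk stored by app A")
put_node("docstore", "node-1", "chunk stored by app B")

keys = sorted(fake_redis)
```

Because each namespace maps to its own key, two applications sharing one server do not overwrite each other's nodes as long as their namespaces differ.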

You can reconnect to your Redis database and reload the index by re-initializing a RedisDocumentStore with an existing host, port, and namespace.

A more complete example can be found here

Firestore Document Store

We support Firestore as an alternative document store backend that persists data as Node objects are ingested.

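from llama_index import StorageContext, VectorStoreIndex
from llama_index.storage.docstore import FirestoreDocumentStore
from llama_index.node_parser import SentenceSplitter

# create parser and parse document into nodes
# (`documents` is assumed to be a list of Document objects loaded earlier)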
parser = SentenceSplitter()
nodes = parser.get_nodes_from_documents(documents)

# create (or load) docstore and add nodes
docstore = FirestoreDocumentStore.from_database(
    project="project-id",
    database="(default)",
)
docstore.add_documents(nodes)

# create storage context
storage_context = StorageContext.from_defaults(docstore=docstore)

# build index
index = VectorStoreIndex(nodes, storage_context=storage_context)

Under the hood, FirestoreDocumentStore connects to a Firestore database in Google Cloud and adds your nodes to a namespace stored under {namespace}/docs.

Note: You can configure the namespace when instantiating FirestoreDocumentStore; otherwise it defaults to namespace="docstore".

You can easily reconnect to your Firestore database and reload the index by re-initializing a FirestoreDocumentStore with an existing project, database, and namespace.

A more complete example can be found here