CassandraVectorStore
- class llama_index.vector_stores.CassandraVectorStore(table: str, embedding_dimension: int, *, session: Optional[Any] = None, keyspace: Optional[str] = None, ttl_seconds: Optional[int] = None, insertion_batch_size: int = 20)
Bases: VectorStore
Cassandra Vector Store.
An abstraction of a Cassandra table with vector similarity search. Documents and their embeddings are stored in a Cassandra table, and a vector-capable index is used for searches. The table does not need to exist beforehand: it is created automatically if necessary.
All Cassandra operations are done through the CassIO library.
Note: in recent versions, only table and embedding_dimension can be passed positionally; please revise your code if needed. This allows a leaner usage in which the DB connection is set globally through a cassio.init(…) call, so that the DB details no longer need to be specified when creating a vector store, unless desired.
- Parameters
table (str) – table name to use. If the table does not exist, it will be created.
embedding_dimension (int) – length of the embedding vectors in use.
session (optional, cassandra.cluster.Session) – the Cassandra session to use. Can be omitted, or equivalently set to None, to use the DB connection set globally through cassio.init() beforehand.
keyspace (optional, str) – name of the Cassandra keyspace to work in. Can be omitted, or equivalently set to None, to use the DB connection set globally through cassio.init() beforehand.
ttl_seconds (optional, int) – expiration time for inserted entries. Default is no expiration (None).
insertion_batch_size (optional, int) – how many vectors are inserted concurrently, for use by bulk inserts. Defaults to 20.
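The two ways of supplying the connection described above can be sketched as follows. This is a minimal sketch, not a definitive recipe: the table name, the embedding dimension of 1536, the TTL of one hour, and the two helper function names are all hypothetical, and the session is assumed to be an already-connected cassandra.cluster.Session. Imports are deferred into the functions so that the snippet loads even where the packages are not installed.

```python
from typing import Any

# Hypothetical example values, not defaults of the library.
TABLE_NAME = "my_table"
EMBEDDING_DIMENSION = 1536


def store_with_global_connection(session: Any, keyspace: str) -> Any:
    """Set the DB connection once through cassio.init(), then omit it."""
    import cassio
    from llama_index.vector_stores import CassandraVectorStore

    cassio.init(session=session, keyspace=keyspace)
    # Only table and embedding_dimension are passed, both positionally.
    return CassandraVectorStore(TABLE_NAME, EMBEDDING_DIMENSION)


def store_with_explicit_connection(session: Any, keyspace: str) -> Any:
    """Pass the DB details directly; everything after the first two
    arguments must be given by keyword."""
    from llama_index.vector_stores import CassandraVectorStore

    return CassandraVectorStore(
        TABLE_NAME,
        EMBEDDING_DIMENSION,
        session=session,
        keyspace=keyspace,
        ttl_seconds=3600,  # hypothetical: entries expire after one hour
    )
```

Either helper returns a store backed by the same table; the first form keeps connection details out of the code that creates vector stores.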
Attributes Summary
client
Return the underlying CassIO vector table object.
Methods Summary
add
(nodes, **add_kwargs) Add nodes to index.
delete
(ref_doc_id, **delete_kwargs) Delete nodes using ref_doc_id.
query
(query, **kwargs) Query index for top k most similar nodes.
Attributes Documentation
- client
Return the underlying CassIO vector table object.
- flat_metadata: bool = True
- stores_text: bool = True
Methods Documentation
- add(nodes: List[BaseNode], **add_kwargs: Any) → List[str]
Add nodes to index.
- Parameters
nodes (List[BaseNode]) – list of nodes with embeddings.
- delete(ref_doc_id: str, **delete_kwargs: Any) → None
Delete nodes using ref_doc_id.
- Parameters
ref_doc_id (str) – The doc_id of the document to delete.
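The contract of add and delete can be illustrated with a small in-memory stand-in. This is only a sketch of the documented behaviour, not the Cassandra-backed implementation: FakeNode and FakeVectorStore are hypothetical names, and the assumption that add returns the IDs of the inserted nodes is inferred from the List[str] return type.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class FakeNode:
    """Hypothetical stand-in for BaseNode with only the fields used here."""
    node_id: str
    ref_doc_id: str  # ID of the source document the node was derived from
    embedding: List[float]


class FakeVectorStore:
    """Hypothetical in-memory store mimicking the add/delete signatures."""

    def __init__(self) -> None:
        self.rows: Dict[str, FakeNode] = {}

    def add(self, nodes: List[FakeNode], **add_kwargs: Any) -> List[str]:
        # Store every node and return the node IDs (assumed return value).
        for node in nodes:
            self.rows[node.node_id] = node
        return [node.node_id for node in nodes]

    def delete(self, ref_doc_id: str, **delete_kwargs: Any) -> None:
        # Remove every node that came from the given source document.
        self.rows = {
            node_id: node
            for node_id, node in self.rows.items()
            if node.ref_doc_id != ref_doc_id
        }
```

With two nodes from document "doc-a" and one from "doc-b", calling delete("doc-a") on this stand-in leaves only the node from "doc-b": deletion is keyed on the source document, not on individual node IDs.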
- query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult
Query index for top k most similar nodes.
Supported query modes: ‘default’ (most similar vectors) and ‘mmr’.
- Parameters
query (VectorStoreQuery) – the basic query definition. Defines:
- mode (VectorStoreQueryMode): one of the supported modes.
- query_embedding (List[float]): query embedding to search against.
- similarity_top_k (int): top k most similar nodes.
- mmr_threshold (Optional[float]): the 0-to-1 MMR lambda. If present, takes precedence over the kwargs parameter. Ignored except for MMR queries.
- Args for query.mode == ‘mmr’ (ignored otherwise):
- mmr_threshold (Optional[float]): the 0-to-1 lambda for MMR. Note that mmr_threshold can also be given in the query, in which case the query value takes precedence.
- mmr_prefetch_factor (Optional[float]): factor applied to top_k to obtain the prefetch pool size. Defaults to 4.0.
- mmr_prefetch_k (Optional[int]): prefetch pool size. This cannot be passed together with mmr_prefetch_factor.
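How the MMR arguments interact can be sketched in plain Python. This is an illustration under stated assumptions, not the code the store runs: the function names are hypothetical, the prefetch size is assumed to be truncated to an integer, cosine similarity over non-zero vectors is assumed as the similarity measure, and the 0.5 default for the lambda is chosen only for this sketch.

```python
import math
from typing import List, Optional, Sequence


def effective_mmr_threshold(query_value: Optional[float],
                            kwargs_value: Optional[float]) -> Optional[float]:
    """The value carried by the query wins over the keyword argument."""
    return query_value if query_value is not None else kwargs_value


def resolve_prefetch_k(top_k: int,
                       mmr_prefetch_factor: Optional[float] = None,
                       mmr_prefetch_k: Optional[int] = None) -> int:
    """Size of the candidate pool fetched before MMR re-ranking."""
    if mmr_prefetch_factor is not None and mmr_prefetch_k is not None:
        raise ValueError("Pass at most one of mmr_prefetch_factor, mmr_prefetch_k")
    if mmr_prefetch_k is not None:
        return mmr_prefetch_k
    factor = 4.0 if mmr_prefetch_factor is None else mmr_prefetch_factor
    return int(top_k * factor)  # assumption: truncate to an integer


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; both vectors are assumed to be non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(query: Sequence[float],
               candidates: Sequence[Sequence[float]],
               top_k: int,
               mmr_threshold: float = 0.5) -> List[int]:
    """Return indices of up to top_k candidates chosen greedily by MMR.

    mmr_threshold is the lambda: 1.0 ranks by relevance only, while lower
    values penalise similarity to the items already selected.
    """
    selected: List[int] = []
    remaining = list(range(len(candidates)))

    def score(i: int) -> float:
        relevance = cosine(query, candidates[i])
        redundancy = max(
            (cosine(candidates[i], candidates[j]) for j in selected),
            default=0.0,
        )
        return mmr_threshold * relevance - (1.0 - mmr_threshold) * redundancy

    while remaining and len(selected) < top_k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

In this sketch, a pool of resolve_prefetch_k(top_k, ...) nearest vectors would be fetched first and then passed to mmr_select as candidates, so a larger prefetch pool gives the re-ranking step more room to diversify the final top k.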