How to use FilterOperatorFunctions for advanced scalar querying and complex query joins in Milvus¶
This guide walks through the basics of using the LlamaIndex FilterOperatorFunctions to take advantage of Milvus's advanced query capabilities against hosted vector databases. For context on how these work, see Milvus's documentation.
This guide assumes a few things:
- You have a provisioned Milvus collection loaded into and hosted on a vector database
- You are running this example locally and have access to environment variables
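Before going further, it can help to confirm that the environment variables this guide reads are actually set. The following is a minimal sketch, not part of the original notebook: the helper name `missing_milvus_vars` is hypothetical, and the variable names are the ones read in the vector store setup in this guide.

```python
import os

# Variable names as read in the vector store setup in this guide.
REQUIRED_VARS = ("MILVUS_URI", "MILVUS_TOKEN", "MILVUS_COLLECTION")


def missing_milvus_vars(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to os.environ; pass a plain dict to test the helper.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_milvus_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All Milvus environment variables are set.")
```

If anything is reported missing, set it in your shell or in the .env file before building the vector store.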
Install Milvus and LlamaIndex dependencies¶
%pip install llama-index-vector-stores-milvus
! pip install llama-index
Build reused code¶
- constants
- function to demonstrate outputs
from llama_index.core.schema import QueryBundle
top_k = 5
key = "product_codes"
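def retrieve_and_print_results(retriever):
    # Placeholder zero embedding; 3072 is assumed to match the collection's vector dimension.
    query_result = retriever.retrieve(
        QueryBundle(
            query_str="Explain non-refoulement.", embedding=[0.0] * 3072
        )
    )
    # Print each node's id along with the metadata fields used for filtering.
    for node in query_result:
        print(
            f"node id_: {node.id_}\nmetadata: \n\tchapter id: {node.metadata['chapter_id']}\n\t{key}: {node.metadata[key]}\n"
        )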
Load .env variables and build the VectorStore/Index¶
Provide the path to the .env file if necessary (e.g. if running in a forked local repository)
- If you'd rather provide the URI, token, and collection info manually, do that in the next step and skip the load_dotenv call
from dotenv import load_dotenv
load_dotenv("/path/to/your/.env")
import os
from llama_index.vector_stores.milvus import MilvusVectorStore
from llama_index.core import VectorStoreIndex
vector_store = MilvusVectorStore(
    overwrite=False,
    uri=os.getenv("MILVUS_URI", "xxx"),
    token=os.getenv("MILVUS_TOKEN", "yyy"),
    collection_name=os.getenv("MILVUS_COLLECTION", "zzz"),
)
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
Run Queries¶
Using a FilterOperatorFunction¶
Assume that there is a metadata field called "product_codes" that contains an array of strings detailing certain product information. To filter the vector results down to only those tagged with "code4", use the ARRAY_CONTAINS function.
Build the ScalarMetadataFilter and ScalarMetadataFilters objects
from llama_index.vector_stores.milvus.utils import (
    ScalarMetadataFilters,
    ScalarMetadataFilter,
    FilterOperatorFunction,
)
array_contains_scalar_filter = ScalarMetadataFilter(
    key=key, value="code4", operator=FilterOperatorFunction.ARRAY_CONTAINS
)
scalar_filters = ScalarMetadataFilters(filters=[array_contains_scalar_filter])
retriever = index.as_retriever(
    vector_store_kwargs={"milvus_scalar_filters": scalar_filters.to_dict()},
    similarity_top_k=top_k,
)
retrieve_and_print_results(retriever)
Execute the query and print the relevant information¶
ARRAY_CONTAINS(product_codes, "code4")
Example output:
- Only contains nodes with metadata that matches the ARRAY_CONTAINS restriction
node id_: c_142236555_s_291254779-291254817
metadata:
chapter id: 142236555
product_codes: ['code2', 'code9', 'code5', 'code4', 'code6']
node id_: c_440696406_s_440696822-440696847
metadata:
chapter id: 440696406
product_codes: ['code3', 'code2', 'code1', 'code4', 'code9', 'code5']
node id_: c_440700190_s_440700206-440700218
metadata:
chapter id: 440700190
product_codes: ['code9', 'code7', 'code4', 'code2', 'code6']
node id_: c_440763876_s_440763935-440763942
metadata:
chapter id: 440763876
product_codes: ['code4', 'code8', 'code10']
node id_: c_440885466_s_440885620-440885631
metadata:
chapter id: 440885466
product_codes: ['code9', 'code5', 'code2', 'code4', 'code1']
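To make the ARRAY_CONTAINS restriction concrete, here is a plain-Python sketch of the same membership test applied to metadata dictionaries. It is only an illustration of the semantics, not how Milvus or LlamaIndex evaluates the filter; the helper name `array_contains` is hypothetical, and the two sample rows are copied from example outputs in this guide.

```python
def array_contains(metadata, key, value):
    """Mimic ARRAY_CONTAINS(key, value) for a single metadata dict."""
    return value in metadata.get(key, [])


# Sample metadata copied from the example outputs in this guide.
rows = [
    {"chapter_id": 440763876, "product_codes": ["code4", "code8", "code10"]},
    {"chapter_id": 440769025, "product_codes": ["code3"]},
]

# Keep only the rows whose product_codes array includes "code4".
matches = [
    row["chapter_id"]
    for row in rows
    if array_contains(row, "product_codes", "code4")
]
print(matches)
```

Only the first row survives, because the second row's array has no "code4" entry.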
Run a query using the FilterOperator.NIN enum to exclude some previous results¶
chapter_id not in [440885466, 440763876]
from llama_index.core.vector_stores import (
    MetadataFilters,
    MetadataFilter,
    FilterOperator,
)
not_in_metadata_filter = MetadataFilter(
    key="chapter_id", value=[440885466, 440763876], operator=FilterOperator.NIN
)
metadata_filters = MetadataFilters(filters=[not_in_metadata_filter])
retriever = index.as_retriever(
    filters=metadata_filters, similarity_top_k=top_k
)
retrieve_and_print_results(retriever)
Example output:
- Doesn't contain chapter ids 440885466 or 440763876
- Contains results without "code4" in their product codes, which the first query would have excluded
node id_: c_440769025_s_440769040-440769053
metadata:
chapter id: 440769025
product_codes: ['code3']
node id_: c_441155692_s_441155856-441155752
metadata:
chapter id: 441155692
product_codes: ['code9', 'code1']
node id_: c_142236555_s_291254779-291254817
metadata:
chapter id: 142236555
product_codes: ['code2', 'code9', 'code5', 'code4', 'code6']
node id_: c_441156096_s_441156098-441156102
metadata:
chapter id: 441156096
product_codes: ['code3', 'code8', 'code5']
node id_: c_444354779_s_444354787-444354792
metadata:
chapter id: 444354779
product_codes: ['code3', 'code5', 'code10', 'code1']
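The effect of the NIN filter can be reproduced in plain Python as an exclusion test on chapter_id. This is a sketch of the semantics only, under the assumption that the filter behaves like `chapter_id not in [...]`; the helper name `passes_nin` is hypothetical, and the sample rows are copied from the example outputs in this guide.

```python
# Chapter ids excluded by the NIN filter in this guide.
EXCLUDED_CHAPTERS = {440885466, 440763876}

# Sample metadata copied from the example outputs in this guide.
rows = [
    {"chapter_id": 440885466, "product_codes": ["code9", "code5", "code2", "code4", "code1"]},
    {"chapter_id": 440763876, "product_codes": ["code4", "code8", "code10"]},
    {"chapter_id": 440769025, "product_codes": ["code3"]},
    {"chapter_id": 142236555, "product_codes": ["code2", "code9", "code5", "code4", "code6"]},
]


def passes_nin(metadata, key="chapter_id", excluded=EXCLUDED_CHAPTERS):
    """Mimic `key not in [...]` for a single metadata dict."""
    return metadata[key] not in excluded


# Rows from the excluded chapters are dropped; product codes play no role here.
kept = [row["chapter_id"] for row in rows if passes_nin(row)]
print(kept)
```

Note that the row with only "code3" is kept, since this filter looks at chapter_id alone.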
Combine the two query conditions into a single query call¶
ARRAY_CONTAINS(product_codes, "code4") and chapter_id not in [440885466, 440763876]
retriever = index.as_retriever(
    filters=metadata_filters,
    vector_store_kwargs={"milvus_scalar_filters": scalar_filters.to_dict()},
    similarity_top_k=top_k,
)
retrieve_and_print_results(retriever)
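The combined condition shown above is the two individual expressions joined with `and`. The sketch below rebuilds that expression string in plain Python so the composition is easy to see. The helper names (`array_contains_expr`, `not_in_expr`, `and_join`) are hypothetical, and this is not a description of how LlamaIndex builds the expression internally.

```python
import json


def array_contains_expr(key, value):
    """Render an ARRAY_CONTAINS expression; json.dumps adds the double quotes."""
    return f"ARRAY_CONTAINS({key}, {json.dumps(value)})"


def not_in_expr(key, values):
    """Render a `not in` expression over a list of values."""
    return f"{key} not in {json.dumps(values)}"


def and_join(*exprs):
    """Join any number of expressions with a logical `and`."""
    return " and ".join(exprs)


combined = and_join(
    array_contains_expr("product_codes", "code4"),
    not_in_expr("chapter_id", [440885466, 440763876]),
)
print(combined)
```

Printing the string this way can be handy when comparing what you intended to filter on against the expression the guide displays for the combined query.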
Example output:
- Doesn't contain chapter ids 440885466 or 440763876
- Only contains results that match the ARRAY_CONTAINS restriction
node id_: c_142236555_s_291254779-291254817
metadata:
chapter id: 142236555
product_codes: ['code2', 'code9', 'code5', 'code4', 'code6']
node id_: c_361386932_s_361386982-361387025
metadata:
chapter id: 361386932
product_codes: ['code4']
node id_: c_361386932_s_361387000-361387179
metadata:
chapter id: 361386932
product_codes: ['code4']
node id_: c_361386932_s_361387026-361387053
metadata:
chapter id: 361386932
product_codes: ['code4']
node id_: c_361384286_s_361384359-361384367
metadata:
chapter id: 361384286
product_codes: ['code4', 'code2', 'code9']