Azure AI Search¶
Basic Example¶
In this notebook, we take a Paul Graham essay, split it into chunks, embed it using an Azure OpenAI embedding model, load it into an Azure AI Search index, and then query it.
If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙.
!pip install llama-index
!pip install wget
%pip install llama-index-vector-stores-azureaisearch
%pip install azure-search-documents==11.5.1
%pip install llama-index-embeddings-azure-openai
%pip install llama-index-llms-azure-openai
import logging
import sys
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from IPython.display import Markdown, display
from llama_index.core import (
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
)
from llama_index.core.settings import Settings
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
from llama_index.vector_stores.azureaisearch import (
    AzureAISearchVectorStore,
    IndexManagement,
    MetadataIndexFieldType,
)
Setup Azure OpenAI¶
aoai_api_key = "YOUR_AZURE_OPENAI_API_KEY"
aoai_endpoint = "YOUR_AZURE_OPENAI_ENDPOINT"
aoai_api_version = "2024-10-21"
llm = AzureOpenAI(
    model="YOUR_AZURE_OPENAI_COMPLETION_MODEL_NAME",
    deployment_name="YOUR_AZURE_OPENAI_COMPLETION_DEPLOYMENT_NAME",
    api_key=aoai_api_key,
    azure_endpoint=aoai_endpoint,
    api_version=aoai_api_version,
)
# You need to deploy your own embedding model as well as your own chat completion model
embed_model = AzureOpenAIEmbedding(
    model="YOUR_AZURE_OPENAI_EMBEDDING_MODEL_NAME",
    deployment_name="YOUR_AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME",
    api_key=aoai_api_key,
    azure_endpoint=aoai_endpoint,
    api_version=aoai_api_version,
)
Setup Azure AI Search¶
search_service_api_key = "YOUR-AZURE-SEARCH-SERVICE-ADMIN-KEY"
search_service_endpoint = "YOUR-AZURE-SEARCH-SERVICE-ENDPOINT"
search_service_api_version = "2024-07-01"
credential = AzureKeyCredential(search_service_api_key)
# Index name to use
index_name = "llamaindex-vector-demo"
# Use index client to demonstrate creating an index
index_client = SearchIndexClient(
    endpoint=search_service_endpoint,
    credential=credential,
)
# Use search client to demonstrate using an existing index
search_client = SearchClient(
    endpoint=search_service_endpoint,
    index_name=index_name,
    credential=credential,
)
Create Index (if it does not exist)¶
Demonstrates creating a vector index named "llamaindex-vector-demo" if one doesn't exist. The index has the following fields:
| Field Name | OData Type             |
|------------|------------------------|
| id         | Edm.String             |
| chunk      | Edm.String             |
| embedding  | Collection(Edm.Single) |
| metadata   | Edm.String             |
| doc_id     | Edm.String             |
| author     | Edm.String             |
| theme      | Edm.String             |
| director   | Edm.String             |
metadata_fields = {
    "author": "author",
    "theme": ("topic", MetadataIndexFieldType.STRING),
    "director": "director",
}
vector_store = AzureAISearchVectorStore(
    search_or_index_client=index_client,
    filterable_metadata_field_keys=metadata_fields,
    index_name=index_name,
    index_management=IndexManagement.CREATE_IF_NOT_EXISTS,
    id_field_key="id",
    chunk_field_key="chunk",
    embedding_field_key="embedding",
    embedding_dimensionality=1536,
    metadata_string_field_key="metadata",
    doc_id_field_key="doc_id",
    language_analyzer="en.lucene",
    vector_algorithm_type="exhaustiveKnn",
    # compression_type="binary"  # Optional: "scalar" or "binary". NOTE: compression is only supported for HNSW
)
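With index_management=IndexManagement.CREATE_IF_NOT_EXISTS, the index is created automatically the first time the store needs it. Once it exists, you can sanity-check the generated schema against the table above using the SearchIndexClient from earlier. This is a minimal sketch, not part of the original flow:
# Inspect the schema that was created for the index
created_index = index_client.get_index(index_name)
for field in created_index.fields:
    print(field.name, field.type)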
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
Loading documents¶
Load the documents stored in the data/paul_graham/ directory using the SimpleDirectoryReader.
# Load documents from the directory we downloaded the essay into
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
storage_context = StorageContext.from_defaults(vector_store=vector_store)
Settings.llm = llm
Settings.embed_model = embed_model
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
# Query Data
query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query("What did the author do growing up?")
display(Markdown(f"<b>{response}</b>"))
The author focused on writing and programming outside of school, writing short stories and experimenting with programming on an IBM 1401 in 9th grade. Later, the author continued programming on microcomputers and eventually convinced their father to buy a TRS-80, where they started writing simple games and a word processor.
response = query_engine.query(
    "What did the author learn?",
)
display(Markdown(f"<b>{response}</b>"))
The author learned about programming on early computers, the limitations of early AI, the importance of working on unprestigious things, and the significance of writing essays online.
Use Existing Index¶
index_name = "llamaindex-vector-demo"
metadata_fields = {
    "author": "author",
    "theme": ("topic", MetadataIndexFieldType.STRING),
    "director": "director",
}
vector_store = AzureAISearchVectorStore(
    search_or_index_client=search_client,
    filterable_metadata_field_keys=metadata_fields,
    index_management=IndexManagement.VALIDATE_INDEX,
    id_field_key="id",
    chunk_field_key="chunk",
    embedding_field_key="embedding",
    embedding_dimensionality=1536,
    metadata_string_field_key="metadata",
    doc_id_field_key="doc_id",
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    [],
    storage_context=storage_context,
)
query_engine = index.as_query_engine()
response = query_engine.query("What was a hard moment for the author?")
display(Markdown(f"<b>{response}</b>"))
The author experienced a difficult moment when his mother had a stroke caused by colon cancer, which ultimately led to her passing away.
response = query_engine.query("Who is the author?")
display(Markdown(f"<b>{response}</b>"))
Paul Graham
import time
query_engine = index.as_query_engine(streaming=True)
response = query_engine.query("What happened at interleaf?")
start_time = time.time()
token_count = 0
for token in response.response_gen:
    print(token, end="")
    token_count += 1
time_elapsed = time.time() - start_time
tokens_per_second = token_count / time_elapsed
print(f"\n\nStreamed output at {tokens_per_second} tokens/s")
The company added a scripting language inspired by Emacs, and made the scripting language a dialect of Lisp. Streamed output at 64.01633939770672 tokens/s
Adding a document to existing index¶
response = query_engine.query("What colour is the sky?")
display(Markdown(f"<b>{response}</b>"))
The color of the sky varies depending on factors such as the time of day, weather conditions, and location.
from llama_index.core import Document
index.insert_nodes([Document(text="The sky is indigo today")])
response = query_engine.query("What colour is the sky?")
display(Markdown(f"<b>{response}</b>"))
The color of the sky is indigo.
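To confirm the new chunk actually landed in the index, you can ask Azure AI Search for its document count directly. This small sketch uses the SearchClient created earlier; note that indexing is near-real-time, so the count may lag for a moment:
# Count the documents (chunks) currently stored in the index
print(search_client.get_document_count())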
Filtering¶
from llama_index.core.schema import TextNode
nodes = [
    TextNode(
        text="The Shawshank Redemption",
        metadata={
            "author": "Stephen King",
            "theme": "Friendship",
        },
    ),
    TextNode(
        text="The Godfather",
        metadata={
            "director": "Francis Ford Coppola",
            "theme": "Mafia",
        },
    ),
    TextNode(
        text="Inception",
        metadata={
            "director": "Christopher Nolan",
        },
    ),
]
index.insert_nodes(nodes)
from llama_index.core.vector_stores.types import (
    MetadataFilters,
    ExactMatchFilter,
)
filters = MetadataFilters(
    filters=[ExactMatchFilter(key="theme", value="Mafia")]
)
retriever = index.as_retriever(filters=filters)
retriever.retrieve("What is inception about?")
[NodeWithScore(node=TextNode(id_='f0c299d8-1f59-4338-9c4e-06b99855ed23', embedding=None, metadata={'director': 'Francis Ford Coppola', 'theme': 'Mafia'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='The Godfather', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.8120511)]
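The theme filter returns The Godfather rather than Inception, since only the former carries theme="Mafia". The same mechanism works for any field declared in metadata_fields; for example, here is a small sketch (not in the original notebook) that filters on director instead:
# Retrieve only nodes whose director metadata matches exactly
filters = MetadataFilters(
    filters=[ExactMatchFilter(key="director", value="Christopher Nolan")]
)
retriever = index.as_retriever(filters=filters)
retriever.retrieve("What is inception about?")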
Query Mode¶
Four query modes are supported: DEFAULT (vector search), SPARSE, HYBRID, and SEMANTIC_HYBRID.
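Each mode is selected via the vector_store_query_mode argument of as_retriever. As a quick side-by-side (a minimal sketch, not in the original notebook), you can run the same query under every mode and compare the returned scores:
from llama_index.core.vector_stores.types import VectorStoreQueryMode

# Run one query under all four modes and print the scores for comparison
for mode in (
    VectorStoreQueryMode.DEFAULT,
    VectorStoreQueryMode.SPARSE,
    VectorStoreQueryMode.HYBRID,
    VectorStoreQueryMode.SEMANTIC_HYBRID,
):
    retriever = index.as_retriever(vector_store_query_mode=mode)
    results = retriever.retrieve("What is inception about?")
    print(mode, [round(r.score, 4) for r in results if r.score is not None])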
Perform a Vector Search¶
from llama_index.core.vector_stores.types import VectorStoreQueryMode
default_retriever = index.as_retriever(
    vector_store_query_mode=VectorStoreQueryMode.DEFAULT
)
response = default_retriever.retrieve("What is inception about?")
# Loop through each NodeWithScore in the response
for node_with_score in response:
    node = node_with_score.node  # The TextNode object
    score = node_with_score.score  # The similarity score
    chunk_id = node.id_  # The chunk ID

    # Extract the relevant metadata from the node
    file_name = node.metadata.get("file_name", "Unknown")
    file_path = node.metadata.get("file_path", "Unknown")

    # Extract the text content from the node
    text_content = node.text if node.text else "No content available"

    # Print the results in a user-friendly format
    print(f"Score: {score}")
    print(f"File Name: {file_name}")
    print(f"Id: {chunk_id}")
    print("\nExtracted Content:")
    print(text_content)
    print("\n" + "=" * 40 + " End of Result " + "=" * 40 + "\n")
Score: 0.87485534 File Name: Unknown Id: b4d2af4e-1de0-4cfe-8b18-629722bc12d7 Extracted Content: Inception ======================================== End of Result ======================================== Score: 0.8120511 File Name: Unknown Id: f0c299d8-1f59-4338-9c4e-06b99855ed23 Extracted Content: The Godfather ======================================== End of Result ========================================
Perform a Hybrid Search¶
from llama_index.core.vector_stores.types import VectorStoreQueryMode
hybrid_retriever = index.as_retriever(
    vector_store_query_mode=VectorStoreQueryMode.HYBRID
)
hybrid_retriever.retrieve("What is inception about?")
[NodeWithScore(node=TextNode(id_='18de619a-94ca-46d9-bb17-51bd1b5c041a', embedding=None, metadata={'director': 'Christopher Nolan'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='Inception', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.03333333507180214), NodeWithScore(node=TextNode(id_='226f09aa-b9f1-40eb-a377-35da6f288fa1', embedding=None, metadata={'file_path': 'c:\\Dev\\llama_index\\docs\\docs\\examples\\vector_stores\\..\\data\\paul_graham\\paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain', 'file_size': 75395, 'creation_date': '2024-08-23', 'last_modified_date': '2024-08-23'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='ba4cac4e-640c-41de-b01d-2c0116f753c8', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'file_path': 'c:\\Dev\\llama_index\\docs\\docs\\examples\\vector_stores\\..\\data\\paul_graham\\paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain', 'file_size': 75395, 'creation_date': '2024-08-23', 'last_modified_date': '2024-08-23'}, hash='aec9c37ce12c52b02feb44993a53c84733fc31d3637224bda1c831b42009fc17'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='b4120ac2-8829-4253-87ac-a714c1e680a2', node_type=<ObjectType.TEXT: '1'>, metadata={'file_path': 'c:\\Dev\\llama_index\\docs\\docs\\examples\\vector_stores\\..\\data\\paul_graham\\paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain', 'file_size': 75395, 'creation_date': '2024-08-23', 'last_modified_date': '2024-08-23'}, hash='65795efeb29d8b8af5b61ff4d7b5607dfb0d262cb13f1eec55bb2e3d30654b28'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='7bf9e654-5bbb-40a8-8cb5-12e3737a837d', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='2cca6e84108227c4e35fdd947606cebb212464fbe7dc51e4204ee27d62533e54')}, text='I recruited Dan Giffin, who had worked for Viaweb, and two undergrads who wanted summer jobs, and we got to work trying to build what it\'s now clear is about twenty companies and several open source projects worth of software. The language for defining applications would of course be a dialect of Lisp. But I wasn\'t so naive as to assume I could spring an overt Lisp on a general audience; we\'d hide the parentheses, like Dylan did.\r\n\r\nBy then there was a name for the kind of company Viaweb was, an "application service provider," or ASP. This name didn\'t last long before it was replaced by "software as a service," but it was current for long enough that I named this new company after it: it was going to be called Aspra.\r\n\r\nI started working on the application builder, Dan worked on network infrastructure, and the two undergrads worked on the first two services (images and phone calls). But about halfway through the summer I realized I really didn\'t want to run a company — especially not a big one, which it was looking like this would have to be. I\'d only started Viaweb because I needed the money. Now that I didn\'t need money anymore, why was I doing this? If this vision had to be realized as a company, then screw the vision. 
I\'d build a subset that could be done as an open source project.\r\n\r\nMuch to my surprise, the time I spent working on this stuff was not wasted after all. After we started Y Combinator, I would often encounter startups working on parts of this new architecture, and it was very useful to have spent so much time thinking about it and even trying to write some of it.\r\n\r\nThe subset I would build as an open source project was the new Lisp, whose parentheses I now wouldn\'t even have to hide. A lot of Lisp hackers dream of building a new Lisp, partly because one of the distinctive features of the language is that it has dialects, and partly, I think, because we have in our minds a Platonic form of Lisp that all existing dialects fall short of. I certainly did. So at the end of the summer Dan and I switched to working on this new dialect of Lisp, which I called Arc, in a house I bought in Cambridge.\r\n\r\nThe following spring, lightning struck. I was invited to give a talk at a Lisp conference, so I gave one about how we\'d used Lisp at Viaweb. Afterward I put a postscript file of this talk online, on paulgraham.com, which I\'d created years before using Viaweb but had never used for anything. In one day it got 30,000 page views. What on earth had happened? The referring urls showed that someone had posted it on Slashdot. [10]\r\n\r\nWow, I thought, there\'s an audience. If I write something and put it on the web, anyone can read it. That may seem obvious now, but it was surprising then. In the print era there was a narrow channel to readers, guarded by fierce monsters known as editors. The only way to get an audience for anything you wrote was to get it published as a book, or in a newspaper or magazine. Now anyone could publish anything.\r\n\r\nThis had been possible in principle since 1993, but not many people had realized it yet. I had been intimately involved with building the infrastructure of the web for most of that time, and a writer as well, and it had taken me 8 years to realize it. Even then it took me several years to understand the implications. It meant there would be a whole new generation of essays. [11]\r\n\r\nIn the print era, the channel for publishing essays had been vanishingly small. Except for a few officially anointed thinkers who went to the right parties in New York, the only people allowed to publish essays were specialists writing about their specialties. There were so many essays that had never been written, because there had been no way to publish them. Now they could be, and I was going to write them. [12]\r\n\r\nI\'ve worked on several different things, but to the extent there was a turning point where I figured out what to work on, it was when I started publishing essays online. From then on I knew that whatever else I did, I\'d always write essays too.\r\n\r\nI knew that online essays would be a marginal medium at first. Socially they\'d seem more like rants posted by nutjobs on their GeoCities sites than the genteel and beautifully typeset compositions published in The New Yorker. But by this point I knew enough to find that encouraging instead of discouraging.', mimetype='text/plain', start_char_idx=40977, end_char_idx=45334, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.016393441706895828)]
Perform a Hybrid Search with Semantic Reranking¶
This mode applies semantic reranking on top of hybrid search results to improve search relevance.
For further details, see: https://learn.microsoft.com/azure/search/semantic-search-overview
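Semantic reranking relies on a semantic configuration on the index. Recent releases of the integration accept a semantic_configuration_name parameter on AzureAISearchVectorStore; treat the sketch below as an assumption to verify against your installed version (the configuration name is hypothetical):
# Assumption: semantic_configuration_name is supported by your installed version
vector_store = AzureAISearchVectorStore(
    search_or_index_client=search_client,
    index_management=IndexManagement.VALIDATE_INDEX,
    id_field_key="id",
    chunk_field_key="chunk",
    embedding_field_key="embedding",
    embedding_dimensionality=1536,
    metadata_string_field_key="metadata",
    doc_id_field_key="doc_id",
    semantic_configuration_name="mySemanticConfig",  # hypothetical name
)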
hybrid_retriever = index.as_retriever(
    vector_store_query_mode=VectorStoreQueryMode.SEMANTIC_HYBRID
)
hybrid_retriever.retrieve("What is inception about?")
[NodeWithScore(node=TextNode(id_='b4d2af4e-1de0-4cfe-8b18-629722bc12d7', embedding=None, metadata={'director': 'Christopher Nolan'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='Inception', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=2.379289150238037), NodeWithScore(node=TextNode(id_='07d77810-7697-4f16-a945-43b725795641', embedding=None, metadata={'file_path': 'c:\\Dev\\llama_index\\docs\\docs\\examples\\vector_stores\\..\\data\\paul_graham\\paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain', 'file_size': 75395, 'creation_date': '2024-08-23', 'last_modified_date': '2024-08-23'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='e02f3256-42cf-4e96-8a2a-5ba4b5bae898', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'file_path': 'c:\\Dev\\llama_index\\docs\\docs\\examples\\vector_stores\\..\\data\\paul_graham\\paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain', 'file_size': 75395, 'creation_date': '2024-08-23', 'last_modified_date': '2024-08-23'}, hash='aec9c37ce12c52b02feb44993a53c84733fc31d3637224bda1c831b42009fc17'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='022dfb3b-baa4-477b-bd0e-2ab9d8ebbf93', node_type=<ObjectType.TEXT: '1'>, metadata={'file_path': 'c:\\Dev\\llama_index\\docs\\docs\\examples\\vector_stores\\..\\data\\paul_graham\\paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain', 'file_size': 75395, 'creation_date': '2024-08-23', 'last_modified_date': '2024-08-23'}, hash='3cac601cf9b6b268149bedd26c5f6767d9492f245d57c8a11e591cd631ce0b00'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='b8570137-ba64-4838-84b4-baa2fd7322fa', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='7b37bcc69aec36a618efa128df1f1e8f6bd49b1ab4f57aea517945db504d0d83')}, text='[2]\r\n\r\nI\'m only up to age 25 and already there are such conspicuous patterns. Here I was, yet again about to attend some august institution in the hopes of learning about some prestigious subject, and yet again about to be disappointed. The students and faculty in the painting department at the Accademia were the nicest people you could imagine, but they had long since arrived at an arrangement whereby the students wouldn\'t require the faculty to teach anything, and in return the faculty wouldn\'t require the students to learn anything. And at the same time all involved would adhere outwardly to the conventions of a 19th century atelier. We actually had one of those little stoves, fed with kindling, that you see in 19th century studio paintings, and a nude model sitting as close to it as possible without getting burned. Except hardly anyone else painted her besides me. The rest of the students spent their time chatting or occasionally trying to imitate things they\'d seen in American art magazines.\r\n\r\nOur model turned out to live just down the street from me. She made a living from a combination of modelling and making fakes for a local antique dealer. 
She\'d copy an obscure old painting out of a book, and then he\'d take the copy and maltreat it to make it look old. [3]\r\n\r\nWhile I was a student at the Accademia I started painting still lives in my bedroom at night. These paintings were tiny, because the room was, and because I painted them on leftover scraps of canvas, which was all I could afford at the time. Painting still lives is different from painting people, because the subject, as its name suggests, can\'t move. People can\'t sit for more than about 15 minutes at a time, and when they do they don\'t sit very still. So the traditional m.o. for painting people is to know how to paint a generic person, which you then modify to match the specific person you\'re painting. Whereas a still life you can, if you want, copy pixel by pixel from what you\'re seeing. You don\'t want to stop there, of course, or you get merely photographic accuracy, and what makes a still life interesting is that it\'s been through a head. You want to emphasize the visual cues that tell you, for example, that the reason the color changes suddenly at a certain point is that it\'s the edge of an object. By subtly emphasizing such things you can make paintings that are more realistic than photographs not just in some metaphorical sense, but in the strict information-theoretic sense. [4]\r\n\r\nI liked painting still lives because I was curious about what I was seeing. In everyday life, we aren\'t consciously aware of much we\'re seeing. Most visual perception is handled by low-level processes that merely tell your brain "that\'s a water droplet" without telling you details like where the lightest and darkest points are, or "that\'s a bush" without telling you the shape and position of every leaf. This is a feature of brains, not a bug. In everyday life it would be distracting to notice every leaf on every bush. But when you have to paint something, you have to look more closely, and when you do there\'s a lot to see. You can still be noticing new things after days of trying to paint something people usually take for granted, just as you can after days of trying to write an essay about something people usually take for granted.\r\n\r\nThis is not the only way to paint. I\'m not 100% sure it\'s even a good way to paint. But it seemed a good enough bet to be worth trying.\r\n\r\nOur teacher, professor Ulivi, was a nice guy. He could see I worked hard, and gave me a good grade, which he wrote down in a sort of passport each student had. But the Accademia wasn\'t teaching me anything except Italian, and my money was running out, so at the end of the first year I went back to the US.\r\n\r\nI wanted to go back to RISD, but I was now broke and RISD was very expensive, so I decided to get a job for a year and then return to RISD the next fall. I got one at a company called Interleaf, which made software for creating documents. You mean like Microsoft Word? Exactly. That was how I learned that low end software tends to eat high end software. But Interleaf still had a few years to live yet. [5]\r\n\r\nInterleaf had done something pretty bold. Inspired by Emacs, they\'d added a scripting language, and even made the scripting language a dialect of Lisp.', mimetype='text/plain', start_char_idx=13709, end_char_idx=18066, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.9201899170875549)]