Context-Augmented OpenAI Agent¶
In this tutorial, we show you how to use our ContextRetrieverOpenAIAgent
implementation
to build an agent on top of OpenAI's function API and store/index an arbitrary number of tools. Our indexing/retrieval modules help to remove the complexity of having too many functions to fit in the prompt.
Initial Setup¶
Here we setup a ContextRetrieverOpenAIAgent. This agent will perform retrieval first before calling any tools. This can help ground the agent's tool picking and answering capabilities in context.
If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙.
In [ ]:
Copied!
%pip install llama-index-agent-openai-legacy
%pip install llama-index-agent-openai-legacy
In [ ]:
Copied!
!pip install llama-index
!pip install llama-index
In [ ]:
Copied!
import json
from typing import Sequence
from llama_index.core import (
SimpleDirectoryReader,
VectorStoreIndex,
StorageContext,
load_index_from_storage,
)
from llama_index.core.tools import QueryEngineTool, ToolMetadata
import json
from typing import Sequence
from llama_index.core import (
SimpleDirectoryReader,
VectorStoreIndex,
StorageContext,
load_index_from_storage,
)
from llama_index.core.tools import QueryEngineTool, ToolMetadata
In [ ]:
Copied!
try:
storage_context = StorageContext.from_defaults(
persist_dir="./storage/march"
)
march_index = load_index_from_storage(storage_context)
storage_context = StorageContext.from_defaults(
persist_dir="./storage/june"
)
june_index = load_index_from_storage(storage_context)
storage_context = StorageContext.from_defaults(
persist_dir="./storage/sept"
)
sept_index = load_index_from_storage(storage_context)
index_loaded = True
except:
index_loaded = False
try:
storage_context = StorageContext.from_defaults(
persist_dir="./storage/march"
)
march_index = load_index_from_storage(storage_context)
storage_context = StorageContext.from_defaults(
persist_dir="./storage/june"
)
june_index = load_index_from_storage(storage_context)
storage_context = StorageContext.from_defaults(
persist_dir="./storage/sept"
)
sept_index = load_index_from_storage(storage_context)
index_loaded = True
except:
index_loaded = False
Download Data
In [ ]:
Copied!
!mkdir -p 'data/10q/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10q/uber_10q_march_2022.pdf' -O 'data/10q/uber_10q_march_2022.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10q/uber_10q_june_2022.pdf' -O 'data/10q/uber_10q_june_2022.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10q/uber_10q_sept_2022.pdf' -O 'data/10q/uber_10q_sept_2022.pdf'
!mkdir -p 'data/10q/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10q/uber_10q_march_2022.pdf' -O 'data/10q/uber_10q_march_2022.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10q/uber_10q_june_2022.pdf' -O 'data/10q/uber_10q_june_2022.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10q/uber_10q_sept_2022.pdf' -O 'data/10q/uber_10q_sept_2022.pdf'
In [ ]:
Copied!
# build indexes across the three data sources
if not index_loaded:
# load data
march_docs = SimpleDirectoryReader(
input_files=["./data/10q/uber_10q_march_2022.pdf"]
).load_data()
june_docs = SimpleDirectoryReader(
input_files=["./data/10q/uber_10q_june_2022.pdf"]
).load_data()
sept_docs = SimpleDirectoryReader(
input_files=["./data/10q/uber_10q_sept_2022.pdf"]
).load_data()
# build index
march_index = VectorStoreIndex.from_documents(march_docs)
june_index = VectorStoreIndex.from_documents(june_docs)
sept_index = VectorStoreIndex.from_documents(sept_docs)
# persist index
march_index.storage_context.persist(persist_dir="./storage/march")
june_index.storage_context.persist(persist_dir="./storage/june")
sept_index.storage_context.persist(persist_dir="./storage/sept")
# build indexes across the three data sources
if not index_loaded:
# load data
march_docs = SimpleDirectoryReader(
input_files=["./data/10q/uber_10q_march_2022.pdf"]
).load_data()
june_docs = SimpleDirectoryReader(
input_files=["./data/10q/uber_10q_june_2022.pdf"]
).load_data()
sept_docs = SimpleDirectoryReader(
input_files=["./data/10q/uber_10q_sept_2022.pdf"]
).load_data()
# build index
march_index = VectorStoreIndex.from_documents(march_docs)
june_index = VectorStoreIndex.from_documents(june_docs)
sept_index = VectorStoreIndex.from_documents(sept_docs)
# persist index
march_index.storage_context.persist(persist_dir="./storage/march")
june_index.storage_context.persist(persist_dir="./storage/june")
sept_index.storage_context.persist(persist_dir="./storage/sept")
In [ ]:
Copied!
march_engine = march_index.as_query_engine(similarity_top_k=3)
june_engine = june_index.as_query_engine(similarity_top_k=3)
sept_engine = sept_index.as_query_engine(similarity_top_k=3)
march_engine = march_index.as_query_engine(similarity_top_k=3)
june_engine = june_index.as_query_engine(similarity_top_k=3)
sept_engine = sept_index.as_query_engine(similarity_top_k=3)
In [ ]:
Copied!
query_engine_tools = [
QueryEngineTool(
query_engine=march_engine,
metadata=ToolMetadata(
name="uber_march_10q",
description=(
"Provides information about Uber 10Q filings for March 2022. "
"Use a detailed plain text question as input to the tool."
),
),
),
QueryEngineTool(
query_engine=june_engine,
metadata=ToolMetadata(
name="uber_june_10q",
description=(
"Provides information about Uber financials for June 2021. "
"Use a detailed plain text question as input to the tool."
),
),
),
QueryEngineTool(
query_engine=sept_engine,
metadata=ToolMetadata(
name="uber_sept_10q",
description=(
"Provides information about Uber financials for Sept 2021. "
"Use a detailed plain text question as input to the tool."
),
),
),
]
query_engine_tools = [
QueryEngineTool(
query_engine=march_engine,
metadata=ToolMetadata(
name="uber_march_10q",
description=(
"Provides information about Uber 10Q filings for March 2022. "
"Use a detailed plain text question as input to the tool."
),
),
),
QueryEngineTool(
query_engine=june_engine,
metadata=ToolMetadata(
name="uber_june_10q",
description=(
"Provides information about Uber financials for June 2021. "
"Use a detailed plain text question as input to the tool."
),
),
),
QueryEngineTool(
query_engine=sept_engine,
metadata=ToolMetadata(
name="uber_sept_10q",
description=(
"Provides information about Uber financials for Sept 2021. "
"Use a detailed plain text question as input to the tool."
),
),
),
]
Try Context-Augmented Agent¶
Here we augment our agent with context in different settings:
- toy context: we define some abbreviations that map to financial terms (e.g. R=Revenue). We supply this as context to the agent
In [ ]:
Copied!
from llama_index.core import Document
from llama_index.agent.openai_legacy import ContextRetrieverOpenAIAgent
from llama_index.core import Document
from llama_index.agent.openai_legacy import ContextRetrieverOpenAIAgent
In [ ]:
Copied!
# toy index - stores a list of abbreviations
texts = [
"Abbreviation: X = Revenue",
"Abbreviation: YZ = Risk Factors",
"Abbreviation: Z = Costs",
]
docs = [Document(text=t) for t in texts]
context_index = VectorStoreIndex.from_documents(docs)
# toy index - stores a list of abbreviations
texts = [
"Abbreviation: X = Revenue",
"Abbreviation: YZ = Risk Factors",
"Abbreviation: Z = Costs",
]
docs = [Document(text=t) for t in texts]
context_index = VectorStoreIndex.from_documents(docs)
In [ ]:
Copied!
context_agent = ContextRetrieverOpenAIAgent.from_tools_and_retriever(
query_engine_tools,
context_index.as_retriever(similarity_top_k=1),
verbose=True,
)
context_agent = ContextRetrieverOpenAIAgent.from_tools_and_retriever(
query_engine_tools,
context_index.as_retriever(similarity_top_k=1),
verbose=True,
)
In [ ]:
Copied!
response = context_agent.chat("What is the YZ of March 2022?")
response = context_agent.chat("What is the YZ of March 2022?")
Context information is below.
---------------------
Abbreviation: YZ = Risk Factors
---------------------
Given the context information and not prior knowledge, either pick the corresponding tool or answer the function: What is the YZ of March 2022?
=== Calling Function ===
Calling function: uber_march_10q with args: {
"input": "Risk Factors"
}
Got output:
•The COVID-19 pandemic and the impact of actions to mitigate the pandemic have adversely affected and may continue to adversely affect parts of our business.
•Our business would be adversely affected if Drivers were classified as employees, workers or quasi-employees instead of independent contractors.
•The mobility, delivery, and logistics industries are highly competitive, with well-established and low-cost alternatives that have been available for decades, low barriers to entry, low switching costs, and well-capitalized competitors in nearly every major geographic region.
•To remain competitive in certain markets, we have in the past lowered, and may continue to lower, fares or service fees, and we have in the past offered, and may continue to offer, significant Driver incentives and consumer discounts and promotions.
•We have incurred significant losses since inception, including in the United States and other major markets. We expect our operating expenses to increase significantly in the foreseeable future, and we may not achieve or maintain profitability.
•If we are unable to attract or maintain a critical mass of Drivers, consumers, merchants, shippers, and carriers, whether as a result of competition or other factors, our platform will become less appealing to platform users.
•Maintaining and enhancing our brand and reputation is critical to our business prospects. We have previously received significant media coverage and negative publicity regarding our brand and reputation, and while we have taken significant steps to rehabilitate our brand and reputation, failure to maintain and enhance our brand and reputation could adversely affect our business.
•The impact of economic conditions, including the resulting effect on discretionary consumer spending, may harm our business and operating results.
•Increases in fuel, food, labor, energy, and other costs due to inflation and other factors could adversely affect our operating results.
•If we experience security or privacy breaches or other unauthorized or improper access to, use of, disclosure of, alteration of or destruction of our proprietary or confidential data, employee data, or platform user data.
•Cyberattacks, including computer malware, ransomware, viruses, spamming, and phishing attacks could harm our reputation, business, and operating results.
•We are subject to climate change risks, including physical and transitional risks, and if we are unable to manage such risks, our business may be adversely impacted.
•We have made climate related commitments that require us to invest significant effort, resources, and management time and circumstances may arise, including those beyond our control, that may require us to revise the contemplated timeframes for implementing these commitments.
•We rely on third parties maintaining open marketplaces to distribute our platform and to provide the software we use in certain of our products and offerings. If such third parties interfere with the distribution of our products or offerings or with our use of such software, our business would be adversely affected.
•We will require additional capital to support the growth of our business, and this capital might not be available on reasonable terms or at all.
•If we are unable to successfully identify, acquire and integrate suitable businesses, our operating results and prospects could be harmed, and any businesses we acquire may not perform as expected or be effectively integrated.
•We may continue to be blocked from or limited in providing or operating our products and offerings in certain jurisdictions, and may be required to modify our business model in those jurisdictions as a result.
•Our business is subject to numerous legal and regulatory risks that could have an adverse impact on our business and future prospects.
•Our business is subject to extensive government regulation and oversight relating to the provision of payment and financial services.
•We face risks related to our collection, use, transfer, disclosure, and other processing of data, which could result in investigations, inquiries, litigation, fines, legislative and regulatory action, and negative press about our privacy and data protection practices.
•If we are unable to protect our intellectual property, or if third parties are successful in claiming that we are misappropriating the intellectual property of others, we may incur significant expense and our business may be adversely affected.
•The market price of our common stock has been, and may continue to be, volatile or may decline steeply or suddenly regardless of our operating performance, and we may not be able to meet investor or analyst expectations. You may not be able to resell your shares at or above the price you paid and may lose all or part of your investment.
========================
In [ ]:
Copied!
print(str(response))
print(str(response))
The risk factors for Uber in March 2022 include: 1. The adverse impact of the COVID-19 pandemic and actions taken to mitigate it on Uber's business. 2. The potential adverse effect on Uber's business if drivers are classified as employees instead of independent contractors. 3. Intense competition in the mobility, delivery, and logistics industries, with low-cost alternatives and well-capitalized competitors. 4. The need to lower fares, offer driver incentives, and provide consumer discounts and promotions to remain competitive in certain markets. 5. Uber's history of significant losses and the expectation of increased operating expenses in the future, which may affect profitability. 6. The importance of attracting and maintaining a critical mass of drivers, consumers, merchants, shippers, and carriers to keep the platform appealing. 7. The significance of maintaining and enhancing Uber's brand and reputation, as negative publicity could harm the business. 8. The potential impact of economic conditions and discretionary consumer spending on Uber's business. 9. The adverse effect of increasing costs, such as fuel, food, labor, energy, and inflation, on Uber's operating results. 10. The risk of security or privacy breaches and unauthorized access to Uber's proprietary or confidential data. 11. The potential harm to Uber's reputation, business, and operating results from cyberattacks. 12. The impact of climate change risks, including physical and transitional risks, on Uber's business. 13. The commitment to climate-related initiatives that require significant effort, resources, and management time. 14. The reliance on third parties for distributing Uber's platform and providing software, with the risk of interference or limitations. 15. The need for additional capital to support Uber's business growth, with uncertainty about its availability on reasonable terms. 16. The risks associated with identifying, acquiring, and integrating suitable businesses. 17. The potential limitations and modifications to Uber's business model in certain jurisdictions. 18. The legal and regulatory risks that could adversely impact Uber's business and future prospects. 19. The extensive government regulation and oversight related to payment and financial services provided by Uber. 20. The risks associated with data collection, use, transfer, disclosure, and processing, including investigations, litigation, and fines. 21. The importance of protecting Uber's intellectual property and the risk of claims of misappropriation. 22. The volatility and potential decline in the market price of Uber's common stock, which may not reflect operating performance. Please note that this is a summary of the risk factors mentioned in Uber's March 2022 10Q filing. For more detailed information, please refer to the official filing.
In [ ]:
Copied!
context_agent.chat("What is the X and Z in September 2022?")
context_agent.chat("What is the X and Z in September 2022?")
Use Uber 10-Q as context, use Calculator as Tool¶
In [ ]:
Copied!
from llama_index.core.tools import BaseTool, FunctionTool
def magic_formula(revenue: int, cost: int) -> int:
"""Runs MAGIC_FORMULA on revenue and cost."""
return revenue - cost
magic_tool = FunctionTool.from_defaults(fn=magic_formula, name="magic_formula")
from llama_index.core.tools import BaseTool, FunctionTool
def magic_formula(revenue: int, cost: int) -> int:
"""Runs MAGIC_FORMULA on revenue and cost."""
return revenue - cost
magic_tool = FunctionTool.from_defaults(fn=magic_formula, name="magic_formula")
In [ ]:
Copied!
context_agent = ContextRetrieverOpenAIAgent.from_tools_and_retriever(
[magic_tool], sept_index.as_retriever(similarity_top_k=3), verbose=True
)
context_agent = ContextRetrieverOpenAIAgent.from_tools_and_retriever(
[magic_tool], sept_index.as_retriever(similarity_top_k=3), verbose=True
)
In [ ]:
Copied!
response = context_agent.chat(
"Can you run MAGIC_FORMULA on Uber's revenue and cost?"
)
response = context_agent.chat(
"Can you run MAGIC_FORMULA on Uber's revenue and cost?"
)
Context information is below.
---------------------
Three Months Ended September 30, Nine Months Ended September 30,
2021 2022 2021 2022
Revenue 100 % 100 % 100 % 100 %
Costs and expenses
Cost of revenue, exclusive of depreciation and amortization shown separately
below 50 % 62 % 53 % 62 %
Operations and support 10 % 7 % 11 % 8 %
Sales and marketing 24 % 14 % 30 % 16 %
Research and development 10 % 9 % 13 % 9 %
General and administrative 13 % 11 % 15 % 10 %
Depreciation and amortization 4 % 3 % 6 % 3 %
Total costs and expenses 112 % 106 % 128 % 107 %
Loss from operations (12)% (6)% (28)% (7)%
Interest expense (3)% (2)% (3)% (2)%
Other income (expense), net (38)% (6)% 16 % (34)%
Loss before income taxes and income (loss) from equity method
investments (52)% (14)% (16)% (43)%
Provision for (benefit from) income taxes (2)% 1 % (3)% — %
Income (loss) from equity method investments — % — % — % — %
Net loss including non-controlling interests (50)% (14)% (12)% (42)%
Less: net income (loss) attributable to non-controlling interests,
net of tax — % — % (1)% — %
Net loss attributable to Uber Technologies, Inc. (50)% (14)% (12)% (42)%
Totals of percentage of revenues may not foot due to rounding.
The following discussion and analysis is for the three and nine months ended September 30, 2022 compared to same period in 2021.
Revenue
Three Months Ended September 30, Nine Months Ended September 30,
(In millions, except per centages) 2021 2022 % Change 2021 2022 % Change
Revenue $ 4,845 $ 8,343 72 %$ 11,677 $ 23,270 99 %
Three Months Ended September 30, 2022 Compared with the Same Period in 2021
Revenue increased $3.5 billion, or 72%, primarily attributable to an increase in Gross Bookings of 26%, or 32% on a constant currency basis. The increase in
Gross Bookings was primarily driven by increases in Mobility Trip volumes as the business recovers from the impacts of COVID-19 and a $1.3 billion increase in
Freight Gross Bookings resulting primarily from the acquisition of Transplace in the fourth quarter of 2021. Additionally, during the third quarter of 2022, we saw a
$1.1 billion increase in Mobility revenue as a result of business model changes in the UK. We also saw a $164 million increase in Delivery revenue resulting from
an increase in certain Courier payments and incentives that are recorded in cost of revenue, exclusive of depreciation and amortization, for certain markets where
we are primarily responsible for Delivery services and pay Couriers for services provided.
Nine Months Ended September 30, 2022 Compared with the Same Period in 2021
Revenue increased $11.6 billion, or 99%, primarily attributable to an increase in Gross Bookings of 31%, or 36% on a constant currency basis. The increase in
Gross Bookings was primarily driven by increases in Mobility Trip volumes as the business recovers from the impacts of COVID-19 and a $4.4 billion increase in
Freight Gross Bookings resulting primarily from the acquisition of Transplace in the fourth quarter of 2021. Additionally, during the first nine months of 2022, we
saw a $2.2 billion net increase in Mobility revenue as a result of business model changes in the UK and an accrual made for the resolution of historical claims in
the UK relating to the classification of drivers. We also saw a $751 million increase in Delivery revenue resulting from an increase in certain Courier payments and
incentives that are recorded in cost of revenue, exclusive of depreciation and amortization, for certain markets where we are primarily responsible for
UBER TECHNOLOGIES, INC.
CONDENSED CONSOLIDATED STATEMENTS OF OPERATIONS
(In millions, except share amounts which are reflected in thousands, and per share amounts)
(Unaudited)
Three Months Ended September 30, Nine Months Ended September 30,
2021 2022 2021 2022
Revenue $ 4,845 $ 8,343 $ 11,677 $ 23,270
Costs and expenses
Cost of revenue, exclusive of depreciation and amortization shown separately
below 2,438 5,173 6,247 14,352
Operations and support 475 617 1,330 1,808
Sales and marketing 1,168 1,153 3,527 3,634
Research and development 493 760 1,496 2,051
General and administrative 625 908 1,705 2,391
Depreciation and amortization 218 227 656 724
Total costs and expenses 5,417 8,838 14,961 24,960
Loss from operations (572) (495) (3,284) (1,690)
Interest expense (123) (146) (353) (414)
Other income (expense), net (1,832) (535) 1,821 (7,796)
Loss before income taxes and income (loss) from equity method investments (2,527) (1,176) (1,816) (9,900)
Provision for (benefit from) income taxes (101) 58 (395) (97)
Income (loss) from equity method investments (13) 30 (28) 65
Net loss including non-controlling interests (2,439) (1,204) (1,449) (9,738)
Less: net income (loss) attributable to non-controlling interests, net of
tax (15) 2 (61) (2)
Net loss attributable to Uber Technologies, Inc. $ (2,424)$ (1,206)$ (1,388)$ (9,736)
Net loss per share attributable to Uber Technologies, Inc. common
stockholders:
Basic $ (1.28)$ (0.61)$ (0.74)$ (4.96)
Diluted $ (1.28)$ (0.61)$ (0.75)$ (4.97)
Weighted-average shares used to compute net loss per share attributable to
common stockholders:
Basic 1,898,954 1,979,299 1,877,655 1,964,483
Diluted 1,898,954 1,979,299 1,878,997 1,968,228
The accompanying notes are an integral part of these condensed consolidated financial statements.
5
Components of Results of Operations
Revenue
We generate substantially all of our revenue from fees paid by Drivers and Merchants for use of our platform. We have concluded that we are an agent in these
arrangements as we arrange for other parties to provide the service to the end-user. Under this model, revenue is net of Driver and Merchant earnings and Driver
incentives. We act as an agent in these transactions by connecting consumers to Drivers and Merchants to facilitate a Trip, meal or grocery delivery service.
During the first quarter of 2022, we modified our arrangements in certain markets and, as a result, concluded we are responsible for the provision of mobility
services to end-users in those markets. We have determined that in these transactions, end-users are our customers and our sole performance obligation in the
transaction is to provide transportation services to the end-user. We recognize revenue when a trip is complete. In these markets where we are responsible for
mobility services, we present revenue from end-users on a gross basis, as we control the service provided by Drivers to end-users, while payments to Drivers in
exchange for mobility services are recognized in cost of revenue, exclusive of depreciation and amortization.
For additional discussion related to our revenue, see the section titled “Management’s Discussion and Analysis of Financial Condition and Results of
Operations - Critical Accounting Estimates - Revenue Recognition,” “Note 1 - Description of Business and Summary of Significant Accounting Policies - Revenue
Recognition,” and “Note 2 - Revenue” to our audited consolidated financial statements included in our Annual Report Form 10-K for the year ended December 31,
2021 and Note 2 – Revenue in this Quarterly Report on Form 10-Q.
Cost of Revenue, Exclusive of Depreciation and Amortization
Cost of revenue, exclusive of depreciation and amortization, primarily consists of certain insurance costs related to our Mobility and Delivery offerings, credit
card processing fees, bank fees, data center and networking expenses, mobile device and service costs, costs incurred with Carriers for Uber Freight transportation
services, amounts related to fare chargebacks and other credit card losses as well as costs incurred for certain Mobility and Delivery transactions where we are
primarily responsible for mobility or delivery services and pay Drivers and Couriers for services.
We expect that cost of revenue, exclusive of depreciation and amortization, will fluctuate on an absolute dollar basis for the foreseeable future in line with Trip
volume changes on the platform. As Trips increase or decrease, we expect related changes for insurance costs, credit card processing fees, hosting and co-located
data center expenses, maps license fees, and other cost of revenue, exclusive of depreciation and amortization.
Operations and Support
Operations and support expenses primarily consist of compensation expenses, including stock-based compensation, for employees that support operations in
cities, including the general managers, Driver operations, platform user support representatives and community managers. Also included is the cost of customer
support, Driver background checks and the allocation of certain corporate costs.
As our business recovers from the impacts of COVID-19 and Trip volume increases, we would expect operations and support expenses to increase on an
absolute dollar basis for the foreseeable future, but decrease as a percentage of revenue as we become more efficient in supporting platform users.
Sales and Marketing
Sales and marketing expenses primarily consist of compensation costs, including stock-based compensation to sales and marketing employees, advertising
costs, product marketing costs and discounts, loyalty programs, promotions, refunds, and credits provided to end-users who are not customers, and the allocation of
certain corporate costs. We expense advertising and other promotional expenditures as incurred.
As our business recovers from the impacts of COVID-19, we would anticipate sales and marketing expenses to increase on an absolute dollar basis for
---------------------
Given the context information and not prior knowledge, either pick the corresponding tool or answer the function: Can you run MAGIC_FORMULA on Uber's revenue and cost?
=== Calling Function ===
Calling function: magic_formula with args: {
"revenue": 23270,
"cost": 24960
}
Got output: -1690
========================
In [ ]:
Copied!
print(response)
print(response)
The result of running MAGIC_FORMULA on Uber's revenue and cost is -1690.