💬🤖 How to Build a Chatbot#
LlamaIndex serves as a bridge between your data and Large Language Models (LLMs), providing a toolkit for building a query interface around your data for tasks such as question-answering and summarization.
In this tutorial, we'll walk you through building a context-augmented chatbot using a Data Agent. This agent, powered by LLMs, is capable of intelligently executing tasks over your data. The end result is a chatbot agent equipped with a robust set of data interface tools provided by LlamaIndex to answer queries about your data.
Note: This tutorial builds upon initial work on creating a query interface over SEC 10-K filings - check it out here.
Context#
In this guide, we'll build a "10-K Chatbot" that uses raw UBER 10-K HTML filings from Dropbox. Users can interact with the chatbot to ask questions related to the 10-K filings.
Preparation#
import os
import openai
os.environ["OPENAI_API_KEY"] = "sk-..."
openai.api_key = os.environ["OPENAI_API_KEY"]
import nest_asyncio
nest_asyncio.apply()
# set text wrapping
from IPython.display import HTML, display
def set_css():
    display(
        HTML(
            """
<style>
  pre {
      white-space: pre-wrap;
  }
</style>
"""
        )
    )
get_ipython().events.register("pre_run_cell", set_css)
Ingest Data#
Let's first download the raw 10-K files for 2019 through 2022.
# NOTE: the code examples assume you're operating within a Jupyter notebook.
# download files
!mkdir data
!wget "https://www.dropbox.com/s/948jr9cfs7fgj99/UBER.zip?dl=1" -O data/UBER.zip
!unzip data/UBER.zip -d data
Archive: data/UBER.zip
creating: data/UBER/
inflating: data/UBER/UBER_2021.html
inflating: data/__MACOSX/UBER/._UBER_2021.html
inflating: data/UBER/UBER_2020.html
inflating: data/__MACOSX/UBER/._UBER_2020.html
inflating: data/UBER/UBER_2019.html
inflating: data/__MACOSX/UBER/._UBER_2019.html
inflating: data/UBER/UBER_2022.html
inflating: data/__MACOSX/UBER/._UBER_2022.html
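Before parsing, it can be worth confirming that all four filings were extracted. The following is a minimal sketch using only the standard library. The helper name `missing_filings` is hypothetical and not part of LlamaIndex, and it assumes the `data/UBER/UBER_{year}.html` layout shown in the unzip output above.

```python
from pathlib import Path


def missing_filings(data_dir, years):
    """Return the expected filing paths that do not exist under data_dir."""
    expected = [Path(data_dir) / f"UBER_{year}.html" for year in years]
    return [path for path in expected if not path.is_file()]


# Hypothetical usage: an empty list means every filing is in place.
missing = missing_filings("./data/UBER", [2019, 2020, 2021, 2022])
if missing:
    print("Missing filings:", ", ".join(str(p) for p in missing))
else:
    print("All filings found.")
```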
To parse the HTML files into formatted text, we use the Unstructured library. Thanks to LlamaHub, we can directly integrate with Unstructured, allowing conversion of any text into a Document format that LlamaIndex can ingest.
First we install the necessary packages:
!pip install llama-hub unstructured
Then we can use the UnstructuredReader to parse the HTML files into a list of Document objects.
from llama_hub.file.unstructured.base import UnstructuredReader
from pathlib import Path
years = [2022, 2021, 2020, 2019]
loader = UnstructuredReader()
doc_set = {}
all_docs = []
for year in years:
    year_docs = loader.load_data(
        file=Path(f"./data/UBER/UBER_{year}.html"), split_documents=False
    )
    # insert year metadata into each document
    for d in year_docs:
        d.metadata = {"year": year}
    doc_set[year] = year_docs
    all_docs.extend(year_docs)
[nltk_data] Downloading package punkt to /home/jtorres/nltk_data...
[nltk_data] Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data] /home/jtorres/nltk_data...
[nltk_data] Package averaged_perceptron_tagger is already up-to-
[nltk_data] date!
Setting up Vector Indices for each year#
We first set up a vector index for each year. Each vector index allows us to ask questions about the 10-K filing of a given year.
We build each index and save it to disk.
# initialize simple vector indices
# NOTE: don't run this cell if the indices are already loaded!
from llama_index import VectorStoreIndex, ServiceContext, StorageContext
index_set = {}
service_context = ServiceContext.from_defaults(chunk_size=512)
for year in years:
    storage_context = StorageContext.from_defaults()
    cur_index = VectorStoreIndex.from_documents(
        doc_set[year],
        service_context=service_context,
        storage_context=storage_context,
    )
    index_set[year] = cur_index
    storage_context.persist(persist_dir=f"./storage/{year}")
To load an index from disk, do the following:
# Load indices from disk
from llama_index import StorageContext, load_index_from_storage
index_set = {}
for year in years:
    storage_context = StorageContext.from_defaults(
        persist_dir=f"./storage/{year}"
    )
    cur_index = load_index_from_storage(
        storage_context, service_context=service_context
    )
    index_set[year] = cur_index
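Since the build cell should not be rerun once the indices exist, one option is to check for persisted storage first. This is a minimal sketch, assuming the `./storage/{year}` layout used above. `years_needing_build` is a hypothetical helper, not a LlamaIndex function, and it only checks that each directory exists and is non-empty, not that its contents are valid.

```python
from pathlib import Path


def years_needing_build(years, storage_root="./storage"):
    """Return the years with no non-empty persist directory under storage_root."""
    pending = []
    for year in years:
        persist_dir = Path(storage_root) / str(year)
        # A missing or empty directory means nothing was persisted for this year.
        if not persist_dir.is_dir() or not any(persist_dir.iterdir()):
            pending.append(year)
    return pending


# Hypothetical usage: build indices only for the years returned here,
# and load the remaining years from disk.
print(years_needing_build([2022, 2021, 2020, 2019]))
```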
Setting up a Sub Question Query Engine to Synthesize Answers Across 10-K Filings#
Since we have access to documents from four years, we may want to ask not only about the 10-K filing of a given year, but also questions that require analysis across all of the 10-K filings.
To address this, we can use a Sub Question Query Engine. It decomposes a query into subqueries, each answered by an individual vector index, and synthesizes the results to answer the overall query.
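The decompose, answer, and synthesize flow can be pictured with a toy sketch in plain Python. This is not how SubQuestionQueryEngine is implemented: in the real engine an LLM generates the sub-questions and writes the final answer, whereas here a fixed template and string joining stand in for both. The `answer_fns` callables are hypothetical stand-ins for per-year query engines.

```python
def decompose(topic, years):
    """Create one sub-question per year (a template stands in for the LLM)."""
    return {
        year: f"What are the {topic} in the {year} SEC 10-K for Uber?"
        for year in years
    }


def answer_across_years(topic, answer_fns):
    """Route each sub-question to its year's answer function and combine results."""
    sub_questions = decompose(topic, answer_fns.keys())
    partial_answers = {
        year: answer_fns[year](question)
        for year, question in sub_questions.items()
    }
    # Joining the partial answers stands in for the LLM synthesis step.
    return "\n".join(f"{year}: {text}" for year, text in partial_answers.items())


# Hypothetical per-year "engines" that simply echo the question they received.
fake_engines = {
    year: (lambda question, y=year: f"[answer from index {y}] {question}")
    for year in (2022, 2021)
}
print(answer_across_years("risk factors", fake_engines))
```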
LlamaIndex provides some wrappers around indices (and query engines) so that they can be used by query engines and agents. First we define a QueryEngineTool for each vector index.
Each tool has a name and a description; these are what the LLM agent sees to decide which tool to choose.
from llama_index.tools import QueryEngineTool, ToolMetadata
individual_query_engine_tools = [
    QueryEngineTool(
        query_engine=index_set[year].as_query_engine(),
        metadata=ToolMetadata(
            name=f"vector_index_{year}",
            description=(
                "useful for when you want to answer queries about the"
                f" {year} SEC 10-K for Uber"
            ),
        ),
    )
    for year in years
]
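To preview the names and descriptions the agent will receive, the same f-strings can be evaluated on their own. This is a small sketch in plain Python that needs no LlamaIndex import. `tool_metadata_preview` is a hypothetical helper that mirrors the strings used in the tool definitions above.

```python
def tool_metadata_preview(years):
    """Return (name, description) pairs matching the tool definitions above."""
    return [
        (
            f"vector_index_{year}",
            "useful for when you want to answer queries about the"
            f" {year} SEC 10-K for Uber",
        )
        for year in years
    ]


for name, description in tool_metadata_preview([2022, 2021, 2020, 2019]):
    print(f"{name}: {description}")
```

Keeping each description specific to one year gives the agent a clearer basis for choosing between the tools.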
Now we can create the Sub Question Query Engine, which will allow us to synthesize answers across the 10-K filings. We pass in the individual_query_engine_tools we defined above, as well as a service_context that will be used to run the subqueries.
from llama_index.query_engine import SubQuestionQueryEngine
query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=individual_query_engine_tools,
    service_context=service_context,
)
Setting up the Chatbot Agent#
We use a LlamaIndex Data Agent to set up the outer chatbot agent, which has access to a set of Tools. Specifically, we will use an OpenAIAgent, which takes advantage of OpenAI API function calling. We want to use the separate Tools we defined previously for each index (corresponding to a given year), as well as a tool for the sub question query engine we defined above.
First we define a QueryEngineTool for the sub question query engine:
query_engine_tool = QueryEngineTool(
    query_engine=query_engine,
    metadata=ToolMetadata(
        name="sub_question_query_engine",
        description=(
            "useful for when you want to answer queries that require analyzing"
            " multiple SEC 10-K documents for Uber"
        ),
    ),
)
Then, we combine the Tools we defined above into a single list of tools for the agent:
tools = individual_query_engine_tools + [query_engine_tool]
Finally, we call OpenAIAgent.from_tools to create the agent, passing in the list of tools we defined above.
from llama_index.agent import OpenAIAgent
agent = OpenAIAgent.from_tools(tools, verbose=True)
Testing the Agent#
We can now test the agent with various queries.
If we test it with a simple "hello" query, the agent does not use any Tools.
response = agent.chat("hi, i am bob")
print(str(response))
Hello Bob! How can I assist you today?
If we test it with a query regarding the 10-K of a given year, the agent will use the relevant vector index Tool.
response = agent.chat(
    "What were some of the biggest risk factors in 2020 for Uber?"
)
print(str(response))
=== Calling Function ===
Calling function: vector_index_2020 with args: {
"input": "biggest risk factors"
}
Got output: The biggest risk factors mentioned in the context are:
1. The adverse impact of the COVID-19 pandemic and actions taken to mitigate it on the business.
2. The potential reclassification of drivers as employees, workers, or quasi-employees instead of independent contractors.
3. Intense competition in the mobility, delivery, and logistics industries, with low barriers to entry and well-capitalized competitors.
4. The need to lower fares or service fees and offer driver incentives and consumer discounts to remain competitive.
5. Significant losses incurred and the uncertainty of achieving profitability.
6. The risk of not attracting or maintaining a critical mass of platform users.
7. Operational, compliance, and cultural challenges related to the workplace culture and forward-leaning approach.
8. The potential negative impact of international investments and the challenges of conducting business in foreign countries, including operational and compliance challenges, localization requirements, restrictive laws and regulations, competition from local companies, social acceptance, technological compatibility, improper business practices, legal uncertainty, difficulties in managing international operations, currency exchange rate fluctuations, and regulations governing local currencies.
========================
In 2020, some of the biggest risk factors for Uber were:
1. The adverse impact of the COVID-19 pandemic and the measures taken to mitigate it on the business.
2. The potential reclassification of drivers as employees, workers, or quasi-employees instead of independent contractors.
3. Intense competition in the mobility, delivery, and logistics industries, with low barriers to entry and well-capitalized competitors.
4. The need to lower fares or service fees and offer driver incentives and consumer discounts to remain competitive.
5. Significant losses incurred and uncertainty about achieving profitability.
6. The risk of not attracting or maintaining a critical mass of platform users.
7. Operational, compliance, and cultural challenges related to the workplace culture and forward-leaning approach.
8. The potential negative impact of international investments and the challenges of conducting business in foreign countries, including operational and compliance challenges, localization requirements, restrictive laws and regulations, competition from local companies, social acceptance, technological compatibility, improper business practices, legal uncertainty, difficulties in managing international operations, currency exchange rate fluctuations, and regulations governing local currencies.
These risk factors highlight the challenges and uncertainties faced by Uber in 2020.
Finally, if we test it with a query to compare/contrast risk factors across years, the agent will use the Sub Question Query Engine Tool.
cross_query_str = (
    "Compare/contrast the risk factors described in the Uber 10-K across"
    " years. Give answer in bullet points."
)
response = agent.chat(cross_query_str)
print(str(response))
=== Calling Function ===
Calling function: sub_question_query_engine with args: {
"input": "Compare/contrast the risk factors described in the Uber 10-K across years"
}
Generated 4 sub questions.
[vector_index_2022] Q: What are the risk factors described in the 2022 SEC 10-K for Uber?
[vector_index_2021] Q: What are the risk factors described in the 2021 SEC 10-K for Uber?
[vector_index_2020] Q: What are the risk factors described in the 2020 SEC 10-K for Uber?
[vector_index_2019] Q: What are the risk factors described in the 2019 SEC 10-K for Uber?
[vector_index_2022] A: The risk factors described in the 2022 SEC 10-K for Uber include the potential adverse effect on their business if drivers were classified as employees instead of independent contractors, the highly competitive nature of the mobility, delivery, and logistics industries, the need to lower fares or service fees to remain competitive in certain markets, the company's history of significant losses and the expectation of increased operating expenses in the future, and the potential impact on their platform if they are unable to attract or maintain a critical mass of drivers, consumers, merchants, shippers, and carriers.
[vector_index_2019] A: The risk factors described in the 2019 SEC 10-K for Uber include the loss of their license to operate in London, the complexity of their business and operating model due to regulatory uncertainties, the potential for additional regulations for their other products in the Other Bets segment, the evolving laws and regulations regarding the development and deployment of autonomous vehicles, and the increasing number of data protection and privacy laws around the world. Additionally, there are legal proceedings, litigation, claims, and government investigations that Uber is involved in, including those related to the classification of drivers and compliance with applicable laws, which could impose a significant burden on the company.
[vector_index_2021] A: The risk factors described in the 2021 SEC 10-K for Uber include the adverse impact of the COVID-19 pandemic and actions taken to mitigate it on their business, the potential reclassification of drivers as employees instead of independent contractors, intense competition in the mobility, delivery, and logistics industries, the need to lower fares and offer incentives to remain competitive, significant losses incurred and the expectation of increased operating expenses, the importance of attracting and maintaining a critical mass of drivers, consumers, merchants, shippers, and carriers, and the uncertainty surrounding the impact of COVID-19 on their business and financial position. Additionally, the classification of drivers is being challenged in courts and by government agencies, which could have legal and financial implications for the company.
[vector_index_2020] A: The risk factors described in the 2020 SEC 10-K for Uber include the adverse impact of the COVID-19 pandemic on their business, the potential reclassification of drivers as employees instead of independent contractors, intense competition in the mobility, delivery, and logistics industries, the need to lower fares and offer incentives to remain competitive, significant losses and the uncertainty of achieving profitability, the importance of attracting and maintaining a critical mass of platform users, operational and compliance challenges, inquiries and investigations from government agencies, risks related to data security breaches, the need to introduce new or upgraded products and features, and the need to invest in the development of new offerings to retain and attract users.
Got output: The risk factors described in the Uber 10-K reports across the years include the potential reclassification of drivers as employees instead of independent contractors, intense competition in the mobility, delivery, and logistics industries, the need to lower fares and offer incentives to remain competitive, significant losses incurred and the expectation of increased operating expenses, the importance of attracting and maintaining a critical mass of drivers, consumers, merchants, shippers, and carriers, and the impact of the COVID-19 pandemic on their business. Additionally, there are legal and regulatory uncertainties, such as the evolving laws and regulations regarding autonomous vehicles, data protection and privacy laws, and the potential for additional regulations for their other products. The reports also mention the operational and compliance challenges, inquiries and investigations from government agencies, and the risks associated with data security breaches. It is worth noting that specific risk factors may vary from year to year based on the prevailing circumstances and developments in the industry and regulatory environment.
========================
Here are the key points comparing and contrasting the risk factors described in the Uber 10-K reports across years:
2022:
- Potential reclassification of drivers as employees instead of independent contractors.
- Intense competition in the mobility, delivery, and logistics industries.
- Need to lower fares and offer incentives to remain competitive.
- Significant losses incurred and expectation of increased operating expenses.
- Importance of attracting and maintaining a critical mass of drivers, consumers, merchants, shippers, and carriers.
- Impact of the COVID-19 pandemic on their business.
- Legal and regulatory uncertainties, including evolving laws and regulations regarding autonomous vehicles and data protection and privacy laws.
- Operational and compliance challenges.
- Inquiries and investigations from government agencies.
- Risks associated with data security breaches.
2021:
- Similar risk factors as in 2022, including potential reclassification of drivers, intense competition, need to lower fares, significant losses, and the impact of the COVID-19 pandemic.
- Emphasis on the importance of maintaining a critical mass of drivers, consumers, merchants, shippers, and carriers.
- Mention of legal and regulatory uncertainties, such as evolving laws and regulations regarding autonomous vehicles and data protection and privacy laws.
- Operational and compliance challenges.
- Inquiries and investigations from government agencies.
- Risks associated with data security breaches.
2020:
- Similar risk factors as in 2021, including potential reclassification of drivers, intense competition, need to lower fares, significant losses, and the impact of the COVID-19 pandemic.
- Emphasis on the importance of maintaining a critical mass of drivers, consumers, merchants, shippers, and carriers.
- Mention of legal and regulatory uncertainties, such as evolving laws and regulations regarding autonomous vehicles and data protection and privacy laws.
- Operational and compliance challenges.
- Inquiries and investigations from government agencies.
- Risks associated with data security breaches.
2019:
- Similar risk factors as in 2020, including potential reclassification of drivers, intense competition, need to lower fares, significant losses, and the impact of the COVID-19 pandemic.
- Emphasis on the importance of maintaining a critical mass of drivers, consumers, merchants, shippers, and carriers.
- Mention of legal and regulatory uncertainties, such as evolving laws and regulations regarding autonomous vehicles and data protection and privacy laws.
- Operational and compliance challenges.
- Inquiries and investigations from government agencies.
- Risks associated with data security breaches.
Please note that these are just the key points, and there may be additional risk factors mentioned in each year's 10-K report.
Setting up the Chatbot Loop#
Now that we have the chatbot set up, it only takes a few more steps to set up a basic interactive loop to chat with our SEC-augmented chatbot!
agent = OpenAIAgent.from_tools(tools) # verbose=False by default
while True:
    text_input = input("User: ")
    if text_input == "exit":
        break
    response = agent.chat(text_input)
    print(f"Agent: {response}")
# User: What were some of the legal proceedings against Uber in 2022?
Agent: In 2022, Uber is facing several legal proceedings. Here are some of them:
1. California: The state Attorney General and city attorneys filed a complaint against Uber and Lyft, alleging that drivers are misclassified as independent contractors. A preliminary injunction was issued but stayed pending appeal. The Court of Appeal affirmed the lower court's ruling, and Uber filed a petition for review with the California Supreme Court. However, the Supreme Court declined the petition for review. The lawsuit is ongoing, focusing on claims by the California Attorney General for periods prior to the enactment of Proposition 22.
2. Massachusetts: The Attorney General of Massachusetts filed a complaint against Uber, alleging that drivers are employees entitled to wage and labor law protections. Uber's motion to dismiss the complaint was denied, and a summary judgment motion is pending.
3. New York: Uber is facing allegations of misclassification and employment violations by the state Attorney General. The resolution of this matter is uncertain.
4. Switzerland: Several administrative bodies in Switzerland have issued rulings classifying Uber drivers as employees for social security or labor purposes. Uber is challenging these rulings before the Social Security and Administrative Tribunals.
These are some of the legal proceedings against Uber in 2022. The outcomes and potential losses in these cases are uncertain.
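The loop above can also be written as a function, which makes it easier to test and lets it exit cleanly when input ends. This is a minimal sketch under stated assumptions: `run_chat_loop` and `EchoAgent` are hypothetical names, and the only thing assumed about the agent is the `chat(text)` method used throughout this tutorial.

```python
def run_chat_loop(agent, read_input=input, write_output=print):
    """Relay user input to agent.chat until the user types 'exit' or input ends."""
    while True:
        try:
            text_input = read_input("User: ")
        except EOFError:
            break  # e.g. Ctrl-D, or piped input ran out
        if text_input.strip().lower() == "exit":
            break
        if not text_input.strip():
            continue  # skip blank lines instead of sending them to the agent
        write_output(f"Agent: {agent.chat(text_input)}")


class EchoAgent:
    """Stand-in agent for trying the loop without an API key."""

    def chat(self, text):
        return f"you said: {text}"


# Scripted inputs replace the keyboard so the example runs non-interactively.
scripted = iter(["hello", "", "exit"])
run_chat_loop(EchoAgent(), read_input=lambda prompt: next(scripted))
```

With the agent built earlier, calling `run_chat_loop(agent)` would be expected to behave like the original loop, with the added handling for blank lines and end-of-input.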