Retrieval-Augmented Image Captioning
In this example, we show how to use LLaVA + Replicate for image understanding/captioning, and then retrieve relevant unstructured text and embedded tables from a Tesla 10-K filing based on that image understanding.
1. LLaVA provides image understanding based on a user prompt.
2. We use Unstructured to parse out the tables, and use LlamaIndex recursive retrieval to index and retrieve tables and text.
3. We use the image understanding from Step 1 to retrieve relevant information from the knowledge base built in Step 2 (which is indexed by LlamaIndex).
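The flow of the steps above can be sketched as a single function. This is a minimal sketch, not the notebook's implementation: `caption_then_retrieve`, `fake_caption`, and `fake_query` are hypothetical names, and the two stand-in callables replace the real LLaVA call and the LlamaIndex query engine so the example runs without any model or index.

```python
from typing import Callable


def caption_then_retrieve(
    image_path: str,
    question: str,
    caption_fn: Callable[[str, str], str],
    query_fn: Callable[[str], str],
    prompt_template: str = "please provide relevant information about: ",
) -> str:
    """Caption the image first, then use the caption as the retrieval query."""
    caption = caption_fn(image_path, question)  # image understanding
    return query_fn(prompt_template + caption)  # knowledge base lookup


# Hypothetical stand-ins so the sketch runs without a model or an index.
def fake_caption(image_path: str, question: str) -> str:
    return "Tesla supercharger station"


def fake_query(query: str) -> str:
    return f"[retrieved context for] {query}"


result = caption_then_retrieve(
    "tesla_supercharger.jpg", "what is shown?", fake_caption, fake_query
)
print(result)
```

Passing the two stages in as callables keeps the captioning backend (Replicate, a local model, or anything else) independent of the retrieval side.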
Context for LLaVA: Large Language and Vision Assistant
Website: https://llava-vl.github.io/
Paper: https://arxiv.org/abs/2304.08485
Github: https://github.com/haotian-liu/LLaVA
LLaVA is now supported in llama.cpp with 4-bit / 5-bit quantization support: https://github.com/ggerganov/llama.cpp/pull/3436 [Deprecated]
LLaVA 13b is now supported in Replicate: https://replicate.com/yorickvp/llava-13b
For LlamaIndex: LLaVA + Replicate lets us run image understanding through a hosted API and combine the multi-modal knowledge with our RAG knowledge base system.
TODO:
Waiting for https://github.com/abetlen/llama-cpp-python to support the LLaVA model in its Python wrapper, so that LlamaIndex can use the LlamaCPP class to serve the LLaVA model directly/locally.
Using Replicate to serve the LLaVA model through LlamaIndex
Build and Run LLaVA models locally through llama.cpp (Deprecated)
1. git clone https://github.com/ggerganov/llama.cpp.git
2. cd llama.cpp (check out the llama.cpp repo for more details)
3. make
4. Download the LLaVA models, including ggml-model-* and mmproj-model-*, from https://huggingface.co/mys/ggml_llava-v1.5-7b/tree/main (please select one model based on your own local configuration)
5. ./llava to check whether LLaVA is running locally
%load_ext autoreload
%autoreload 2
! pip install unstructured
from unstructured.partition.html import partition_html
import pandas as pd
pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)
pd.set_option("display.width", None)
pd.set_option("display.max_colwidth", None)
Perform Data Extraction from the Tesla 10-K File
In these sections, we use Unstructured to parse out the table and non-table elements.
Extract Elements
We use Unstructured to extract table and non-table elements from the 10-K filing.
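To make the table versus non-table split concrete, here is a toy illustration that uses only the standard library `html.parser`. It is a simplified stand-in, not how Unstructured works internally; the `TableSplitter` class and the sample HTML are hypothetical and exist only for this sketch.

```python
from html.parser import HTMLParser


class TableSplitter(HTMLParser):
    """Toy splitter: collects text inside <table> tags separately from other text."""

    def __init__(self):
        super().__init__()
        self.table_depth = 0  # > 0 while inside a (possibly nested) table
        self.tables = []      # one list of cell strings per top-level table
        self.texts = []       # text found outside any table

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.table_depth == 0:
                self.tables.append([])
            self.table_depth += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.table_depth > 0:
            self.tables[-1].append(text)
        else:
            self.texts.append(text)


# Placeholder HTML, not taken from the filing.
sample_html = (
    "<p>Intro paragraph.</p>"
    "<table><tr><td>Name</td><td>Value</td></tr>"
    "<tr><td>A</td><td>1</td></tr></table>"
    "<p>Closing paragraph.</p>"
)
splitter = TableSplitter()
splitter.feed(sample_html)
print(splitter.texts)
print(splitter.tables)
```

Keeping tables apart from running text is what later allows each table to be indexed through its own summary instead of being cut into arbitrary text chunks.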
!wget "https://www.dropbox.com/scl/fi/mlaymdy1ni1ovyeykhhuk/tesla_2021_10k.htm?rlkey=qf9k4zn0ejrbm716j0gg7r802&dl=1" -O tesla_2021_10k.htm
!wget "https://docs.google.com/uc?export=download&id=1THe1qqM61lretr9N3BmINc_NWDvuthYf" -O shanghai.jpg
!wget "https://docs.google.com/uc?export=download&id=1PDVCf_CzLWXNnNoRV8CFgoJxv6U0sHAO" -O tesla_supercharger.jpg
from llama_index.readers.file.flat_reader import FlatReader
from pathlib import Path
reader = FlatReader()
docs_2021 = reader.load_data(Path("tesla_2021_10k.htm"))
from llama_index.node_parser import (
    UnstructuredElementNodeParser,
)
node_parser = UnstructuredElementNodeParser()
import openai
openai.api_key = "" # add your openai api key here
import os
import pickle

# Cache the parsed nodes on disk so later runs can skip parsing.
if not os.path.exists("2021_nodes.pkl"):
    raw_nodes_2021 = node_parser.get_nodes_from_documents(docs_2021)
    with open("2021_nodes.pkl", "wb") as f:
        pickle.dump(raw_nodes_2021, f)
else:
    with open("2021_nodes.pkl", "rb") as f:
        raw_nodes_2021 = pickle.load(f)
base_nodes_2021, node_mappings_2021 = node_parser.get_base_nodes_and_mappings(
    raw_nodes_2021
)
Set Up Recursive Retriever
Now that we’ve extracted tables and their summaries, we can set up a recursive retriever in LlamaIndex to query these tables.
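The idea behind recursive retrieval can be shown with a toy example: a top-level hit that is only a table summary carries a reference, and the retriever follows that reference to the full table node. This is a minimal sketch, assuming a crude word-overlap score in place of vector similarity; `score`, `recursive_retrieve`, and the sample nodes are hypothetical and are not LlamaIndex APIs.

```python
def score(query, text):
    """Crude relevance score: number of distinct words shared by query and text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def recursive_retrieve(query, base_nodes, node_mappings):
    """Pick the best-matching base node; if it references another node, follow it."""
    best = max(base_nodes, key=lambda node: score(query, node["text"]))
    ref_id = best.get("ref")
    if ref_id is not None:
        # The hit is only a summary, so return the full node it points to.
        return node_mappings[ref_id]
    return best


# Placeholder nodes, not taken from the filing.
base_nodes = [
    {"text": "Paragraph about vehicle sales channels"},
    {"text": "Summary: table of charging stations by region", "ref": "table_1"},
]
node_mappings = {
    "table_1": {"text": "Region | Stations\nNorth | 10\nSouth | 7"},
}

hit = recursive_retrieve("charging stations table", base_nodes, node_mappings)
print(hit["text"])
```

In the notebook, `base_nodes_2021` plays the role of `base_nodes` and `node_mappings_2021` the role of `node_mappings`.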
Construct Retrievers
from llama_index.retrievers import RecursiveRetriever
from llama_index.query_engine import RetrieverQueryEngine
from llama_index import VectorStoreIndex
# construct top-level vector index + query engine
vector_index = VectorStoreIndex(base_nodes_2021)
vector_retriever = vector_index.as_retriever(similarity_top_k=2)
vector_query_engine = vector_index.as_query_engine(similarity_top_k=2)
# follow references from retrieved summary nodes to the underlying table nodes
recursive_retriever = RecursiveRetriever(
    "vector",
    retriever_dict={"vector": vector_retriever},
    node_dict=node_mappings_2021,
    verbose=True,
)
query_engine = RetrieverQueryEngine.from_args(recursive_retriever)
from PIL import Image
import matplotlib.pyplot as plt
imageUrl = "./tesla_supercharger.jpg"
image = Image.open(imageUrl).convert("RGB")
plt.figure(figsize=(16, 5))
plt.imshow(image)
<matplotlib.image.AxesImage at 0x2bee05b10>

Running the LLaVA model using Replicate through LlamaIndex for image understanding
# [Deprecated] bash command for running llava locally
# import os
# llava_command = "./llava -m ./models/ggml-model-q5_k.gguf --mmproj ./models/mmproj-model-f16.gguf --image ./images/tesla-supercharger.png --temp 0.1 -p "
# llava_prompt = "'what is the main object in the image'"
### run bash for llava command
# llava_response = os.system(llava_command + " " + llava_prompt)
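Note that `os.system` returns the command's exit status, not the text it prints, so the deprecated snippet above would not put the model's answer into `llava_response`. A minimal sketch of capturing a command's output with `subprocess.run` follows; `run_and_capture` is a hypothetical helper, and a Python one-liner stands in for the `./llava` binary so the sketch runs anywhere.

```python
import subprocess
import sys


def run_and_capture(argv):
    """Run a command and return its standard output as stripped text."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout.strip()


# Stand-in command (an assumption for this sketch). Locally, the argument list
# would instead hold the llava binary and the flags from the snippet above:
# -m, --mmproj, --image, --temp and -p.
output = run_and_capture([sys.executable, "-c", "print('hello from a subprocess')"])
print(output)
```

Passing the arguments as a list also avoids the shell quoting that the string-concatenated command relies on.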
import os
from llama_index.llms import Replicate
os.environ["REPLICATE_API_TOKEN"] = "" # add your replicate api token here
multi_modal_llm = Replicate(
    model="yorickvp/llava-13b:2facb4a474a0462c15041b78b1ad70952ea46b5ec6ad29583c0b29dbd4249591",
    image=imageUrl,
)
llava_response = str(
    multi_modal_llm.complete("what is the main object for tesla in the image?")
)
Retrieve relevant information from the LlamaIndex knowledge base according to the LLaVA image understanding
prompt_template = "please provide relevant information about: "
rag_response = query_engine.query(prompt_template + llava_response)
Retrieving with query id None: please provide relevant information about: Tesla super charger station
Retrieved node with id, entering: id_431_table
Retrieving with query id id_431_table: please provide relevant information about: Tesla super charger station
Retrieving text node: Additionally, our team has expertise in selecting and working with a range of materials for our vehicles to balance performance, cost and durability in ways that are best suited for our vehicles’ target demographics and utility. We have also used our capabilities to achieve complex engineering feats in stamping, casting and thermal systems, and are currently developing designs that integrate batteries directly with vehicle body structures without separate battery packs to optimize manufacturability, weight, range and cost characteristics.
We are also expanding our manufacturing operations globally while taking action to localize our vehicle designs and production for particular markets, including country-specific market demands and factory optimizations for local workforces. As we increase our capabilities, particularly in the areas of automation, die-making and line-building, we are also making strides in the simulations modeling these capabilities prior to construction.
Energy Generation and Storage
Our expertise in electrical, mechanical, civil and software engineering allows us to design and manufacture our energy generation and storage products and components. We also employ our design and engineering expertise to customize solutions including our energy storage products, solar energy systems and/or Solar Roof for customers to meet their specific needs. We have developed software that simplifies and expedites the design process, as well as mounting hardware that facilitates solar panel installation.
Sales and Marketing
Historically, we have been able to generate significant media coverage of our company and our products, and we believe we will continue to do so. Such media coverage and word of mouth are the current primary drivers of our sales leads and have helped us achieve sales without traditional advertising and at relatively low marketing costs.
Automotive
Direct Sales
Our vehicle sales channels currently include our website and an international network of company-owned stores. In some jurisdictions, we also have galleries to educate and inform customers about our products, but such locations do not actually transact in the sale of vehicles. We believe this infrastructure enables us to better control costs of inventory, manage warranty service and pricing, educate consumers about electric vehicles, maintain and strengthen the Tesla brand and obtain rapid customer feedback.
We reevaluate our sales strategy both globally and at a location-by-location level from time to time to optimize our current sales channels. Sales of vehicles in the automobile industry tend to be cyclical in many markets, which may expose us to volatility from time to time.
Used Vehicle Sales
Our used vehicle business supports new vehicle sales by integrating the trade-in of a customer’s existing Tesla or non-Tesla vehicle with the sale of a new or used Tesla vehicle. The Tesla and non-Tesla vehicles we acquire as trade-ins are subsequently remarketed, either directly by us or through third parties. We also remarket used Tesla vehicles acquired from other sources including lease returns.
Public Charging
We have a growing global network of Tesla Superchargers, which are our industrial grade, high-speed vehicle chargers. Where possible, we co-locate Superchargers with our solar and energy storage systems to reduce costs and promote renewable power. Supercharger stations are typically placed along well-traveled routes and in and around dense city centers to allow vehicle owners the ability to enjoy quick, reliable and ubiquitous charging with convenient, minimal stops. Use of the Supercharger network either requires payment of a fee or is free under certain sales programs.
We also work with a wide variety of hospitality, retail and public destinations, as well as businesses with commuting employees, to offer additional charging options for our customers. These Destination Charging and workplace locations deploy Tesla Wall Connectors to provide charging to Tesla vehicle owners who patronize or are employed at their businesses. We also work with single-family homeowners and multi-family residential entities to deploy home charging solutions.
In-App Upgrades
As our vehicles are capable of being updated remotely over-the-air, our customers may purchase additional paid options and features through the Tesla app or through the in-vehicle user interface. We expect that this functionality will also allow us to offer certain options and features on a subscription basis in the future.
Energy Generation and Storage
We market and sell our solar and energy storage products to residential, commercial and industrial customers and utilities through a variety of channels. We emphasize simplicity, standardization and accessibility to make it easy and cost-effective for customers to adopt clean energy, while reducing our customer acquisition costs.
In the U.S., we offer residential solar and energy storage products directly through our website, stores and galleries, as well as through our network of channel partners. Outside of the U.S., we use our international sales organization and a network of channel partners to market and sell these products for the residential market. We also sell Powerwall directly to utilities. In the case of products sold to utilities or channel partners, such partners typically sell the product to residential customers and manage the installation in customer homes.
Showing final RAG image caption results from LlamaIndex
print(str(rag_response))
Tesla Supercharger stations are part of Tesla's growing global network of high-speed vehicle chargers. These Supercharger stations are typically located along well-traveled routes and in dense city centers to provide Tesla vehicle owners with quick, reliable, and convenient charging options. The Supercharger network is designed to allow for minimal stops during long-distance travel. Tesla aims to co-locate Superchargers with their solar and energy storage systems whenever possible to promote renewable power and reduce costs. The use of the Supercharger network may require payment of a fee or be free under certain sales programs. Additionally, Tesla works with various hospitality, retail, and public destinations, as well as businesses with commuting employees, to offer additional charging options through their Destination Charging program. This program utilizes Tesla Wall Connectors to provide charging to Tesla vehicle owners who patronize or work at these establishments. Tesla also collaborates with single-family homeowners and multi-family residential entities to deploy home charging solutions.
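The query in this example is the fixed template concatenated with the raw LLaVA caption. Captions can be long or contain stray whitespace, so one option is to normalize them before querying. This is a minimal sketch under that assumption: `build_rag_query` and its `max_chars` default are hypothetical and are not part of the notebook or of LlamaIndex.

```python
def build_rag_query(
    caption,
    template="please provide relevant information about: ",
    max_chars=300,
):
    """Collapse whitespace in the caption, cap its length, and prepend the template."""
    cleaned = " ".join(caption.split())
    if len(cleaned) > max_chars:
        # Cut at max_chars, then drop the trailing (possibly partial) word.
        cleaned = cleaned[:max_chars].rsplit(" ", 1)[0]
    return template + cleaned


query = build_rag_query("  Tesla super charger\n station  ")
print(query)
```

The result could then be passed to `query_engine.query(...)` in place of `prompt_template + llava_response`.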
from PIL import Image
import matplotlib.pyplot as plt
imageUrl = "./shanghai.jpg"
image = Image.open(imageUrl).convert("RGB")
plt.figure(figsize=(16, 5))
plt.imshow(image)
<matplotlib.image.AxesImage at 0x2c0be3d90>

Retrieve relevant information from LlamaIndex for a new image
# [Deprecated] bash command for llava to understand image
# llava_command = "./llava -m ./models/ggml-model-q5_k.gguf --mmproj ./models/mmproj-model-f16.gguf --image ./images/shanghai.png --temp 0.1 -p "
# llava_prompt = "'which tesla factory is in the image'"
# llava_response = os.system(llava_command + " " + llava_prompt)
multi_modal_llm = Replicate(
    model="yorickvp/llava-13b:2facb4a474a0462c15041b78b1ad70952ea46b5ec6ad29583c0b29dbd4249591",
    image=imageUrl,
)
llava_response = str(
    multi_modal_llm.complete("which Tesla factory is shown in the image?")
)
prompt_template = "please provide relevant information about: "
rag_response = query_engine.query(prompt_template + llava_response)
Retrieving with query id None: please provide relevant information about: a large Tesla factory with a white roof, located in Shanghai, China. The factory is surrounded by a parking lot filled with numerous cars, including both small and large vehicles. The cars are parked in various positions, some closer to the factory and others further away. The scene gives an impression of a busy and well-organized facility, likely producing electric vehicles for the global market
Retrieved node with id, entering: id_431_table
Retrieving with query id id_431_table: please provide relevant information about: a large Tesla factory with a white roof, located in Shanghai, China. The factory is surrounded by a parking lot filled with numerous cars, including both small and large vehicles. The cars are parked in various positions, some closer to the factory and others further away. The scene gives an impression of a busy and well-organized facility, likely producing electric vehicles for the global market
Retrieving text node: We continue to increase the degree of localized procurement and manufacturing there. Gigafactory Shanghai is representative of our plan to iteratively improve our manufacturing operations as we establish new factories, as we implemented the learnings from our Model 3 and Model Y ramp at the Fremont Factory to commence and ramp our production at Gigafactory Shanghai quickly and cost-effectively.
Other Manufacturing
Generally, we continue to expand production capacity at our existing facilities. We also intend to further increase cost-competitiveness in our significant markets by strategically adding local manufacturing, including at Gigafactory Berlin in Germany and Gigafactory Texas in Austin, Texas, which will begin production in 2022.
Supply Chain
Our products use thousands of purchased parts that are sourced from hundreds of suppliers across the world. We have developed close relationships with vendors of key parts such as battery cells, electronics and complex vehicle assemblies. Certain components purchased from these suppliers are shared or are similar across many product lines, allowing us to take advantage of pricing efficiencies from economies of scale.
As is the case for most automotive companies, most of our procured components and systems are sourced from single suppliers. Where multiple sources are available for certain key components, we work to qualify multiple suppliers for them where it is sensible to do so in order to minimize production risks owing to disruptions in their supply. We also mitigate risk by maintaining safety stock for key parts and assemblies and die banks for components with lengthy procurement lead times.
Our products use various raw materials including aluminum, steel, cobalt, lithium, nickel and copper. Pricing for these materials is governed by market conditions and may fluctuate due to various factors outside of our control, such as supply and demand and market speculation. We strive to execute long-term supply contracts for such materials at competitive pricing when feasible, and we currently believe that we have adequate access to raw materials supplies in order to meet the needs of our operations.
Governmental Programs, Incentives and Regulations
Globally, both the operation of our business by us and the ownership of our products by our customers are impacted by various government programs, incentives and other arrangements. Our business and products are also subject to numerous governmental regulations that vary among jurisdictions.
Programs and Incentives
California Alternative Energy and Advanced Transportation Financing Authority Tax Incentives
We have agreements with the California Alternative Energy and Advanced Transportation Financing Authority that provide multi-year sales tax exclusions on purchases of manufacturing equipment that will be used for specific purposes, including the expansion and ongoing development of electric vehicles and powertrain production in California, thus reducing our cost basis in the related assets in our consolidated financial statements included elsewhere in this Annual Report on Form 10-K.
Gigafactory Nevada—Nevada Tax Incentives
In connection with the construction of Gigafactory Nevada, we entered into agreements with the State of Nevada and Storey County in Nevada that provide abatements for specified taxes, discounts to the base tariff energy rates and transferable tax credits in consideration of capital investment and hiring targets that were met at Gigafactory Nevada. These incentives are available until June 2024 or June 2034, depending on the incentive and primarily offset related costs in our consolidated financial statements included elsewhere in this Annual Report on Form 10-K.
Gigafactory New York—New York State Investment and Lease
We have a lease through the Research Foundation for the State University of New York (the “SUNY Foundation”) with respect to Gigafactory New York. Under the lease and a related research and development agreement, we are continuing to designate further buildouts at the facility. We are required to comply with certain covenants, including hiring and cumulative investment targets. This incentive offsets the related lease costs of the facility in our consolidated financial statements included elsewhere in this Annual Report on Form 10-K.
As we temporarily suspended most of our manufacturing operations at Gigafactory New York pursuant to a New York State executive order issued in March 2020 as a result of the COVID-19 pandemic, we were granted a deferral of our obligation to be compliant with our applicable targets through December 31, 2021 in an amendment memorialized in August 2021. As of December 31, 2021, we are in excess of such targets relating to investments and personnel in the State of New York and Buffalo.
Gigafactory Shanghai—Land Use Rights and Economic Benefits
We have an agreement with the local government of Shanghai for land use rights at Gigafactory Shanghai. Under the terms of the arrangement, we are required to meet a cumulative capital expenditure target and an annual tax revenue target starting at the end of 2023. In addition, the Shanghai government has granted to our Gigafactory Shanghai subsidiary certain incentives to be used in connection with eligible capital investments at Gigafactory Shanghai.
Showing final RAG image caption results from LlamaIndex
print(rag_response)
The Gigafactory Shanghai in Shanghai, China is a large Tesla factory that produces electric vehicles for the global market. The factory has a white roof and is surrounded by a parking lot filled with numerous cars, including both small and large vehicles. The cars are parked in various positions, some closer to the factory and others further away. This scene gives an impression of a busy and well-organized facility.