Multimodal Ollama Cookbook#

This cookbook shows how you can build different multimodal RAG use cases with LLaVa on Ollama.

Structured Data Extraction from Images
Retrieval-Augmented Image Captioning
Multi-modal RAG

Setup Model#

!pip install llama-index-multi-modal-llms-ollama
!pip install llama-index-readers-file
!pip install unstructured
!pip install llama-index-embeddings-huggingface
!pip install llama-index-vector-stores-qdrant
!pip install llama-index-embeddings-clip

from llama_index.multi_modal_llms.ollama import OllamaMultiModal

mm_model = OllamaMultiModal(model="llava:13b")

Structured Data Extraction from Images#

Here we show how to use LLaVa to extract information from an image into a structured Pydantic object.

We can do this via our MultiModalLLMCompletionProgram. It is instantiated with a prompt template, set of images you’d want to ask questions over, and the desired output Pydantic object.

Load Data#

Let’s first load an image ad for fried chicken.

from pathlib import Path
from llama_index.core import SimpleDirectoryReader
from PIL import Image
import matplotlib.pyplot as plt

input_image_path = Path("restaurant_images")
if not input_image_path.exists():
    Path.mkdir(input_image_path)

!wget "https://docs.google.com/uc?export=download&id=1GlqcNJhGGbwLKjJK1QJ_nyswCTQ2K2Fq" -O ./restaurant_images/fried_chicken.png

# load as image documents
image_documents = SimpleDirectoryReader("./restaurant_images").load_data()

# display image
imageUrl = "./restaurant_images/fried_chicken.png"
image = Image.open(imageUrl).convert("RGB")
plt.figure(figsize=(16, 5))
plt.imshow(image)

<matplotlib.image.AxesImage at 0x3efdae470>

../../_images/1f44897e722a911aa804ad7803bbb94cc3511527dd7b941a0da82823fef6b157.png

from pydantic import BaseModel


class Restaurant(BaseModel):
    """Data model for an restaurant."""

    restaurant: str
    food: str
    discount: str
    price: str
    rating: str
    review: str

from llama_index.core.program import MultiModalLLMCompletionProgram
from llama_index.core.output_parsers import PydanticOutputParser

prompt_template_str = """\
{query_str}

Return the answer as a Pydantic object. The Pydantic schema is given below:

"""
mm_program = MultiModalLLMCompletionProgram.from_defaults(
    output_parser=PydanticOutputParser(Restaurant),
    image_documents=image_documents,
    prompt_template_str=prompt_template_str,
    multi_modal_llm=mm_model,
    verbose=True,
)

response = mm_program(query_str="Can you summarize what is in the image?")
for res in response:
    print(res)

> Raw output:  ```
{
    "restaurant": "Buffalo Wild Wings",
    "food": "8 wings or chicken poppers",
    "discount": "20% discount on orders over $25",
    "price": "$8.73 each",
    "rating": "",
    "review": ""
}
```
('restaurant', 'Buffalo Wild Wings')
('food', '8 wings or chicken poppers')
('discount', '20% discount on orders over $25')
('price', '$8.73 each')
('rating', '')
('review', '')

Retrieval-Augmented Image Captioning#

Here we show a simple example of a retrieval-augmented image captioning pipeline, expressed via our query pipeline syntax.

!wget "https://www.dropbox.com/scl/fi/mlaymdy1ni1ovyeykhhuk/tesla_2021_10k.htm?rlkey=qf9k4zn0ejrbm716j0gg7r802&dl=1" -O tesla_2021_10k.htm
!wget "https://docs.google.com/uc?export=download&id=1THe1qqM61lretr9N3BmINc_NWDvuthYf" -O shanghai.jpg

# from llama_index import SimpleDirectoryReader
from pathlib import Path
from llama_index.readers.file import UnstructuredReader
from llama_index.core.schema import ImageDocument


loader = UnstructuredReader()
documents = loader.load_data(file=Path("tesla_2021_10k.htm"))

image_doc = ImageDocument(image_path="./shanghai.jpg")

from llama_index.core import VectorStoreIndex
from llama_index.core.embeddings import resolve_embed_model

embed_model = resolve_embed_model("local:BAAI/bge-m3")
vector_index = VectorStoreIndex.from_documents(
    documents, embed_model=embed_model
)
query_engine = vector_index.as_query_engine()

from llama_index.core.prompts import PromptTemplate
from llama_index.core.query_pipeline import QueryPipeline, FnComponent

query_prompt_str = """\
Please expand the initial statement using the provided context from the Tesla 10K report.

{initial_statement}

"""
query_prompt_tmpl = PromptTemplate(query_prompt_str)

# MM model --> query prompt --> query engine
qp = QueryPipeline(
    modules={
        "mm_model": mm_model.as_query_component(
            partial={"image_documents": [image_doc]}
        ),
        "query_prompt": query_prompt_tmpl,
        "query_engine": query_engine,
    },
    verbose=True,
)
qp.add_chain(["mm_model", "query_prompt", "query_engine"])
rag_response = qp.run("Which Tesla Factory is shown in the image?")

> Running module mm_model with input: 
prompt: Which Tesla Factory is shown in the image?

> Running module query_prompt with input: 
initial_statement:  The image you've provided is a photograph of the Tesla Gigafactory, which is located in Shanghai, China. This facility is one of Tesla's large-scale production plants and is used for manufacturing el...

> Running module query_engine with input: 
input: Please expand the initial statement using the provided context from the Tesla 10K report.

 The image you've provided is a photograph of the Tesla Gigafactory, which is located in Shanghai, China. Thi...

print(f"> Retrieval Augmented Response: {rag_response}")

> Retrieval Augmented Response: The Gigafactory Shanghai in China is an important manufacturing facility for Tesla. It was established to increase the affordability of Tesla vehicles for customers in local markets by reducing transportation and manufacturing costs and eliminating the impact of unfavorable tariffs. The factory allows Tesla to access high volumes of lithium-ion battery cells manufactured by their partner Panasonic, while achieving a significant reduction in the cost of their battery packs. Tesla continues to invest in Gigafactory Shanghai to achieve additional output. This factory is representative of Tesla's plan to improve their manufacturing operations as they establish new factories, incorporating the learnings from their previous ramp-ups.

rag_response.source_nodes[1].get_content()

'For example, we are currently constructing Gigafactory Berlin under conditional permits in anticipation of being granted final permits. Moreover, we will have to establish and ramp production of our proprietary battery cells and packs at our new factories, and we additionally intend to incorporate sequential design and manufacturing changes into vehicles manufactured at each new factory. We have limited experience to date with developing and implementing manufacturing innovations outside of the Fremont Factory and Gigafactory Shanghai. In particular, the majority of our design and engineering resources are currently located in California. In order to meet our expectations for our new factories, we must expand and manage localized design and engineering talent and resources. If we experience any issues or delays in meeting our projected timelines, costs, capital efficiency and production capacity for our new factories, expanding and managing teams to implement iterative design and production changes there, maintaining and complying with the terms of any debt financing that we obtain to fund them or generating and maintaining demand for the vehicles we manufacture there, our business, prospects, operating results and financial condition may be harmed.\n\nWe will need to maintain and significantly grow our access to battery cells, including through the development and manufacture of our own cells, and control our related costs.\n\nWe are dependent on the continued supply of lithium-ion battery cells for our vehicles and energy storage products, and we will require substantially more cells to grow our business according to our plans. Currently, we rely on suppliers such as Panasonic and Contemporary Amperex Technology Co. Limited (CATL) for these cells. We have to date fully qualified only a very limited number\n\n16\n\nof such suppliers and have limited flexibility in changing suppliers. Any disruption in the supply of battery cells from our suppliers could limit production of our vehicles and energy storage products. In the long term, we intend to supplement cells from our suppliers with cells manufactured by us, which we believe will be more efficient, manufacturable at greater volumes and more cost-effective than currently available cells. However, our efforts to develop and manufacture such battery cells have required, and may continue to require, significant investments, and there can be no assurance that we will be able to achieve these targets in the timeframes that we have planned or at all. If we are unable to do so, we may have to curtail our planned vehicle and energy storage product production or procure additional cells from suppliers at potentially greater costs, either of which may harm our business and operating results.\n\nIn addition, the cost of battery cells, whether manufactured by our suppliers or by us, depends in part upon the prices and availability of raw materials such as lithium, nickel, cobalt and/or other metals. The prices for these materials fluctuate and their available supply may be unstable, depending on market conditions and global demand for these materials, including as a result of increased global production of electric vehicles and energy storage products. Any reduced availability of these materials may impact our access to cells and any increases in their prices may reduce our profitability if we cannot recoup the increased costs through increased vehicle prices. Moreover, any such attempts to increase product prices may harm our brand, prospects and operating results.\n\nWe face strong competition for our products and services from a growing list of established and new competitors.\n\nThe worldwide automotive market is highly competitive today and we expect it will become even more so in the future. For example, Model 3 and Model Y face competition from existing and future automobile manufacturers in the extremely competitive entry-level premium sedan and compact SUV markets. A significant and growing number of established and new automobile manufacturers, as well as other companies, have entered, or are reported to have plans to enter, the market for electric and other alternative fuel vehicles, including hybrid, plug-in hybrid and fully electric vehicles, as well as the market for self-driving technology and other vehicle applications and software platforms. In some cases, our competitors offer or will offer electric vehicles in important markets such as China and Europe, and/or have announced an intention to produce electric vehicles exclusively at some point in the future. Many of our competitors have significantly greater or better-established resources than we do to devote to the design, development, manufacturing, distribution, promotion, sale and support of their products. Increased competition could result in our lower vehicle unit sales, price reductions, revenue shortfalls, loss of customers and loss of market share, which may harm our business, financial condition and operating results.\n\nWe also face competition in our energy generation and storage business from other manufacturers, developers, installers and service providers of competing energy technologies, as well as from large utilities. Decreases in the retail or wholesale prices of electricity from utilities or other renewable energy sources could make our products less attractive to customers and lead to an increased rate of residential customer defaults under our existing long-term leases and PPAs.'