Multimodal models with Nebius¶
This notebook demonstrates how to use multimodal models from Nebius AI Studio with LlamaIndex. Nebius AI Studio implements all state-of-the-art multimodal models available for commercial use.
First, let's install LlamaIndex and dependencies of Nebius AI Studio. Since AI Studio uses OpenAI-compatible OpenAI, installation of the OpenAI Multimodal package inside Llama-index is also required.
In [ ]:
Copied!
# %pip install llama-index-multi-modal-llms-openai
%pip install llama-index-multi-modal-llms-nebius
# %pip install llama-index-multi-modal-llms-openai
%pip install llama-index-multi-modal-llms-nebius
ERROR: Could not find a version that satisfies the requirement llama-index-multi-modal-llms-nebius (from versions: none) ERROR: No matching distribution found for llama-index-multi-modal-llms-nebius Note: you may need to restart the kernel to use updated packages.
In [ ]:
Copied!
!pip install llama-index matplotlib
!pip install llama-index matplotlib
Use Qwen to understand Images from URLs¶
In [ ]:
Copied!
import os
NEBIUS_API_KEY = "eyJhbGciOiJIUzI1NiIsImtpZCI6IlV6SXJWd1h0dnprLVRvdzlLZWstc0M1akptWXBvX1VaVkxUZlpnMDRlOFUiLCJ0eXAiOiJKV1QifQ.eyJzdWIiOiJnb29nbGUtb2F1dGgyfDEwODM2MzQ1NTA5ODQ4NjQ1NzQwOCIsInNjb3BlIjoib3BlbmlkIG9mZmxpbmVfYWNjZXNzIiwiaXNzIjoiYXBpX2tleV9pc3N1ZXIiLCJhdWQiOlsiaHR0cHM6Ly9uZWJpdXMtaW5mZXJlbmNlLmV1LmF1dGgwLmNvbS9hcGkvdjIvIl0sImV4cCI6MTg4OTcwNjg0NywidXVpZCI6IjcyNTc1ZmY2LTI1NDctNGJiOS1hMWYyLWVhOWI2N2VjZmRkYSIsIm5hbWUiOiJwb3N0bWFuIiwiZXhwaXJlc19hdCI6IjIwMjktMTEtMThUMTQ6MzQ6MDcrMDAwMCJ9.FWg8zqFF4aQM3LX3nAvDpeBalyx42vYCS6KSiwigt6w" # Your Nebius AI Studio API token here
os.environ["NEBIUS_API_KEY"] = NEBIUS_API_KEY
import os
NEBIUS_API_KEY = "eyJhbGciOiJIUzI1NiIsImtpZCI6IlV6SXJWd1h0dnprLVRvdzlLZWstc0M1akptWXBvX1VaVkxUZlpnMDRlOFUiLCJ0eXAiOiJKV1QifQ.eyJzdWIiOiJnb29nbGUtb2F1dGgyfDEwODM2MzQ1NTA5ODQ4NjQ1NzQwOCIsInNjb3BlIjoib3BlbmlkIG9mZmxpbmVfYWNjZXNzIiwiaXNzIjoiYXBpX2tleV9pc3N1ZXIiLCJhdWQiOlsiaHR0cHM6Ly9uZWJpdXMtaW5mZXJlbmNlLmV1LmF1dGgwLmNvbS9hcGkvdjIvIl0sImV4cCI6MTg4OTcwNjg0NywidXVpZCI6IjcyNTc1ZmY2LTI1NDctNGJiOS1hMWYyLWVhOWI2N2VjZmRkYSIsIm5hbWUiOiJwb3N0bWFuIiwiZXhwaXJlc19hdCI6IjIwMjktMTEtMThUMTQ6MzQ6MDcrMDAwMCJ9.FWg8zqFF4aQM3LX3nAvDpeBalyx42vYCS6KSiwigt6w" # Your Nebius AI Studio API token here
os.environ["NEBIUS_API_KEY"] = NEBIUS_API_KEY
Initialize NebiusMultiModal
and Load Images from URLs¶
In [ ]:
Copied!
from llama_index.multi_modal_llms.nebius import NebiusMultiModal
from llama_index.core.multi_modal_llms.generic_utils import load_image_urls
image_urls = [
"https://res.cloudinary.com/hello-tickets/image/upload/c_limit,f_auto,q_auto,w_1920/v1640835927/o3pfl41q7m5bj8jardk0.jpg",
]
image_documents = load_image_urls(image_urls)
mm_llm = NebiusMultiModal(
model="Qwen/Qwen2-VL-72B-Instruct",
api_key=NEBIUS_API_KEY,
max_new_tokens=300,
)
from llama_index.multi_modal_llms.nebius import NebiusMultiModal
from llama_index.core.multi_modal_llms.generic_utils import load_image_urls
image_urls = [
"https://res.cloudinary.com/hello-tickets/image/upload/c_limit,f_auto,q_auto,w_1920/v1640835927/o3pfl41q7m5bj8jardk0.jpg",
]
image_documents = load_image_urls(image_urls)
mm_llm = NebiusMultiModal(
model="Qwen/Qwen2-VL-72B-Instruct",
api_key=NEBIUS_API_KEY,
max_new_tokens=300,
)
In [ ]:
Copied!
from PIL import Image
import requests
from io import BytesIO
import matplotlib.pyplot as plt
img_response = requests.get(image_urls[0])
print(image_urls[0])
img = Image.open(BytesIO(img_response.content))
plt.imshow(img)
from PIL import Image
import requests
from io import BytesIO
import matplotlib.pyplot as plt
img_response = requests.get(image_urls[0])
print(image_urls[0])
img = Image.open(BytesIO(img_response.content))
plt.imshow(img)
https://res.cloudinary.com/hello-tickets/image/upload/c_limit,f_auto,q_auto,w_1920/v1640835927/o3pfl41q7m5bj8jardk0.jpg
Out[ ]:
<matplotlib.image.AxesImage at 0x7ff0724a3f50>
Complete a prompt with a bunch of images¶
In [ ]:
Copied!
complete_response = mm_llm.complete(
prompt="Describe the images as an alternative text",
image_documents=image_documents,
)
complete_response = mm_llm.complete(
prompt="Describe the images as an alternative text",
image_documents=image_documents,
)
In [ ]:
Copied!
print(complete_response)
print(complete_response)
The image depicts the Colosseum in Rome, illuminated with the colors of the Italian flag. The structure is lit up with green, white, and red lights, creating a vibrant display against the night sky. The iconic arches and columns of the Colosseum are clearly visible, showcasing its ancient architectural beauty. The surrounding area is dimly lit, with a few people visible in the foreground, adding a sense of scale to the grandeur of the Colosseum.
Steam Complete a prompt with a bunch of images¶
In [ ]:
Copied!
stream_complete_response = mm_llm.stream_complete(
prompt="give me more context for this image",
image_documents=image_documents,
)
stream_complete_response = mm_llm.stream_complete(
prompt="give me more context for this image",
image_documents=image_documents,
)
In [ ]:
Copied!
for r in stream_complete_response:
print(r.delta, end="")
for r in stream_complete_response:
print(r.delta, end="")
This image depicts the Colosseum in Rome, Italy, illuminated with the colors of the Italian flag: green, white, and red. The Colosseum, also known as the Flavian Amphitheatre, is an iconic ancient Roman structure located in the center of Rome. It was built in the 1st century AD and is one of the most recognizable landmarks in the world. The lighting in this image is likely part of a special event or celebration, such as a national holiday or a cultural festival. The Italian flag colors are often used in such events to show national pride and unity. The Colosseum is a UNESCO World Heritage Site and a major tourist attraction, drawing millions of visitors each year.
Chat through a list of chat messages¶
In [ ]:
Copied!
from llama_index.multi_modal_llms.openai.utils import (
generate_openai_multi_modal_chat_message,
)
chat_msg_1 = generate_openai_multi_modal_chat_message(
prompt="Describe the images as an alternative text",
role="user",
image_documents=image_documents,
)
chat_msg_2 = generate_openai_multi_modal_chat_message(
prompt="The image is a graph showing the surge in US mortgage rates. It is a visual representation of data, with a title at the top and labels for the x and y-axes. Unfortunately, without seeing the image, I cannot provide specific details about the data or the exact design of the graph.",
role="assistant",
)
chat_msg_3 = generate_openai_multi_modal_chat_message(
prompt="can I know more?",
role="user",
)
chat_messages = [chat_msg_1, chat_msg_2, chat_msg_3]
chat_response = mm_llm.chat(
# prompt="Describe the images as an alternative text",
messages=chat_messages,
)
from llama_index.multi_modal_llms.openai.utils import (
generate_openai_multi_modal_chat_message,
)
chat_msg_1 = generate_openai_multi_modal_chat_message(
prompt="Describe the images as an alternative text",
role="user",
image_documents=image_documents,
)
chat_msg_2 = generate_openai_multi_modal_chat_message(
prompt="The image is a graph showing the surge in US mortgage rates. It is a visual representation of data, with a title at the top and labels for the x and y-axes. Unfortunately, without seeing the image, I cannot provide specific details about the data or the exact design of the graph.",
role="assistant",
)
chat_msg_3 = generate_openai_multi_modal_chat_message(
prompt="can I know more?",
role="user",
)
chat_messages = [chat_msg_1, chat_msg_2, chat_msg_3]
chat_response = mm_llm.chat(
# prompt="Describe the images as an alternative text",
messages=chat_messages,
)
In [ ]:
Copied!
for msg in chat_messages:
print(msg.role, msg.content)
for msg in chat_messages:
print(msg.role, msg.content)
MessageRole.USER [{'type': 'text', 'text': 'Describe the images as an alternative text'}, {'type': 'image_url', 'image_url': {'url': 'https://res.cloudinary.com/hello-tickets/image/upload/c_limit,f_auto,q_auto,w_1920/v1640835927/o3pfl41q7m5bj8jardk0.jpg', 'detail': 'low'}}] MessageRole.ASSISTANT The image is a graph showing the surge in US mortgage rates. It is a visual representation of data, with a title at the top and labels for the x and y-axes. Unfortunately, without seeing the image, I cannot provide specific details about the data or the exact design of the graph. MessageRole.USER can I know more?
In [ ]:
Copied!
print(chat_response)
print(chat_response)
assistant: The image is a graph that illustrates the increase in US mortgage rates. It is a visual representation of data, with a title at the top and labels for the x and y-axes. The graph likely shows trends over time, with the x-axis representing time periods and the y-axis representing mortgage rates. The data points on the graph indicate the specific mortgage rates at different times. The graph may also include lines or bars to show the changes in rates more clearly.
Stream Chat through a list of chat messages¶
In [ ]:
Copied!
stream_chat_response = mm_llm.stream_chat(
messages=chat_messages,
)
stream_chat_response = mm_llm.stream_chat(
messages=chat_messages,
)
In [ ]:
Copied!
for r in stream_chat_response:
print(r.delta, end="")
for r in stream_chat_response:
print(r.delta, end="")
The image is a graph that illustrates the increase in US mortgage rates. It is a visual representation of data, with a title at the top and labels for the x and y-axes. The graph likely shows a trend over time, with the x-axis representing time periods (such as months or years) and the y-axis representing mortgage rates. The data points on the graph indicate the mortgage rates at different points in time, showing how they have changed. The graph may also include a line or curve to connect the data points, making it easier to see the overall trend.
Async Complete¶
In [ ]:
Copied!
response_acomplete = await mm_llm.acomplete(
prompt="Describe the images as an alternative text",
image_documents=image_documents,
)
response_acomplete = await mm_llm.acomplete(
prompt="Describe the images as an alternative text",
image_documents=image_documents,
)
In [ ]:
Copied!
print(response_acomplete)
print(response_acomplete)
The image depicts the Colosseum in Rome, illuminated with the colors of the Italian flag. The structure is lit up with green, white, and red lights, creating a vibrant display against the night sky. The iconic arches and columns of the Colosseum are clearly visible, showcasing its ancient architectural beauty. The scene is set during the evening or night, with a few people visible in the foreground, adding a sense of scale to the grandeur of the monument.
Async Steam Complete¶
In [ ]:
Copied!
response_astream_complete = await mm_llm.astream_complete(
prompt="Describe the images as an alternative text",
image_documents=image_documents,
)
response_astream_complete = await mm_llm.astream_complete(
prompt="Describe the images as an alternative text",
image_documents=image_documents,
)
In [ ]:
Copied!
async for delta in response_astream_complete:
print(delta.delta, end="")
async for delta in response_astream_complete:
print(delta.delta, end="")
The image depicts the Colosseum in Rome, illuminated with the colors of the Italian flag: green, white, and red. The lighting highlights the architectural details of the ancient amphitheater, casting a vibrant glow on its arches and columns. The scene is set against a dark, twilight sky, with a few clouds visible. The surrounding area is dimly lit, with some construction or renovation work visible at the base of the Colosseum.
Async Chat¶
In [ ]:
Copied!
achat_response = await mm_llm.achat(
messages=chat_messages,
)
achat_response = await mm_llm.achat(
messages=chat_messages,
)
In [ ]:
Copied!
print(achat_response)
print(achat_response)
assistant: The image is a graph that illustrates the increase in US mortgage rates. It is a visual representation of data, with a title at the top and labels for the x and y-axes. The graph likely includes a line or bar chart that shows the trend of mortgage rates over a certain period. The x-axis probably represents time (e.g., months or years), while the y-axis represents the mortgage rates. The title and axis labels provide context and help interpret the data.
Async stream Chat¶
In [ ]:
Copied!
astream_chat_response = await mm_llm.astream_chat(
messages=chat_messages,
)
astream_chat_response = await mm_llm.astream_chat(
messages=chat_messages,
)
In [ ]:
Copied!
async for delta in astream_chat_response:
print(delta.delta, end="")
async for delta in astream_chat_response:
print(delta.delta, end="")
The image is a graph that illustrates the increase in US mortgage rates. It is a visual representation of data, with a title at the top and labels for the x and y-axes. The graph likely shows a trend over time, with the x-axis representing time periods (such as months or years) and the y-axis representing mortgage rates. The data points on the graph indicate the mortgage rates at different points in time, showing how they have changed. The graph may also include a trend line or other visual elements to help illustrate the overall trend.
Use Qwen to understand images from local files¶
In [ ]:
Copied!
from llama_index.core import SimpleDirectoryReader
# put your local directory here
image_documents = SimpleDirectoryReader(
"/mnt/share/nebius/images/"
).load_data()
response = mm_llm.complete(
prompt="Describe the images as an alternative text",
image_documents=image_documents,
)
from llama_index.core import SimpleDirectoryReader
# put your local directory here
image_documents = SimpleDirectoryReader(
"/mnt/share/nebius/images/"
).load_data()
response = mm_llm.complete(
prompt="Describe the images as an alternative text",
image_documents=image_documents,
)
In [ ]:
Copied!
from PIL import Image
import matplotlib.pyplot as plt
img = Image.open("/mnt/share/nebius/images/walle-image-1-disney.jpg")
plt.imshow(img)
from PIL import Image
import matplotlib.pyplot as plt
img = Image.open("/mnt/share/nebius/images/walle-image-1-disney.jpg")
plt.imshow(img)
Out[ ]:
<matplotlib.image.AxesImage at 0x7ff05e840fd0>
In [ ]:
Copied!
print(response)
print(response)
The image features two animated robots standing behind a metal fence. The robot on the left is a small, boxy, and rusty-looking machine with large, expressive eyes and a yellow body. The robot on the right is larger, with a smooth, white exterior and a green plant symbol on its chest. This robot is adorned with a string of colorful Christmas lights. The background appears to be a desolate, post-apocalyptic environment with dim lighting and a sense of abandonment.