Local Embeddings with OpenVINO¶
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference. The OpenVINO™ Runtime supports various hardware devices, including x86 and ARM CPUs and Intel GPUs. It can help boost deep learning performance in Computer Vision, Automatic Speech Recognition, Natural Language Processing, and other common tasks.
Hugging Face embedding models can be supported by OpenVINO through the OpenVINOEmbedding class.
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index-embeddings-openvino
!pip install llama-index
Model Exporter¶
You can export your model to the OpenVINO IR format with the create_and_save_openvino_model function, and then load the model from a local folder.
from llama_index.embeddings.huggingface_openvino import OpenVINOEmbedding
# Export BAAI/bge-small-en-v1.5 to OpenVINO IR and save it to ./bge_ov
OpenVINOEmbedding.create_and_save_openvino_model(
    "BAAI/bge-small-en-v1.5", "./bge_ov"
)
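As a quick sanity check (the exact file names are an assumption; an OpenVINO IR export typically contains an .xml graph definition and a .bin weights file alongside the tokenizer files), you can inspect the saved folder:
import os

# List the exported artifacts in the local folder created above
print(sorted(os.listdir("./bge_ov")))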
Model Loading¶
If you have an Intel GPU, you can specify device="gpu" to run inference on it.
ov_embed_model = OpenVINOEmbedding(model_id_or_path="./bge_ov", device="cpu")
Compiling the model to CPU ...
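As a sketch, assuming an Intel GPU and the corresponding OpenVINO GPU plugin are available on your machine, loading the same exported model on the GPU looks like this:
# Compile the exported model for an Intel GPU instead of the CPU
# (assumes GPU drivers and the OpenVINO GPU plugin are installed)
ov_embed_model_gpu = OpenVINOEmbedding(model_id_or_path="./bge_ov", device="gpu")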
embeddings = ov_embed_model.get_text_embedding("Hello World!")
print(len(embeddings))
print(embeddings[:5])
384
[-0.003275693394243717, -0.011690815910696983, 0.04155920818448067, -0.03814816474914551, 0.024183083325624466]
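Retrieval queries are embedded through the query path of the same model. A minimal sketch using LlamaIndex's standard get_query_embedding method (the query string here is illustrative):
# Embed a retrieval query; the query path may prepend a model-specific
# query instruction before encoding (BGE models use one by default)
query_embedding = ov_embed_model.get_query_embedding("What is OpenVINO?")
print(len(query_embedding))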
For more information, refer to the OpenVINO documentation: https://docs.openvino.ai