Optimized BGE Embedding Model using Intel® Extension for Transformers
LlamaIndex supports loading quantized BGE embedding models generated by Intel® Extension for Transformers (ITREX) and uses the ITREX Neural Engine, a high-performance NLP backend, to accelerate model inference without compromising accuracy.
Refer to our blog post Efficient Natural Language Embedding Models with Intel Extension for Transformers and the BGE optimization example for more details.
To load and use the quantized models, install the required dependencies: pip install intel-extension-for-transformers torch accelerate datasets onnx.
Loading is done with the class ItrexQuantizedBgeEmbedding; usage is similar to any local HuggingFace embedding model. See the example below:
In [ ]:
%pip install llama-index-embeddings-huggingface-itrex
In [ ]:
from llama_index.embeddings.huggingface_itrex import ItrexQuantizedBgeEmbedding

# Load the INT8 statically quantized BGE-small model published by Intel
embed_model = ItrexQuantizedBgeEmbedding(
    "Intel/bge-small-en-v1.5-sts-int8-static-inc"
)
/home/yuwenzho/.conda/envs/yuwen/lib/python3.9/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
2024-03-29 15:40:42 [INFO] Start to extarct onnx model ops...
2024-03-29 15:40:42 [INFO] Extract onnxruntime model done...
2024-03-29 15:40:42 [INFO] Start to implement Sub-Graph matching and replacing...
2024-03-29 15:40:43 [INFO] Sub-Graph match and replace done...
In [ ]:
embeddings = embed_model.get_text_embedding("Hello World!")
print(len(embeddings))
print(embeddings[:5])
384
[-0.005477035418152809, -0.000541043293196708, 0.036467909812927246, -0.04861024394631386, 0.0288068987429142]
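Once loaded, the quantized model can be used in a LlamaIndex pipeline like any other embedding model. The snippet below is a minimal sketch, assuming llama-index-core is installed; the document text and query are purely illustrative.

from llama_index.core import Document, Settings, VectorStoreIndex

# Use the quantized BGE model for all embedding calls in this session
Settings.embed_model = embed_model

# Build a tiny in-memory index over an illustrative document
documents = [
    Document(
        text="Intel Extension for Transformers accelerates BGE embedding inference."
    )
]
index = VectorStoreIndex.from_documents(documents)

# Retrieve the most similar node for a query (no LLM required)
retriever = index.as_retriever(similarity_top_k=1)
nodes = retriever.retrieve("What accelerates BGE embedding inference?")
print(nodes[0].node.get_content())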