Embeddings

Several embedding classes are available:

  • OpenAIEmbedding: the default embedding class. Defaults to “text-embedding-ada-002”.

  • HuggingFaceEmbedding: a generic wrapper around HuggingFace’s transformers models.

  • OptimumEmbedding: support for usage and creation of ONNX models from Optimum and HuggingFace.

  • InstructorEmbedding: a wrapper around Instructor embedding models.

  • LangchainEmbedding: a wrapper around Langchain’s embedding models.

  • GoogleUnivSentEncoderEmbedding: a wrapper around Google’s Universal Sentence Encoder.

  • AdapterEmbeddingModel: an adapter around any embedding model.

OpenAIEmbedding

pydantic model llama_index.embeddings.openai.OpenAIEmbedding

OpenAI class for embeddings.

Parameters
  • mode (str) –

    Mode for embedding. Defaults to OpenAIEmbeddingMode.TEXT_SEARCH_MODE. Options are:

    • OpenAIEmbeddingMode.SIMILARITY_MODE

    • OpenAIEmbeddingMode.TEXT_SEARCH_MODE

  • model (str) –

    Model for embedding. Defaults to OpenAIEmbeddingModelType.TEXT_EMBED_ADA_002. Options are:

    • OpenAIEmbeddingModelType.DAVINCI

    • OpenAIEmbeddingModelType.CURIE

    • OpenAIEmbeddingModelType.BABBAGE

    • OpenAIEmbeddingModelType.ADA

    • OpenAIEmbeddingModelType.TEXT_EMBED_ADA_002

  • deployment_name (Optional[str]) – Optional deployment of model. Defaults to None. If this value is not None, mode and model will be ignored. Only available for using AzureOpenAI.
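The precedence described for deployment_name can be sketched as follows. This is an illustration only, not the library's implementation: the `resolve_engine` helper, the stand-in enums, and their string values are all hypothetical.

```python
from enum import Enum
from typing import Optional


class Mode(str, Enum):
    # Stand-in for OpenAIEmbeddingMode; the string values are assumptions.
    SIMILARITY_MODE = "similarity"
    TEXT_SEARCH_MODE = "text_search"


class ModelType(str, Enum):
    # Stand-in for OpenAIEmbeddingModelType; only the default is shown.
    TEXT_EMBED_ADA_002 = "text-embedding-ada-002"


def resolve_engine(
    mode: Mode = Mode.TEXT_SEARCH_MODE,
    model: ModelType = ModelType.TEXT_EMBED_ADA_002,
    deployment_name: Optional[str] = None,
) -> str:
    """Hypothetical helper: choose the name sent to the API."""
    if deployment_name is not None:
        # A deployment name takes priority; mode and model are ignored.
        return deployment_name
    # Otherwise the name is derived from mode and model. This sketch simply
    # returns the model's value; a real mapping may also depend on mode.
    return model.value


default_engine = resolve_engine()
azure_engine = resolve_engine(
    mode=Mode.SIMILARITY_MODE,
    deployment_name="my-embedding-deployment",  # placeholder name
)
```

With no deployment name the default model is selected; once a deployment name is given, the mode argument has no effect on the result.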

JSON schema
{
   "title": "OpenAIEmbedding",
   "description": "OpenAI class for embeddings.\n\nArgs:\n    mode (str): Mode for embedding.\n        Defaults to OpenAIEmbeddingMode.TEXT_SEARCH_MODE.\n        Options are:\n\n        - OpenAIEmbeddingMode.SIMILARITY_MODE\n        - OpenAIEmbeddingMode.TEXT_SEARCH_MODE\n\n    model (str): Model for embedding.\n        Defaults to OpenAIEmbeddingModelType.TEXT_EMBED_ADA_002.\n        Options are:\n\n        - OpenAIEmbeddingModelType.DAVINCI\n        - OpenAIEmbeddingModelType.CURIE\n        - OpenAIEmbeddingModelType.BABBAGE\n        - OpenAIEmbeddingModelType.ADA\n        - OpenAIEmbeddingModelType.TEXT_EMBED_ADA_002\n\n    deployment_name (Optional[str]): Optional deployment of model. Defaults to None.\n        If this value is not None, mode and model will be ignored.\n        Only available for using AzureOpenAI.",
   "type": "object",
   "properties": {
      "model_name": {
         "title": "Model Name",
         "description": "The name of the embedding model.",
         "default": "unknown",
         "type": "string"
      },
      "embed_batch_size": {
         "title": "Embed Batch Size",
         "description": "The batch size for embedding calls.",
         "default": 10,
         "type": "integer"
      },
      "callback_manager": {
         "title": "Callback Manager"
      },
      "deployment_name": {
         "title": "Deployment Name",
         "type": "string"
      },
      "additional_kwargs": {
         "title": "Additional Kwargs",
         "description": "Additional kwargs for the OpenAI API.",
         "type": "object"
      },
      "api_key": {
         "title": "Api Key",
         "description": "The OpenAI API key.",
         "type": "string"
      },
      "api_type": {
         "title": "Api Type",
         "description": "The OpenAI API type.",
         "type": "string"
      },
      "api_base": {
         "title": "Api Base",
         "description": "The base URL for OpenAI API.",
         "type": "string"
      },
      "api_version": {
         "title": "Api Version",
         "description": "The API version for OpenAI API.",
         "type": "string"
      }
   },
   "required": [
      "api_base",
      "api_version"
   ]
}
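The embed_batch_size property (default 10) is described as the batch size for embedding calls. A minimal sketch of what such batching could look like, assuming plain consecutive slicing; the `batched` helper is hypothetical and is not part of llama_index:

```python
from typing import Iterator, List


def batched(texts: List[str], embed_batch_size: int = 10) -> Iterator[List[str]]:
    """Yield consecutive slices of at most embed_batch_size texts."""
    if embed_batch_size <= 0:
        raise ValueError("embed_batch_size must be positive")
    for start in range(0, len(texts), embed_batch_size):
        yield texts[start:start + embed_batch_size]


texts = [f"document {i}" for i in range(23)]
# Each batch would correspond to one embedding call.
batch_sizes = [len(batch) for batch in batched(texts)]
```

Under this assumption, 23 texts with the default batch size are split into batches of 10, 10, and 3.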

Config
  • arbitrary_types_allowed: bool = True

Fields
Validators
  • _validate_callback_manager » callback_manager

field additional_kwargs: Dict[str, Any] [Optional]

Additional kwargs for the OpenAI API.

field api_base: str [Required]

The base URL for OpenAI API.

field api_key: str = None

The OpenAI API key.

field api_type: str = None

The OpenAI API type.

field api_version: str [Required]

The API version for OpenAI API.

field deployment_name: Optional[str] = None

classmethod class_name() → str

Get the class name, used as a unique ID in serialization.

This provides a key that makes serialization robust against actual class name changes.
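To illustrate why a stable class_name() helps serialization, here is a small sketch. All class names, the registry, and the `to_dict`/`from_dict` helpers are hypothetical and only mimic the idea; they are not the library's serialization code.

```python
from typing import Dict, Type


class BaseEmbeddingSketch:
    """Hypothetical stand-in for an embedding base class."""

    def __init__(self, model_name: str = "unknown") -> None:
        self.model_name = model_name

    @classmethod
    def class_name(cls) -> str:
        return "BaseEmbeddingSketch"

    def to_dict(self) -> dict:
        # Store the stable ID rather than the Python class's __name__.
        return {"class_name": self.class_name(), "model_name": self.model_name}


class RenamedOpenAIEmbedding(BaseEmbeddingSketch):
    """The Python name changed, but the serialized ID stays the same."""

    @classmethod
    def class_name(cls) -> str:
        return "OpenAIEmbedding"


# Lookup table from serialized ID to the class that should be rebuilt.
REGISTRY: Dict[str, Type[BaseEmbeddingSketch]] = {
    cls.class_name(): cls for cls in (BaseEmbeddingSketch, RenamedOpenAIEmbedding)
}


def from_dict(data: dict) -> BaseEmbeddingSketch:
    cls = REGISTRY[data["class_name"]]
    return cls(model_name=data["model_name"])


saved = RenamedOpenAIEmbedding(model_name="text-embedding-ada-002").to_dict()
restored = from_dict(saved)
```

Because the stored key is the value returned by class_name(), data saved before a class was renamed can still be mapped back to the right class.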

HuggingFaceEmbedding

pydantic model llama_index.embeddings.huggingface.HuggingFaceEmbedding

JSON schema
{
   "title": "HuggingFaceEmbedding",
   "description": "Base class for embeddings.",
   "type": "object",
   "properties": {
      "model_name": {
         "title": "Model Name",
         "description": "The name of the embedding model.",
         "default": "unknown",
         "type": "string"
      },
      "embed_batch_size": {
         "title": "Embed Batch Size",
         "description": "The batch size for embedding calls.",
         "default": 10,
         "type": "integer"
      },
      "callback_manager": {
         "title": "Callback Manager"
      },
      "tokenizer_name": {
         "title": "Tokenizer Name",
         "description": "Tokenizer name from HuggingFace.",
         "type": "string"
      },
      "max_length": {
         "title": "Max Length",
         "description": "Maximum length of input.",
         "type": "integer"
      },
      "pooling": {
         "title": "Pooling",
         "description": "Pooling strategy. One of ['cls', 'mean'].",
         "type": "string"
      },
      "query_instruction": {
         "title": "Query Instruction",
         "description": "Instruction to prepend to query text.",
         "type": "string"
      },
      "text_instruction": {
         "title": "Text Instruction",
         "description": "Instruction to prepend to text.",
         "type": "string"
      },
      "cache_folder": {
         "title": "Cache Folder",
         "description": "Cache folder for huggingface files.",
         "type": "string"
      }
   },
   "required": [
      "tokenizer_name",
      "max_length",
      "pooling"
   ]
}
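The "required" list in the schema above can be read programmatically. The sketch below checks a set of keyword arguments against it using only the standard library; the `missing_required` helper and the "some-model" value are hypothetical placeholders.

```python
import json
from typing import Any, Dict, List

# Trimmed copy of the schema above: only the title and "required" list are kept.
SCHEMA = json.loads(
    '{"title": "HuggingFaceEmbedding",'
    ' "required": ["tokenizer_name", "max_length", "pooling"]}'
)


def missing_required(schema: Dict[str, Any], kwargs: Dict[str, Any]) -> List[str]:
    """Return required property names that are absent from kwargs, in schema order."""
    return [name for name in schema.get("required", []) if name not in kwargs]


incomplete = missing_required(SCHEMA, {"model_name": "some-model", "pooling": "mean"})
complete = missing_required(
    SCHEMA,
    {"tokenizer_name": "some-model", "max_length": 512, "pooling": "cls"},
)
```

Here the first call reports tokenizer_name and max_length as missing, while the second reports nothing.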

Config
  • arbitrary_types_allowed: bool = True

Fields
Validators
  • _validate_callback_manager » callback_manager

field cache_folder: Optional[str] = None

Cache folder for huggingface files.

field max_length: int [Required]

Maximum length of input.

field pooling: str [Required]

Pooling strategy. One of ['cls', 'mean'].
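The two pooling strategies can be sketched in plain Python. This assumes the usual meaning of the terms for transformer outputs ('cls' takes the first token's vector, 'mean' averages the non-padding token vectors); the function names are hypothetical, and the library itself operates on tensors and may differ in detail.

```python
from typing import List

Vector = List[float]


def cls_pooling(token_embeddings: List[Vector]) -> Vector:
    """Use the first token's vector (the [CLS] position in BERT-style models)."""
    return list(token_embeddings[0])


def mean_pooling(token_embeddings: List[Vector], attention_mask: List[int]) -> Vector:
    """Average the vectors of non-padding tokens (mask value 1)."""
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Four token vectors of dimension 2; the last row is padding.
tokens = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [0.0, 0.0]]
mask = [1, 1, 1, 0]

cls_vec = cls_pooling(tokens)          # first row
mean_vec = mean_pooling(tokens, mask)  # average of the first three rows
```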

field query_instruction: Optional[str] = None

Instruction to prepend to query text.

field text_instruction: Optional[str] = None

Instruction to prepend to text.
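A minimal sketch of what prepending an instruction could look like. The `prepend_instruction` helper, the single-space separator, and the example instruction string are assumptions made for illustration; the library's exact formatting may differ.

```python
from typing import Optional


def prepend_instruction(text: str, instruction: Optional[str] = None) -> str:
    """Return text unchanged when no instruction is set, else prefix it."""
    if not instruction:
        return text
    # Assumption: a single space separates the instruction from the text.
    return f"{instruction.strip()} {text.strip()}"


# query_instruction applies to queries, text_instruction to documents.
query = prepend_instruction("What is ONNX?", "Represent this question for retrieval:")
passage = prepend_instruction("ONNX is an open model format.")  # no instruction set
```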

field tokenizer_name: str [Required]

Tokenizer name from HuggingFace.

classmethod class_name() → str

Get the class name, used as a unique ID in serialization.

This provides a key that makes serialization robust against actual class name changes.

OptimumEmbedding

pydantic model llama_index.embeddings.huggingface_optimum.OptimumEmbedding

JSON schema
{
   "title": "OptimumEmbedding",
   "description": "Base class for embeddings.",
   "type": "object",
   "properties": {
      "model_name": {
         "title": "Model Name",
         "description": "The name of the embedding model.",
         "default": "unknown",
         "type": "string"
      },
      "embed_batch_size": {
         "title": "Embed Batch Size",
         "description": "The batch size for embedding calls.",
         "default": 10,
         "type": "integer"
      },
      "callback_manager": {
         "title": "Callback Manager"
      },
      "folder_name": {
         "title": "Folder Name",
         "description": "Folder name to load from.",
         "type": "string"
      },
      "max_length": {
         "title": "Max Length",
         "description": "Maximum length of input.",
         "type": "integer"
      },
      "pooling": {
         "title": "Pooling",
         "description": "Pooling strategy. One of ['cls', 'mean'].",
         "type": "string"
      },
      "query_instruction": {
         "title": "Query Instruction",
         "description": "Instruction to prepend to query text.",
         "type": "string"
      },
      "text_instruction": {
         "title": "Text Instruction",
         "description": "Instruction to prepend to text.",
         "type": "string"
      },
      "cache_folder": {
         "title": "Cache Folder",
         "description": "Cache folder for huggingface files.",
         "type": "string"
      }
   },
   "required": [
      "folder_name",
      "max_length",
      "pooling"
   ]
}

Config
  • arbitrary_types_allowed: bool = True

Fields
Validators
  • _validate_callback_manager » callback_manager

field cache_folder: Optional[str] = None

Cache folder for huggingface files.

field folder_name: str [Required]

Folder name to load from.

field max_length: int [Required]

Maximum length of input.

field pooling: str [Required]

Pooling strategy. One of ['cls', 'mean'].

field query_instruction: Optional[str] = None

Instruction to prepend to query text.

field text_instruction: Optional[str] = None

Instruction to prepend to text.

classmethod class_name() → str

Get the class name, used as a unique ID in serialization.

This provides a key that makes serialization robust against actual class name changes.

classmethod create_and_save_optimum_model(model_name_or_path: str, output_path: str, export_kwargs: Optional[dict] = None) → None
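One plausible workflow, sketched under assumptions, is to export a model once and then load it from the output folder. The `export_and_load` wrapper is hypothetical, and it is an assumption that the folder written by create_and_save_optimum_model is what folder_name expects; the max_length value of 512 is only an example.

```python
from typing import Optional


def export_and_load(
    model_name_or_path: str,
    output_path: str,
    export_kwargs: Optional[dict] = None,
):
    """Export a model to ONNX, then load it from the output folder."""
    # Imported lazily so the sketch can be defined without the optional
    # dependencies installed; the module path is the one listed above.
    from llama_index.embeddings.huggingface_optimum import OptimumEmbedding

    OptimumEmbedding.create_and_save_optimum_model(
        model_name_or_path, output_path, export_kwargs=export_kwargs
    )
    # max_length and pooling are passed explicitly because the schema
    # above lists them as required; the values here are examples.
    return OptimumEmbedding(folder_name=output_path, max_length=512, pooling="cls")
```

Calling the function requires llama_index and its Optimum/ONNX dependencies to be installed, and the first argument would be a HuggingFace model name or local path of your choosing.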

InstructorEmbedding

pydantic model llama_index.embeddings.instructor.InstructorEmbedding

JSON schema
{
   "title": "InstructorEmbedding",
   "description": "Base class for embeddings.",
   "type": "object",
   "properties": {
      "model_name": {
         "title": "Model Name",
         "description": "The name of the embedding model.",
         "default": "unknown",
         "type": "string"
      },
      "embed_batch_size": {
         "title": "Embed Batch Size",
         "description": "The batch size for embedding calls.",
         "default": 10,
         "type": "integer"
      },
      "callback_manager": {
         "title": "Callback Manager"
      },
      "query_instruction": {
         "title": "Query Instruction",
         "description": "Instruction to prepend to query text.",
         "type": "string"
      },
      "text_instruction": {
         "title": "Text Instruction",
         "description": "Instruction to prepend to text.",
         "type": "string"
      },
      "cache_folder": {
         "title": "Cache Folder",
         "description": "Cache folder for huggingface files.",
         "type": "string"
      }
   }
}

Config
  • arbitrary_types_allowed: bool = True

Fields
Validators
  • _validate_callback_manager » callback_manager

field cache_folder: Optional[str] = None

Cache folder for huggingface files.

field query_instruction: Optional[str] = None

Instruction to prepend to query text.

field text_instruction: Optional[str] = None

Instruction to prepend to text.

classmethod class_name() → str

Get the class name, used as a unique ID in serialization.

This provides a key that makes serialization robust against actual class name changes.

LangchainEmbedding

pydantic model llama_index.embeddings.langchain.LangchainEmbedding

External embeddings (taken from Langchain).

Parameters

langchain_embedding (langchain.embeddings.Embeddings) – Langchain embeddings class.
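The wrapper idea can be sketched without either library installed. Langchain's Embeddings interface exposes embed_documents and embed_query; everything else below (the fake embeddings class, the wrapper class, and its method names) is hypothetical and only illustrates delegation to the wrapped object.

```python
from typing import List


class FakeLangchainEmbeddings:
    """Hypothetical stand-in exposing embed_documents and embed_query."""

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Toy vectors: text length plus a constant marker.
        return [[float(len(t)), 1.0] for t in texts]

    def embed_query(self, text: str) -> List[float]:
        return [float(len(text)), 0.0]


class LangchainWrapperSketch:
    """Hypothetical adapter that delegates to the wrapped Langchain object."""

    def __init__(self, langchain_embedding) -> None:
        self._langchain_embedding = langchain_embedding

    def get_query_embedding(self, query: str) -> List[float]:
        return self._langchain_embedding.embed_query(query)

    def get_text_embeddings(self, texts: List[str]) -> List[List[float]]:
        return self._langchain_embedding.embed_documents(texts)


wrapper = LangchainWrapperSketch(FakeLangchainEmbeddings())
query_vec = wrapper.get_query_embedding("hello")
text_vecs = wrapper.get_text_embeddings(["a", "abc"])
```

The point of the design is that any object with those two methods can be passed in, so the embedding backend can be swapped without changing the calling code.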

JSON schema
{
   "title": "LangchainEmbedding",
   "description": "External embeddings (taken from Langchain).\n\nArgs:\n    langchain_embedding (langchain.embeddings.Embeddings): Langchain\n        embeddings class.",
   "type": "object",
   "properties": {
      "model_name": {
         "title": "Model Name",
         "description": "The name of the embedding model.",
         "default": "unknown",
         "type": "string"
      },
      "embed_batch_size": {
         "title": "Embed Batch Size",
         "description": "The batch size for embedding calls.",
         "default": 10,
         "type": "integer"
      },
      "callback_manager": {
         "title": "Callback Manager"
      }
   }
}

Config
  • arbitrary_types_allowed: bool = True

Fields

Validators
  • _validate_callback_manager » callback_manager

classmethod class_name() → str

Get the class name, used as a unique ID in serialization.

This provides a key that makes serialization robust against actual class name changes.

GoogleUnivSentEncoderEmbedding

pydantic model llama_index.embeddings.google.GoogleUnivSentEncoderEmbedding

JSON schema
{
   "title": "GoogleUnivSentEncoderEmbedding",
   "description": "Base class for embeddings.",
   "type": "object",
   "properties": {
      "model_name": {
         "title": "Model Name",
         "description": "The name of the embedding model.",
         "default": "unknown",
         "type": "string"
      },
      "embed_batch_size": {
         "title": "Embed Batch Size",
         "description": "The batch size for embedding calls.",
         "default": 10,
         "type": "integer"
      },
      "callback_manager": {
         "title": "Callback Manager"
      }
   }
}

Config
  • arbitrary_types_allowed: bool = True

Fields

Validators
  • _validate_callback_manager » callback_manager

classmethod class_name() → str

Get the class name, used as a unique ID in serialization.

This provides a key that makes serialization robust against actual class name changes.