Finetuning
Finetuning modules.
- class llama_index.finetuning.CohereRerankerFinetuneEngine(train_file_name: str = 'train.jsonl', val_file_name: Optional[str] = None, model_name: str = 'exp_finetune', model_type: str = 'RERANK', base_model: str = 'english', api_key: Optional[str] = None)
Cohere Reranker Finetune Engine.
- finetune() → None
Finetune model.
- get_finetuned_model(top_n: int = 5) → CohereRerank
Get the finetuned model as a CohereRerank reranker that returns the top_n results.
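train_file_name (and optionally val_file_name) point to JSONL training data for Cohere's rerank finetuning. The sketch below shows one way to produce such a file from query/passage data with only the standard library. It assumes Cohere's rerank record layout with query, relevant_passages and optional hard_negatives fields; check Cohere's current data requirements before relying on it. to_cohere_rerank_jsonl is a hypothetical helper, not part of llama_index.

```python
import json
from typing import Dict, List


def to_cohere_rerank_jsonl(
    queries: Dict[str, str],
    corpus: Dict[str, str],
    relevant_docs: Dict[str, List[str]],
    hard_negatives: Dict[str, List[str]],
) -> str:
    """Serialize one JSON object per query (hypothetical helper).

    Field names assume Cohere's rerank finetuning layout:
    "query", "relevant_passages" and, when available, "hard_negatives".
    """
    lines = []
    for query_id, query in queries.items():
        record = {
            "query": query,
            "relevant_passages": [corpus[d] for d in relevant_docs[query_id]],
        }
        negatives = [corpus[d] for d in hard_negatives.get(query_id, [])]
        if negatives:  # only emit the field when there is something to put in it
            record["hard_negatives"] = negatives
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"
```

Writing the returned string to train.jsonl gives a file that could then be passed as train_file_name.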
- class llama_index.finetuning.EmbeddingAdapterFinetuneEngine(dataset: EmbeddingQAFinetuneDataset, embed_model: BaseEmbedding, batch_size: int = 10, epochs: int = 1, adapter_model: Optional[Any] = None, dim: Optional[int] = None, device: Optional[str] = None, model_output_path: str = 'model_output', model_checkpoint_path: Optional[str] = None, checkpoint_save_steps: int = 100, verbose: bool = False, bias: bool = False, **train_kwargs: Any)
Embedding adapter finetune engine.
- Parameters
dataset (EmbeddingQAFinetuneDataset) – Dataset to finetune on.
embed_model (BaseEmbedding) – Embedding model to finetune.
batch_size (int) – Batch size. Defaults to 10.
epochs (int) – Number of epochs. Defaults to 1.
dim (Optional[int]) – Dimension of embedding. Defaults to None.
adapter_model (Optional[BaseAdapter]) – Adapter model. Defaults to None, in which case a linear adapter is used.
device (Optional[str]) – Device to use. Defaults to None.
model_output_path (str) – Path to save model output. Defaults to 'model_output'.
model_checkpoint_path (Optional[str]) – Path to save model checkpoints. Defaults to None (don't save checkpoints).
verbose (bool) – Whether to show a progress bar. Defaults to False.
bias (bool) – Whether the adapter uses a bias term. Defaults to False.
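When adapter_model is None, a linear adapter is trained on top of embed_model. The toy class below illustrates what such an adapter computes, y = Wx (+ b when bias=True), using plain Python lists instead of tensors. Initialising W to the identity, so that the untrained adapter is a no-op, is an assumption of this sketch, and LinearAdapter is a hypothetical name rather than the class llama_index uses.

```python
from typing import List, Optional


class LinearAdapter:
    """Toy linear adapter y = W x (+ b), initialised to the identity map."""

    def __init__(self, dim: int, bias: bool = False) -> None:
        # Identity weights: before training, embeddings pass through unchanged.
        self.weight: List[List[float]] = [
            [1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)
        ]
        # Bias starts at zero and only exists when bias=True.
        self.bias: Optional[List[float]] = [0.0] * dim if bias else None

    def __call__(self, x: List[float]) -> List[float]:
        # Matrix-vector product, one output coordinate per weight row.
        out = [sum(w * xi for w, xi in zip(row, x)) for row in self.weight]
        if self.bias is not None:
            out = [o + b for o, b in zip(out, self.bias)]
        return out
```

In this sketch the adapter would be applied to query embeddings only, which leaves document embeddings untouched so the corpus does not need to be re-embedded after training.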
- finetune(**train_kwargs: Any) → None
Finetune.
- classmethod from_model_path(dataset: EmbeddingQAFinetuneDataset, embed_model: BaseEmbedding, model_path: str, model_cls: Optional[Type[Any]] = None, **kwargs: Any) → EmbeddingAdapterFinetuneEngine
Load from model path.
- Parameters
dataset (EmbeddingQAFinetuneDataset) – Dataset to finetune on.
embed_model (BaseEmbedding) – Embedding model to finetune.
model_path (str) – Path to model.
model_cls (Optional[Type[Any]]) – Adapter model class. Defaults to None.
**kwargs (Any) – Additional kwargs (see __init__).
- get_finetuned_model(**model_kwargs: Any) → BaseEmbedding
Get finetuned model.
- smart_batching_collate(batch: List) → Tuple[Any, Any]
Smart batching collate.
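smart_batching_collate turns a batch of training examples into model inputs. A plausible reading, sketched below under that assumption, is that each example is a (query, text) pair that gets embedded into two aligned batches. embed_query and embed_text are hypothetical stand-ins for the engine's embed_model, and plain lists replace tensors.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def collate_pairs(
    batch: Sequence[Tuple[str, str]],
    embed_query: Callable[[str], Vector],
    embed_text: Callable[[str], Vector],
) -> Tuple[List[Vector], List[Vector]]:
    """Embed each (query, text) pair and return two index-aligned lists."""
    query_embs: List[Vector] = []
    text_embs: List[Vector] = []
    for query, text in batch:
        # Position i in both outputs belongs to the same training pair.
        query_embs.append(embed_query(query))
        text_embs.append(embed_text(text))
    return query_embs, text_embs
```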
- pydantic model llama_index.finetuning.EmbeddingQAFinetuneDataset
Embedding QA Finetuning Dataset.
- Parameters
queries (Dict[str, str]) – Dict id -> query.
corpus (Dict[str, str]) – Dict id -> string.
relevant_docs (Dict[str, List[str]]) – Dict query id -> list of doc ids.
JSON schema:
{ "title": "EmbeddingQAFinetuneDataset", "description": "Embedding QA Finetuning Dataset.\n\nArgs:\n queries (Dict[str, str]): Dict id -> query.\n corpus (Dict[str, str]): Dict id -> string.\n relevant_docs (Dict[str, List[str]]): Dict query id -> list of doc ids.", "type": "object", "properties": { "queries": { "title": "Queries", "type": "object", "additionalProperties": { "type": "string" } }, "corpus": { "title": "Corpus", "type": "object", "additionalProperties": { "type": "string" } }, "relevant_docs": { "title": "Relevant Docs", "type": "object", "additionalProperties": { "type": "array", "items": { "type": "string" } } }, "mode": { "title": "Mode", "default": "text", "type": "string" } }, "required": [ "queries", "corpus", "relevant_docs" ] }
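The dataset is a plain JSON-compatible object: queries and corpus map ids to text, and relevant_docs maps each query id to the ids of its relevant corpus entries. The helper below (hypothetical, not part of llama_index) builds such a dict and adds a referential-integrity check that the pydantic model itself does not perform. Assuming save_json writes the fields as one JSON object, a file with this shape should be readable by from_json.

```python
import json
from typing import Any, Dict, List


def make_dataset_dict(
    queries: Dict[str, str],
    corpus: Dict[str, str],
    relevant_docs: Dict[str, List[str]],
    mode: str = "text",
) -> Dict[str, Any]:
    """Build a dict matching the EmbeddingQAFinetuneDataset schema."""
    # Extra check (not done by the model): every id must resolve.
    for query_id, doc_ids in relevant_docs.items():
        if query_id not in queries:
            raise ValueError(f"unknown query id: {query_id}")
        missing = [d for d in doc_ids if d not in corpus]
        if missing:
            raise ValueError(f"query {query_id} references unknown docs: {missing}")
    return {
        "queries": queries,
        "corpus": corpus,
        "relevant_docs": relevant_docs,
        "mode": mode,  # schema default is "text"
    }
```

Because every value is a string, list or dict, the result survives a json.dumps/json.loads round trip unchanged.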
- Fields
corpus (Dict[str, str])
mode (str)
queries (Dict[str, str])
relevant_docs (Dict[str, List[str]])
- field corpus: Dict[str, str] [Required]
- field mode: str = 'text'
- field queries: Dict[str, str] [Required]
- field relevant_docs: Dict[str, List[str]] [Required]
- classmethod construct(_fields_set: Optional[SetStr] = None, **values: Any) → Model
Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed. Behaves as if Config.extra = 'allow' was set since it adds all passed values.
- copy(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, update: Optional[DictStrAny] = None, deep: bool = False) → Model
Duplicate a model, optionally choose which fields to include, exclude and change.
- Parameters
include – fields to include in new model
exclude – fields to exclude from new model; as with values, this takes precedence over include
update – values to change/add in the new model. Note: the data is not validated before creating the new model, so you should trust this data
deep – set to True to make a deep copy of the model
- Returns
new model instance
- dict(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults: Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) → DictStrAny
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
- classmethod from_json(path: str) → EmbeddingQAFinetuneDataset
Load the dataset from the JSON file at path.
- classmethod from_orm(obj: Any) → Model
- json(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults: Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False, encoder: Optional[Callable[[Any], Any]] = None, models_as_dict: bool = True, **dumps_kwargs: Any) → unicode
Generate a JSON representation of the model, include and exclude arguments as per dict().
encoder is an optional function to supply as default to json.dumps(), other arguments as per json.dumps().
- classmethod parse_file(path: Union[str, Path], *, content_type: unicode = None, encoding: unicode = 'utf8', proto: Protocol = None, allow_pickle: bool = False) → Model
- classmethod parse_obj(obj: Any) → Model
- classmethod parse_raw(b: Union[str, bytes], *, content_type: unicode = None, encoding: unicode = 'utf8', proto: Protocol = None, allow_pickle: bool = False) → Model
- save_json(path: str) → None
Save the dataset as a JSON file at path.
- classmethod schema(by_alias: bool = True, ref_template: unicode = '#/definitions/{model}') → DictStrAny
- classmethod schema_json(*, by_alias: bool = True, ref_template: unicode = '#/definitions/{model}', **dumps_kwargs: Any) → unicode
- classmethod update_forward_refs(**localns: Any) → None
Try to update ForwardRefs on fields based on this Model, globalns and localns.
- classmethod validate(value: Any) → Model
- property query_docid_pairs: List[Tuple[str, List[str]]]
List of (query text, relevant doc ids) pairs, one per query.
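A sketch of what query_docid_pairs is assumed to return: one (query text, relevant doc ids) tuple per query, in the iteration order of queries. It is written as a free function over the two dicts purely for illustration.

```python
from typing import Dict, List, Tuple


def query_docid_pairs(
    queries: Dict[str, str], relevant_docs: Dict[str, List[str]]
) -> List[Tuple[str, List[str]]]:
    """Pair each query's text (not its id) with the ids of its relevant docs."""
    return [(query, relevant_docs[query_id]) for query_id, query in queries.items()]
```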
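- class llama_index.finetuning.GradientFinetuneEngine(*, access_token: Optional[str] = None, base_model_slug: str, data_path: str, host: Optional[str] = None, learning_rate: Optional[float] = None, name: str, rank: Optional[int] = None, workspace_id: Optional[str] = None)
- class llama_index.finetuning.GradientFinetuneEngine(*, access_token: Optional[str] = None, data_path: str, host: Optional[str] = None, model_adapter_id: str, workspace_id: Optional[str] = None)
Gradient Finetune Engine. The constructor is overloaded: the first form creates a new model adapter named name on top of base_model_slug (with optional rank and learning_rate), while the second works with an existing adapter identified by model_adapter_id.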
- finetune() → None
Finetune the model adapter on the training data at data_path.
- get_finetuned_model(**model_kwargs: Any) → GradientModelAdapterLLM
Gets finetuned model.
- class llama_index.finetuning.OpenAIFinetuneEngine(base_model: str, data_path: str, verbose: bool = False, start_job_id: Optional[str] = None, validate_json: bool = True)
OpenAI Finetuning Engine.
- finetune() → None
Finetune model.
- classmethod from_finetuning_handler(finetuning_handler: OpenAIFineTuningHandler, base_model: str, data_path: str, **kwargs: Any) → OpenAIFinetuneEngine
Initialize from a finetuning handler.
Used to finetune one OpenAI model on data collected from another (e.g. finetuning gpt-3.5-turbo on GPT-4 outputs).
- get_current_job() → FineTuningJob
Get current job.
- get_finetuned_model(**model_kwargs: Any) → LLM
Gets finetuned model.
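data_path points to a JSONL file of training examples, and validate_json=True asks the engine to check that file before starting a job. The function below is a simplified, hypothetical check for OpenAI's chat finetuning format (one {"messages": [...]} object per line with system, user and assistant messages). It is not the engine's actual validation and does not cover other roles such as function or tool messages.

```python
import json
from typing import List

VALID_ROLES = ("system", "user", "assistant")


def check_chat_jsonl(text: str) -> List[str]:
    """Return problems found in chat-format finetuning JSONL (empty if none)."""
    problems: List[str] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append(f"line {lineno}: missing 'messages' list")
            continue
        for msg in messages:
            ok = (
                isinstance(msg, dict)
                and msg.get("role") in VALID_ROLES
                and isinstance(msg.get("content"), str)
            )
            if not ok:
                problems.append(f"line {lineno}: bad message {msg!r}")
        # Each example needs at least one target completion to learn from.
        if not any(isinstance(m, dict) and m.get("role") == "assistant" for m in messages):
            problems.append(f"line {lineno}: no assistant message")
    return problems
```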
- class llama_index.finetuning.SentenceTransformersFinetuneEngine(dataset: EmbeddingQAFinetuneDataset, model_id: str = 'BAAI/bge-small-en', model_output_path: str = 'exp_finetune', batch_size: int = 10, val_dataset: Optional[EmbeddingQAFinetuneDataset] = None, loss: Optional[Any] = None, epochs: int = 2, show_progress_bar: bool = True, evaluation_steps: int = 50, use_all_docs: bool = False)
Sentence Transformers Finetune Engine.
- finetune(**train_kwargs: Any) → None
Finetune model.
- get_finetuned_model(**model_kwargs: Any) → BaseEmbedding
Gets finetuned model.
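When loss is None, the engine is assumed here to fall back to sentence-transformers' MultipleNegativesRankingLoss, which treats the other documents in a batch as negatives for each query. The pure-Python sketch below computes that objective for one batch: cross-entropy over cosine similarities multiplied by scale (20.0, believed to be the library default). It illustrates the math only and is not the engine's training code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat zero vectors as dissimilar


def in_batch_negatives_loss(
    queries: List[Vector], docs: List[Vector], scale: float = 20.0
) -> float:
    """Mean cross-entropy where docs[i] is the positive for queries[i]."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [scale * cosine(q, d) for d in docs]
        # Numerically stable log-sum-exp over all docs in the batch.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_z - logits[i]  # -log softmax probability of the positive
    return total / len(queries)
```

Under this loss, batch_size also sets how many in-batch negatives each query is contrasted against.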
- llama_index.finetuning.generate_qa_embedding_pairs(nodes: List[TextNode], llm: LLM, qa_generate_prompt_tmpl: str = 'Context information is below.\n\n---------------------\n{context_str}\n---------------------\n\nGiven the context information and not prior knowledge.\ngenerate only questions based on the below query.\n\nYou are a Teacher/ Professor. Your task is to setup {num_questions_per_chunk} questions for an upcoming quiz/examination. The questions should be diverse in nature across the document. Restrict the questions to the context information provided."\n', num_questions_per_chunk: int = 2) → EmbeddingQAFinetuneDataset
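Generate a synthetic EmbeddingQAFinetuneDataset from a set of nodes: for each node, llm is prompted with qa_generate_prompt_tmpl to write num_questions_per_chunk questions, and each question is mapped to the node it came from.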
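The steps this function is assumed to perform are sketched below with the standard library: format the prompt for each node, ask the LLM for questions, strip list numbering, and map each question (under a fresh UUID) to its source node. complete is a hypothetical callable standing in for llm, and PROMPT is a shortened illustrative template, not the default qa_generate_prompt_tmpl.

```python
import re
import uuid
from typing import Callable, Dict, List

# Hypothetical, shortened template using the same two placeholders.
PROMPT = (
    "Context information is below.\n\n{context_str}\n\n"
    "Write {num_questions_per_chunk} questions about the context."
)


def generate_qa_pairs(
    node_texts: Dict[str, str],
    complete: Callable[[str], str],
    num_questions_per_chunk: int = 2,
) -> Dict[str, Dict]:
    """Return queries/corpus/relevant_docs dicts for an embedding QA dataset."""
    queries: Dict[str, str] = {}
    relevant_docs: Dict[str, List[str]] = {}
    for node_id, text in node_texts.items():
        prompt = PROMPT.format(
            context_str=text, num_questions_per_chunk=num_questions_per_chunk
        )
        lines = complete(prompt).strip().split("\n")
        # Drop leading enumeration such as "1." or "2)" and skip empty lines.
        questions = [re.sub(r"^\d+[\).\s]", "", line).strip() for line in lines]
        for question in (q for q in questions if q):
            question_id = str(uuid.uuid4())
            queries[question_id] = question
            relevant_docs[question_id] = [node_id]  # the source node is the positive
    return {"queries": queries, "corpus": dict(node_texts), "relevant_docs": relevant_docs}
```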