SemanticSplitterNodeParser#
- pydantic model llama_index.core.node_parser.SemanticSplitterNodeParser#
Semantic node parser.
Splits a document into Nodes, with each node being a group of semantically related sentences.
- Parameters
buffer_size (int) – number of sentences to group together when evaluating semantic similarity
embed_model – (BaseEmbedding): embedding model to use
sentence_splitter (Optional[Callable]) – splits text into sentences
include_metadata (bool) – whether to include metadata in nodes
include_prev_next_rel (bool) – whether to include prev/next relationships
Show JSON schema
{ "title": "SemanticSplitterNodeParser", "description": "Semantic node parser.\n\nSplits a document into Nodes, with each node being a group of semantically related sentences.\n\nArgs:\n buffer_size (int): number of sentences to group together when evaluating semantic similarity\n embed_model: (BaseEmbedding): embedding model to use\n sentence_splitter (Optional[Callable]): splits text into sentences\n include_metadata (bool): whether to include metadata in nodes\n include_prev_next_rel (bool): whether to include prev/next relationships", "type": "object", "properties": { "include_metadata": { "title": "Include Metadata", "description": "Whether or not to consider metadata when splitting.", "default": true, "type": "boolean" }, "include_prev_next_rel": { "title": "Include Prev Next Rel", "description": "Include prev/next node relationships.", "default": true, "type": "boolean" }, "callback_manager": { "title": "Callback Manager", "type": "object", "default": {} }, "embed_model": { "title": "Embed Model", "description": "The embedding model to use to for semantic comparison", "allOf": [ { "$ref": "#/definitions/BaseEmbedding" } ] }, "buffer_size": { "title": "Buffer Size", "description": "The number of sentences to group together when evaluating semantic similarity. Set to 1 to consider each sentence individually. Set to >1 to group sentences together.", "default": 1, "type": "integer" }, "breakpoint_percentile_threshold": { "title": "Breakpoint Percentile Threshold", "description": "The percentile of cosine dissimilarity that must be exceeded between a group of sentences and the next to form a node. The smaller this number is, the more nodes will be generated", "default": 95, "type": "integer" }, "class_name": { "title": "Class Name", "type": "string", "default": "SemanticSplitterNodeParser" } }, "required": [ "embed_model" ], "definitions": { "BaseEmbedding": { "title": "BaseEmbedding", "description": "Base class for embeddings.", "type": "object", "properties": { "model_name": { "title": "Model Name", "description": "The name of the embedding model.", "default": "unknown", "type": "string" }, "embed_batch_size": { "title": "Embed Batch Size", "description": "The batch size for embedding calls.", "default": 10, "exclusiveMinimum": 0, "lte": 2048, "type": "integer" }, "callback_manager": { "title": "Callback Manager", "type": "object", "default": {} }, "class_name": { "title": "Class Name", "type": "string", "default": "base_component" } } } } }
- Config
arbitrary_types_allowed: bool = True
- Fields
buffer_size (int)
embed_model (llama_index.core.base.embeddings.base.BaseEmbedding)
sentence_splitter (Callable[[str], List[str]])
- Validators
_validate_id_func
»id_func
- field buffer_size: int = 1#
The number of sentences to group together when evaluating semantic similarity. Set to 1 to consider each sentence individually. Set to >1 to group sentences together.
- field embed_model: BaseEmbedding [Required]#
The embedding model to use to for semantic comparison
- field sentence_splitter: Callable[[str], List[str]] [Optional]#
The text splitter to use when splitting documents.
- build_semantic_nodes_from_documents(documents: Sequence[Document], show_progress: bool = False) List[BaseNode] #
Build window nodes from documents.
- classmethod class_name() str #
Get the class name, used as a unique ID in serialization.
This provides a key that makes serialization robust against actual class name changes.
- classmethod from_defaults(embed_model: Optional[BaseEmbedding] = None, breakpoint_percentile_threshold: Optional[int] = 95, buffer_size: Optional[int] = 1, sentence_splitter: Optional[Callable[[str], List[str]]] = None, original_text_metadata_key: str = 'original_text', include_metadata: bool = True, include_prev_next_rel: bool = True, callback_manager: Optional[CallbackManager] = None, id_func: Optional[Callable[[int, Document], str]] = None) SemanticSplitterNodeParser #