UnstructuredElementNodeParser
- pydantic model llama_index.core.node_parser.UnstructuredElementNodeParser
Unstructured element node parser.
Splits a document into Text Nodes and Index Nodes corresponding to embedded objects (e.g. tables).
JSON schema:
{ "title": "UnstructuredElementNodeParser", "description": "Unstructured element node parser.\n\nSplits a document into Text Nodes and Index Nodes corresponding to embedded objects\n(e.g. tables).", "type": "object", "properties": { "include_metadata": { "title": "Include Metadata", "description": "Whether or not to consider metadata when splitting.", "default": true, "type": "boolean" }, "include_prev_next_rel": { "title": "Include Prev Next Rel", "description": "Include prev/next node relationships.", "default": true, "type": "boolean" }, "callback_manager": { "title": "Callback Manager", "type": "object", "default": {} }, "llm": { "title": "Llm", "description": "LLM model to use for summarization.", "allOf": [ { "$ref": "#/definitions/LLM" } ] }, "summary_query_str": { "title": "Summary Query Str", "description": "Query string to use for summarization.", "default": "What is this table about? Give a very concise summary (imagine you are adding a new caption and summary for this table), and output the real/existing table title/caption if context provided.and output the real/existing table id if context provided.and also output whether or not the table should be kept.", "type": "string" }, "num_workers": { "title": "Num Workers", "description": "Num of workers for async jobs.", "default": 4, "type": "integer" }, "show_progress": { "title": "Show Progress", "description": "Whether to show progress.", "default": true, "type": "boolean" }, "nested_node_parser": { "title": "Nested Node Parser", "description": "Other types of node parsers to handle some types of nodes.", "allOf": [ { "$ref": "#/definitions/NodeParser" } ] }, "partitioning_parameters": { "title": "Partitioning Parameters", "description": "Extra dictionary representing parameters of the partitioning process.", "default": {}, "type": "object" }, "class_name": { "title": "Class Name", "type": "string", "default": "UnstructuredElementNodeParser" } }, "definitions": { "PydanticProgramMode": { "title": 
"PydanticProgramMode", "description": "Pydantic program mode.", "enum": [ "default", "openai", "llm", "guidance", "lm-format-enforcer" ], "type": "string" }, "BasePromptTemplate": { "title": "BasePromptTemplate", "description": "Chainable mixin.\n\nA module that can produce a `QueryComponent` from a set of inputs through\n`as_query_component`.\n\nIf plugged in directly into a `QueryPipeline`, the `ChainableMixin` will be\nconverted into a `QueryComponent` with default parameters.", "type": "object", "properties": { "metadata": { "title": "Metadata", "type": "object" }, "template_vars": { "title": "Template Vars", "type": "array", "items": { "type": "string" } }, "kwargs": { "title": "Kwargs", "type": "object", "additionalProperties": { "type": "string" } }, "output_parser": { "title": "Output Parser", "type": "object", "default": {} }, "template_var_mappings": { "title": "Template Var Mappings", "description": "Template variable mappings (Optional).", "type": "object" } }, "required": [ "metadata", "template_vars", "kwargs" ] }, "LLM": { "title": "LLM", "description": "LLM interface.", "type": "object", "properties": { "callback_manager": { "title": "Callback Manager", "type": "object", "default": {} }, "system_prompt": { "title": "System Prompt", "description": "System prompt for LLM calls.", "type": "string" }, "output_parser": { "title": "Output Parser", "description": "Output parser to parse, validate, and correct errors programmatically.", "type": "object", "default": {} }, "pydantic_program_mode": { "default": "default", "allOf": [ { "$ref": "#/definitions/PydanticProgramMode" } ] }, "query_wrapper_prompt": { "title": "Query Wrapper Prompt", "description": "Query wrapper prompt for LLM calls.", "allOf": [ { "$ref": "#/definitions/BasePromptTemplate" } ] }, "class_name": { "title": "Class Name", "type": "string", "default": "base_component" } } }, "NodeParser": { "title": "NodeParser", "description": "Base interface for node parser.", "type": "object", 
"properties": { "include_metadata": { "title": "Include Metadata", "description": "Whether or not to consider metadata when splitting.", "default": true, "type": "boolean" }, "include_prev_next_rel": { "title": "Include Prev Next Rel", "description": "Include prev/next node relationships.", "default": true, "type": "boolean" }, "callback_manager": { "title": "Callback Manager", "type": "object", "default": {} }, "class_name": { "title": "Class Name", "type": "string", "default": "base_component" } } } } }
- Config
arbitrary_types_allowed: bool = True
- Fields
partitioning_parameters (Optional[Dict[str, Any]])
- Validators
_validate_id_func
»id_func
- field partitioning_parameters: Optional[Dict[str, Any]] = {}
Extra dictionary representing parameters of the partitioning process.
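As a rough illustration of how a dictionary such as partitioning_parameters is commonly consumed, the sketch below unpacks an optional dict as keyword arguments to a partitioning function. The partition_text function and its max_chars and skip_blank options are hypothetical stand-ins, not part of llama_index or unstructured; which keys the real partitioning step accepts is not stated on this page.

```python
from typing import Any, Dict, List, Optional


def partition_text(text: str, max_chars: int = 20, skip_blank: bool = True) -> List[str]:
    """Hypothetical partitioner: split on newlines, then cap chunk length."""
    chunks: List[str] = []
    for line in text.split("\n"):
        if skip_blank and not line.strip():
            continue
        # Cut each line into pieces of at most max_chars characters.
        # max(len(line), 1) keeps an empty line as one empty chunk.
        for start in range(0, max(len(line), 1), max_chars):
            chunks.append(line[start:start + max_chars])
    return chunks


def run_partition(
    text: str, partitioning_parameters: Optional[Dict[str, Any]] = None
) -> List[str]:
    # Treat None like an empty dict, then forward the entries as keyword arguments.
    params = partitioning_parameters or {}
    return partition_text(text, **params)


default_chunks = run_partition("alpha\n\nbeta gamma")
custom_chunks = run_partition(
    "alpha\n\nbeta gamma", {"max_chars": 4, "skip_blank": False}
)
```

With an empty dict (the field's default) the hypothetical partitioner runs with its own defaults; any key supplied in the dict overrides the matching option.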
- classmethod class_name() -> str
Get the class name, used as a unique ID in serialization.
This provides a key that makes serialization robust against actual class name changes.
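The idea of a stable serialization key can be sketched as follows. This is a minimal illustration, not the library's implementation: RenamedParser, REGISTRY, to_dict and from_dict are hypothetical names invented for the example. The point is only that the stored identifier comes from class_name() rather than from the Python class name.

```python
from typing import Any, Dict, Type


class RenamedParser:
    """Hypothetical component whose Python class was renamed at some point."""

    def __init__(self, num_workers: int = 4) -> None:
        self.num_workers = num_workers

    @classmethod
    def class_name(cls) -> str:
        # Stable identifier: unchanged even though the class itself was renamed.
        return "OriginalParser"

    def to_dict(self) -> Dict[str, Any]:
        # Store the stable ID instead of type(self).__name__.
        return {"class_name": self.class_name(), "num_workers": self.num_workers}


# Registry keyed by the stable ID (an assumption of this sketch).
REGISTRY: Dict[str, Type[RenamedParser]] = {
    RenamedParser.class_name(): RenamedParser,
}


def from_dict(data: Dict[str, Any]) -> RenamedParser:
    payload = dict(data)  # copy so the caller's dict is not mutated
    cls = REGISTRY[payload.pop("class_name")]
    return cls(**payload)


saved = RenamedParser(num_workers=2).to_dict()
restored = from_dict(saved)
```

Data saved before the rename still loads, because the lookup uses the string returned by class_name() and never the current class name.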
- extract_elements(text: str, table_filters: Optional[List[Callable]] = None, **kwargs: Any) -> List[Element]
Extract elements from text.
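The table_filters argument is typed as an optional list of callables. One plausible reading, sketched below, is that each callable receives a table element and returns a bool, and a table is kept only if every filter accepts it. SketchElement, apply_table_filters and has_multiple_rows are hypothetical names for this example; what the real parser does with a rejected table (drop it, or treat it as plain text) is not stated on this page, and the sketch simply drops it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SketchElement:
    """Hypothetical stand-in for an extracted element."""

    type: str  # "table" or "text"
    content: str


def apply_table_filters(
    elements: List[SketchElement],
    table_filters: Optional[List[Callable[[SketchElement], bool]]] = None,
) -> List[SketchElement]:
    filters = table_filters or []
    kept: List[SketchElement] = []
    for element in elements:
        # Only tables are filtered; a table must pass every filter to be kept.
        if element.type == "table" and not all(f(element) for f in filters):
            continue
        kept.append(element)
    return kept


def has_multiple_rows(table: SketchElement) -> bool:
    # Example filter: reject tables that consist of a single row.
    return "\n" in table.content


elements = [
    SketchElement("text", "Intro"),
    SketchElement("table", "a|b\n1|2"),
    SketchElement("table", "lonely"),
]
filtered = apply_table_filters(elements, [has_multiple_rows])
unfiltered = apply_table_filters(elements)
```

With no filters supplied, all() over an empty list is True, so every table passes through unchanged.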
- filter_table(table_element: Any) -> bool
Filter tables.