HTMLNodeParser#
- pydantic model llama_index.node_parser.HTMLNodeParser#
HTML node parser.
Splits a document into Nodes using custom HTML splitting logic.
- Parameters
include_metadata (bool) – whether to include metadata in nodes
include_prev_next_rel (bool) – whether to include prev/next relationships
Show JSON schema
{ "title": "HTMLNodeParser", "description": "HTML node parser.\n\nSplits a document into Nodes using custom HTML splitting logic.\n\nArgs:\n include_metadata (bool): whether to include metadata in nodes\n include_prev_next_rel (bool): whether to include prev/next relationships", "type": "object", "properties": { "include_metadata": { "title": "Include Metadata", "description": "Whether or not to consider metadata when splitting.", "default": true, "type": "boolean" }, "include_prev_next_rel": { "title": "Include Prev Next Rel", "description": "Include prev/next node relationships.", "default": true, "type": "boolean" }, "callback_manager": { "title": "Callback Manager" }, "id_func": { "title": "Id Func" }, "tags": { "title": "Tags", "description": "HTML tags to extract text from.", "default": [ "p", "h1", "h2", "h3", "h4", "h5", "h6", "li", "b", "i", "u", "section" ], "type": "array", "items": { "type": "string" } }, "class_name": { "title": "Class Name", "type": "string", "default": "HTMLNodeParser" } } }
- Config
arbitrary_types_allowed: bool = True
- Fields
tags (List[str])
- field tags: List[str] = ['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'li', 'b', 'i', 'u', 'section']#
HTML tags to extract text from.
- classmethod class_name() str #
Get class name.
- classmethod from_defaults(include_metadata: bool = True, include_prev_next_rel: bool = True, callback_manager: Optional[CallbackManager] = None, tags: Optional[List[str]] = ['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'li', 'b', 'i', 'u', 'section']) HTMLNodeParser #