SimpleWebPageReader
- pydantic model llama_index.readers.SimpleWebPageReader
Simple web page reader.
Reads pages from the web.
- Parameters
html_to_text (bool) – Whether to convert HTML to text. Requires html2text package.
metadata_fn (Optional[Callable[[str], Dict]]) – A function that takes in a URL and returns a dictionary of metadata. Default is None.
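As an illustration of the metadata_fn parameter, the sketch below defines a callable with the documented shape (takes a URL string, returns a dictionary). The function name url_metadata and the keys it returns are hypothetical, chosen only for this example. The commented-out constructor call assumes both parameters are accepted as keyword arguments.

```python
from typing import Dict
from urllib.parse import urlparse


def url_metadata(url: str) -> Dict[str, str]:
    """Hypothetical metadata_fn: derive metadata from the URL alone."""
    parsed = urlparse(url)
    return {
        "source_url": url,
        "domain": parsed.netloc,
        # An empty path (e.g. "https://example.com") is normalized to "/".
        "path": parsed.path or "/",
    }


# Assumed usage (needs llama_index installed; not executed in this sketch):
#   from llama_index.readers import SimpleWebPageReader
#   reader = SimpleWebPageReader(html_to_text=True, metadata_fn=url_metadata)

print(url_metadata("https://example.com/docs/intro"))
```

Because the function depends only on the URL, it can be checked in isolation before being handed to the reader.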
JSON schema:
{
  "title": "SimpleWebPageReader",
  "description": "Simple web page reader.\n\nReads pages from the web.\n\nArgs:\n html_to_text (bool): Whether to convert HTML to text.\n Requires `html2text` package.\n metadata_fn (Optional[Callable[[str], Dict]]): A function that takes in\n a URL and returns a dictionary of metadata.\n Default is None.",
  "type": "object",
  "properties": {
    "is_remote": {
      "title": "Is Remote",
      "default": true,
      "type": "boolean"
    },
    "html_to_text": {
      "title": "Html To Text",
      "type": "boolean"
    },
    "class_name": {
      "title": "Class Name",
      "type": "string",
      "default": "SimpleWebPageReader"
    }
  },
  "required": [
    "html_to_text"
  ]
}
- Config
arbitrary_types_allowed: bool = True
- Fields
html_to_text (bool)
is_remote (bool)
- field html_to_text: bool [Required]
- field is_remote: bool = True
- classmethod class_name() -> str
Get the class name, used as a unique ID in serialization.
This provides a key that makes serialization robust against actual class name changes.
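The idea of a stable serialization key can be sketched as follows. This is a minimal illustration and not the library's implementation: RenamedWebReader, REGISTRY, to_dict, and from_dict are hypothetical names. The Python class has been renamed, yet data serialized under the old ID still resolves, because lookup goes through class_name() and never through the class's actual name.

```python
from typing import Any, Dict, Type


class RenamedWebReader:
    """Hypothetical reader whose Python class name changed after release."""

    def __init__(self, html_to_text: bool) -> None:
        self.html_to_text = html_to_text

    @classmethod
    def class_name(cls) -> str:
        # Stable ID used in serialized data, independent of cls.__name__.
        return "SimpleWebPageReader"

    def to_dict(self) -> Dict[str, Any]:
        return {"class_name": self.class_name(), "html_to_text": self.html_to_text}


# Hypothetical registry keyed by the stable ID rather than the class name.
REGISTRY: Dict[str, Type[RenamedWebReader]] = {
    RenamedWebReader.class_name(): RenamedWebReader,
}


def from_dict(data: Dict[str, Any]) -> RenamedWebReader:
    """Rebuild an object by looking up its class via the serialized ID."""
    cls = REGISTRY[data["class_name"]]
    return cls(html_to_text=data["html_to_text"])


# Data written before the rename still round-trips.
old_payload = {"class_name": "SimpleWebPageReader", "html_to_text": True}
restored = from_dict(old_payload)
print(type(restored).__name__, restored.to_dict())
```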