XOrbits Xinference

pydantic model llama_index.llms.xinference.Xinference

JSON schema:

{
   "title": "Xinference",
   "description": "Simple abstract base class for custom LLMs.\n\nSubclasses must implement the `__init__`, `complete`,\n    `stream_complete`, and `metadata` methods.",
   "type": "object",
   "properties": {
      "callback_manager": {
         "title": "Callback Manager"
      },
      "model_uid": {
         "title": "Model Uid",
         "description": "The Xinference model to use.",
         "type": "string"
      },
      "endpoint": {
         "title": "Endpoint",
         "description": "The Xinference endpoint URL to use.",
         "type": "string"
      },
      "temperature": {
         "title": "Temperature",
         "description": "The temperature to use for sampling.",
         "gte": 0.0,
         "lte": 1.0,
         "type": "number"
      },
      "max_tokens": {
         "title": "Max Tokens",
         "description": "The maximum new tokens to generate as answer.",
         "exclusiveMinimum": 0,
         "type": "integer"
      },
      "context_window": {
         "title": "Context Window",
         "description": "The maximum number of context tokens for the model.",
         "exclusiveMinimum": 0,
         "type": "integer"
      },
      "model_description": {
         "title": "Model Description",
         "description": "The model description from Xinference.",
         "type": "object"
      },
      "class_name": {
         "title": "Class Name",
         "type": "string",
         "default": "Xinference_llm"
      }
   },
   "required": [
      "model_uid",
      "endpoint",
      "temperature",
      "max_tokens",
      "context_window",
      "model_description"
   ]
}

Config:

  • arbitrary_types_allowed: bool = True

Validators:

  • _validate_callback_manager » callback_manager

field context_window: int [Required]

The maximum number of context tokens for the model.

  • exclusiveMinimum = 0

field endpoint: str [Required]

The Xinference endpoint URL to use.

field max_tokens: int [Required]

The maximum new tokens to generate as answer.

  • exclusiveMinimum = 0

field model_description: Dict[str, Any] [Required]

The model description from Xinference.

field model_uid: str [Required]

The Xinference model to use.

field temperature: float [Required]

The temperature to use for sampling.
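The field constraints listed above can be checked before constructing the model. The following is a minimal standard-library sketch, not part of the library: the helper name `check_xinference_fields` is hypothetical, and the temperature range of 0.0 to 1.0 is taken from the `gte`/`lte` entries in the schema, which are not standard JSON Schema keywords and may not actually be enforced by the class.

```python
from typing import Any, Dict, List

# Field names marked [Required] in this reference.
REQUIRED_FIELDS = (
    "model_uid",
    "endpoint",
    "temperature",
    "max_tokens",
    "context_window",
    "model_description",
)


def check_xinference_fields(cfg: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: return a list of problems (empty means OK)."""
    problems: List[str] = []

    for name in REQUIRED_FIELDS:
        if name not in cfg:
            problems.append(f"missing required field: {name}")

    # exclusiveMinimum = 0 means the value must be strictly positive.
    for name in ("max_tokens", "context_window"):
        value = cfg.get(name)
        if isinstance(value, int) and value <= 0:
            problems.append(f"{name} must be greater than 0")

    # Range assumed from the "gte"/"lte" entries in the schema.
    temperature = cfg.get("temperature")
    if isinstance(temperature, (int, float)) and not 0.0 <= temperature <= 1.0:
        problems.append("temperature must be between 0.0 and 1.0")

    return problems
```

An empty list means the values satisfy the documented constraints; pydantic performs its own validation when the real model is built.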

chat(messages: Sequence[ChatMessage], **kwargs: Any) → Any

Chat endpoint for LLM.

classmethod class_name() → str

Get the class name, used as a unique ID in serialization.

This provides a key that makes serialization robust against actual class name changes.
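To illustrate why a stable class-name key helps, here is a small sketch of a registry keyed by `class_name()`. Everything in it is hypothetical (the stand-in class `RenamedXinference`, the `REGISTRY` dict, and the `to_dict`/`from_dict` helpers); only the key value `"Xinference_llm"` comes from the schema default above. It is not how the library itself serializes objects.

```python
from typing import Any, Dict, Type


class RenamedXinference:
    """Hypothetical stand-in whose Python name differs from its stored ID."""

    def __init__(self, model_uid: str) -> None:
        self.model_uid = model_uid

    @classmethod
    def class_name(cls) -> str:
        # Stable ID written into serialized data, independent of the
        # actual Python class name.
        return "Xinference_llm"

    def to_dict(self) -> Dict[str, Any]:
        return {"class_name": self.class_name(), "model_uid": self.model_uid}


# Lookup table from stable ID to the class that currently implements it.
REGISTRY: Dict[str, Type[RenamedXinference]] = {
    RenamedXinference.class_name(): RenamedXinference,
}


def from_dict(data: Dict[str, Any]) -> RenamedXinference:
    """Rebuild an object by looking up its stable ID in the registry."""
    cls = REGISTRY[data["class_name"]]
    return cls(model_uid=data["model_uid"])
```

Because the stored key is `"Xinference_llm"` rather than the Python class name, data written before a rename can still be loaded afterwards.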

complete(*args: Any, **kwargs: Any) → Any

Completion endpoint for LLM.
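A call to `complete` might look like the sketch below. It assumes that llama_index is installed, that an Xinference server is reachable at the given endpoint, and that the constructor accepts `model_uid`, `endpoint`, and `temperature` as keyword arguments (the field names documented above). The wrapper function `complete_once` is hypothetical and is used only to defer the import.

```python
def complete_once(prompt: str, model_uid: str, endpoint: str) -> str:
    """Hypothetical wrapper: run one completion and return it as text."""
    # Imported lazily so this file loads even without llama_index installed.
    from llama_index.llms.xinference import Xinference

    # Keyword names are assumed to match the documented field names.
    llm = Xinference(model_uid=model_uid, endpoint=endpoint, temperature=0.5)

    # The return type is documented only as Any, so convert with str()
    # rather than relying on a specific attribute.
    response = llm.complete(prompt)
    return str(response)
```

The `model_uid` is the identifier of a model already launched on the Xinference server; the endpoint is that server's URL.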

load_model(model_uid: str, endpoint: str) → Tuple[Any, int, dict]

stream_chat(messages: Sequence[ChatMessage], **kwargs: Any) → Any

Streaming chat endpoint for LLM.

stream_complete(*args: Any, **kwargs: Any) → Any

Streaming completion endpoint for LLM.
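A streaming endpoint is typically consumed by iterating over the returned generator and joining the incremental pieces. The sketch below uses a hypothetical stand-in generator (`fake_stream`) instead of a live server, and it assumes each chunk exposes a `delta` attribute holding the newly generated text. That attribute is an assumption here, since this reference documents the return type only as `Any`.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Iterator


def fake_stream(parts: Iterable[str]) -> Iterator[Any]:
    """Hypothetical stand-in for the generator a streaming call returns."""
    text = ""
    for part in parts:
        text += part
        # Each chunk carries the text so far and the newest piece.
        yield SimpleNamespace(text=text, delta=part)


def collect_stream(chunks: Iterable[Any]) -> str:
    """Join the incremental pieces of a stream into the full text."""
    pieces = []
    for chunk in chunks:
        pieces.append(chunk.delta)  # assumed attribute, see lead-in
    return "".join(pieces)
```

With a real model, the argument to `collect_stream` would be the result of `stream_complete` rather than `fake_stream`.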

property metadata: LLMMetadata

LLM metadata.