# Observability
LlamaDeployment provides comprehensive observability capabilities through distributed tracing and metrics collection. This allows you to monitor workflow execution, track performance, and debug issues across your distributed deployment.
## Overview
LlamaDeployment supports two main observability features:
- Distributed Tracing: Track request flows across services using OpenTelemetry
- Metrics Collection: Monitor system performance with Prometheus metrics
## Distributed Tracing
Distributed tracing provides end-to-end visibility into workflow execution across all components in your deployment. Traces show the complete journey of a request from the API server through message queues to workflow completion.
### What Gets Traced
- Control Plane: Service registration, task orchestration, and session management
- Workflow Services: Complete workflow execution lifecycle including state loading, workflow running, event streaming, and result publishing
- Message Queues: Message publishing and consumption with trace context propagation
### Configuration
Tracing is disabled by default and can be enabled through environment variables:
```bash
# Enable tracing
export LLAMA_DEPLOY_APISERVER_TRACING_ENABLED=true

# Set service name (default: llama-deploy-apiserver)
export LLAMA_DEPLOY_APISERVER_TRACING_SERVICE_NAME=my-api-server

# Configure exporter (console, jaeger, otlp)
export LLAMA_DEPLOY_APISERVER_TRACING_EXPORTER=jaeger
export LLAMA_DEPLOY_APISERVER_TRACING_ENDPOINT=localhost:14268

# Configure sampling rate (0.0 to 1.0)
export LLAMA_DEPLOY_APISERVER_TRACING_SAMPLE_RATE=0.1
```
### Supported Exporters
#### Console Exporter
Prints traces to the console - useful for development:
```bash
export LLAMA_DEPLOY_APISERVER_TRACING_EXPORTER=console
```
#### OTLP Exporter
Exports traces using OpenTelemetry Protocol (works with many backends like Jaeger):
```bash
export LLAMA_DEPLOY_APISERVER_TRACING_EXPORTER=otlp
export LLAMA_DEPLOY_APISERVER_TRACING_ENDPOINT=http://localhost:4317
```
### Setting Up Jaeger
To set up Jaeger for trace collection and visualization:
```bash
# Run Jaeger all-in-one container
docker run --rm -d \
  -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  -p 9411:9411 \
  jaegertracing/all-in-one:latest

# Configure LlamaDeployment to use Jaeger
export LLAMA_DEPLOY_APISERVER_TRACING_ENABLED=true
export LLAMA_DEPLOY_APISERVER_TRACING_EXPORTER=otlp
export LLAMA_DEPLOY_APISERVER_TRACING_ENDPOINT=http://localhost:4317
```
Access the Jaeger UI at http://localhost:16686 to view traces.
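If you'd rather verify programmatically that spans are reaching Jaeger, you can query the Jaeger UI's HTTP API. This endpoint is internal and unversioned, so treat the sketch below as a development convenience rather than a stable contract:

```python
import requests

# Query Jaeger's UI API for recent traces. The `service` value must match
# LLAMA_DEPLOY_APISERVER_TRACING_SERVICE_NAME (default: llama-deploy-apiserver).
resp = requests.get(
    "http://localhost:16686/api/traces",
    params={"service": "llama-deploy-apiserver", "limit": 5},
    timeout=5,
)
resp.raise_for_status()
traces = resp.json().get("data", [])
print(f"Found {len(traces)} recent trace(s)")
```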
### Trace Context Propagation
Traces automatically propagate across service boundaries through message queues. Each message includes trace context (`trace_id` and `span_id`) in the `QueueMessageStats`, ensuring complete end-to-end tracing.
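The underlying mechanics follow the standard OpenTelemetry inject/extract pattern. Here is a minimal sketch of propagating context through a message carrier (illustrative only; LlamaDeployment handles this for you inside its message queue layer, and it assumes a configured `TracerProvider` so spans carry real IDs):

```python
from opentelemetry import trace
from opentelemetry.propagate import extract, inject

tracer = trace.get_tracer("propagation-example")

# Producer: serialize the active trace context into the message's metadata.
carrier: dict[str, str] = {}
with tracer.start_as_current_span("publish"):
    inject(carrier)  # writes a W3C `traceparent` entry into the dict

# Consumer: restore the context so new spans join the producer's trace.
ctx = extract(carrier)
with tracer.start_as_current_span("consume", context=ctx):
    pass  # spans created here share the producer's trace_id
```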
## Metrics Collection
LlamaDeployment includes Prometheus metrics for monitoring system performance and health.
### API Server Metrics
The API server automatically exposes Prometheus metrics when enabled:
```bash
# Enable Prometheus metrics (default: true)
export LLAMA_DEPLOY_APISERVER_PROMETHEUS_ENABLED=true

# Set metrics port (default: 9000)
export LLAMA_DEPLOY_APISERVER_PROMETHEUS_PORT=9000
```
Metrics are available at `http://localhost:9000/metrics`.
### Available Metrics
The API server tracks several key metrics:
- Deployment State: Current state of deployments (running, stopped, etc.)
- Service State: Health and status of registered services
- API Request Metrics: HTTP request counts, durations, and error rates (via tracing integration)
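Exact metric names can vary between releases. The quickest way to see what your build exposes is to list them straight from the endpoint (the `llama_deploy` prefix below is inferred from the alerting example later on this page):

```bash
# List the metric names actually exposed by your API server
curl -s http://localhost:9000/metrics | grep '^llama_deploy' | awk -F'[{ ]' '{print $1}' | sort -u
```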
### Setting Up Prometheus
Create a `prometheus.yml` configuration file:
```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'llama-deploy-apiserver'
    static_configs:
      - targets: ['localhost:9000']
    scrape_interval: 5s
```
Run Prometheus:
```bash
# Using Docker
docker run -d --name prometheus \
  -p 9090:9090 \
  -v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus

# Using binary
prometheus --config.file=prometheus.yml
```
Access Prometheus at http://localhost:9090.
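To confirm the scrape is working, query the built-in `up` metric in the Prometheus UI: `up{job="llama-deploy-apiserver"}` should return 1 for a healthy target.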
### Setting Up Grafana
For advanced visualization, set up Grafana with Prometheus as a data source:
```bash
# Run Grafana
docker run -d --name grafana \
  -p 3000:3000 \
  grafana/grafana
```
- Open http://localhost:3000 (default login: admin/admin)
- Add Prometheus as a data source: http://localhost:9090 (or provision it automatically; see the snippet below)
- Create dashboards to visualize LlamaDeployment metrics
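Instead of adding the data source by hand, you can use Grafana's standard provisioning mechanism. A minimal file, mounted into the container at the path shown in the comment:

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    # From inside the Grafana container, `localhost` is the container itself;
    # on Docker Desktop use host.docker.internal to reach a host-side Prometheus.
    url: http://localhost:9090
    isDefault: true
```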
## Best Practices
### Sampling
For production deployments, configure appropriate sampling rates to balance observability with performance:
```bash
# Sample 10% of traces
export LLAMA_DEPLOY_APISERVER_TRACING_SAMPLE_RATE=0.1
```
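A setting like this typically maps to OpenTelemetry's standard ratio-based sampler: a rate of 0.1 records roughly one in ten root spans, and child spans follow the decision made at the root. Conceptually (a sketch of the OpenTelemetry SDK equivalent, not LlamaDeployment's internals):

```python
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Record ~10% of new traces; ParentBased makes child spans inherit the
# root's sampling decision, so traces are never half-recorded.
provider = TracerProvider(sampler=ParentBased(TraceIdRatioBased(0.1)))
```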
### Service Names
Use descriptive service names to distinguish between different deployments:
```bash
# RAG workflow deployment
export LLAMA_DEPLOY_APISERVER_TRACING_SERVICE_NAME=prod-rag-workflow

# API server deployment
export LLAMA_DEPLOY_APISERVER_TRACING_SERVICE_NAME=prod-api-server
```
### Resource Attributes
Add custom resource attributes for better filtering:
```python
from llama_deploy.apiserver.tracing.utils import add_span_attribute

# Add custom attributes in your workflow
add_span_attribute("workflow.type", "rag")
add_span_attribute("environment", "production")
```
### Monitoring Alerts
Set up alerts based on metrics:
```yaml
# Prometheus alerting rule example
- alert: DeploymentDown
  expr: llama_deploy_deployment_state{state="stopped"} > 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "LlamaDeployment deployment is down"
```
## Troubleshooting
### Traces Not Appearing
- Verify tracing is enabled: `LLAMA_DEPLOY_APISERVER_TRACING_ENABLED=true`
- Check exporter configuration and endpoint connectivity
- Verify sampling rate is not too low
- Check application logs for tracing errors
### Missing Dependencies
Install tracing dependencies:
```bash
pip install "llama-deploy[observability]"
```
### Performance Impact
- Tracing adds minimal overhead when properly configured
- Use sampling to reduce overhead in high-traffic scenarios
- Console exporter has higher overhead than Jaeger/OTLP
### Metrics Not Available
- Verify Prometheus is enabled: `LLAMA_DEPLOY_APISERVER_PROMETHEUS_ENABLED=true`
- Check the metrics port is accessible: `curl http://localhost:9000/metrics`
- Verify Prometheus configuration and targets
## Security Considerations
### Sensitive Data
Tracing doesn't automatically exclude sensitive parameters from span attributes. Review custom span attributes to ensure no sensitive data is included.
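One lightweight guard is to redact known-sensitive keys before they ever reach a span. A sketch (the helper and key list are illustrative, not part of LlamaDeployment):

```python
from llama_deploy.apiserver.tracing.utils import add_span_attribute

# Illustrative redaction guard -- not part of LlamaDeployment.
SENSITIVE_KEYS = {"api_key", "password", "authorization", "token"}

def add_safe_attribute(key: str, value: str) -> None:
    # Redact values whose final key segment commonly carries secrets.
    if key.rsplit(".", 1)[-1].lower() in SENSITIVE_KEYS:
        value = "[REDACTED]"
    add_span_attribute(key, value)

add_safe_attribute("user.api_key", "sk-...")  # recorded as [REDACTED]
add_safe_attribute("workflow.type", "rag")    # recorded as-is
```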
### Network Security
When using OTLP in production:
```bash
# Use secure connections
export LLAMA_DEPLOY_APISERVER_TRACING_INSECURE=false
export LLAMA_DEPLOY_APISERVER_TRACING_ENDPOINT=https://your-secure-endpoint.com
```