Learn how to scale Laddr agents horizontally and deploy to production environments.
Horizontal Scaling
Scale Workers
Scale agent workers to handle increased load:
# Scale a single agent type
laddr scale researcher 5
# Scale multiple agents
laddr scale researcher 5
laddr scale coordinator 3
laddr scale writer 2
Docker Compose Scaling
Scale workers using Docker Compose:
# Scale coordinator workers to 3 instances
docker compose up -d --scale coordinator_worker=3
# Scale all workers
docker compose up -d \
  --scale coordinator_worker=3 \
  --scale researcher_worker=3 \
  --scale analyzer_worker=2 \
  --scale writer_worker=2
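After scaling, confirm that the expected number of replicas is actually running:
# List running services and their replica containers
docker compose ps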
Queue Backends
Redis (Development)
Fast, lightweight queue backend for development:
# .env
QUEUE_BACKEND=redis
REDIS_URL=redis://localhost:6379/0
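Before starting workers, a quick connectivity check is worthwhile; redis-cli ships with Redis and accepts the same URL:
# Should print PONG if Redis is reachable
redis-cli -u redis://localhost:6379/0 ping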
Kafka (Production)
Durable, scalable queue backend for production:
# .env
QUEUE_BACKEND=kafka
KAFKA_BOOTSTRAP=kafka:9092
Kafka provides better message persistence and horizontal scaling capabilities for production workloads.
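To confirm the broker at KAFKA_BOOTSTRAP is reachable, list its topics with the standard Kafka CLI:
# Lists topics if the broker is reachable
kafka-topics --bootstrap-server kafka:9092 --list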
Memory (Testing)
In-memory queue for local testing:
# .env
QUEUE_BACKEND=memory
Memory backend only works within a single process. Use Redis or Kafka for multi-worker deployments.
Database Configuration
PostgreSQL (Production)
Use PostgreSQL for production deployments:
# .env
DB_BACKEND=postgresql
DATABASE_URL=postgresql://user:password@localhost:5432/laddr
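If the database does not exist yet, create it with the standard PostgreSQL client tools (the names here match the placeholder URL above):
# Create the database referenced by DATABASE_URL
createdb -h localhost -U user laddr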
SQLite (Development)
Use SQLite for local development:
# .env
DB_BACKEND=sqlite
DATABASE_URL=sqlite:///./laddr.db
Monitoring
Dashboard
Access the dashboard for real-time monitoring:
# Start dashboard
laddr run dev -d
# Access at http://localhost:5173
Metrics
Monitor key metrics:
- Queue Depth - Number of pending tasks (see the spot-check sketch after this list)
- Worker Utilization - Ratio of active to idle workers
- Throughput - Tasks processed per second
- Error Rate - Percentage of failed tasks
- Latency - Average task completion time
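On the Redis backend you can spot-check queue depth from the shell. The key pattern below is only an assumption about how Laddr names its per-agent queues, so scan first to find the actual keys:
# Discover queue keys (pattern is an assumption)
redis-cli -u "$REDIS_URL" --scan --pattern 'laddr*'
# Approximate depth of one queue (key name is an assumption)
redis-cli -u "$REDIS_URL" llen laddr.tasks.researcher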
Logs
View and follow logs:
# Follow logs for an agent
laddr logs researcher --follow
# Show last 100 lines
laddr logs researcher --tail 100
# View all service logs
docker compose logs -f
Production Deployment
Environment Variables
Configure production environment:
# .env.production
# Queue
QUEUE_BACKEND=kafka
KAFKA_BOOTSTRAP=kafka-cluster:9092
# Database
DB_BACKEND=postgresql
DATABASE_URL=postgresql://user:pass@db-host:5432/laddr
# Storage
STORAGE_BACKEND=s3
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_REGION=us-east-1
# LLM
LLM_PROVIDER=openai
OPENAI_API_KEY=...
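Keep this file out of version control and pass it explicitly at deploy time:
# Load production settings when starting the stack
docker compose --env-file .env.production up -d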
Health Checks
Implement health checks:
# Check system health
laddr check
# API health endpoint
curl http://localhost:8000/api/health
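For automated probes (for example a container HEALTHCHECK), make curl exit non-zero on HTTP errors so failures actually propagate:
# Exits non-zero if the API is down or returns an error status
curl --fail --silent --max-time 5 http://localhost:8000/api/health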
Resource Limits
Set appropriate resource limits:
# docker-compose.yml
services:
  researcher_worker:
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
        reservations:
          cpus: '1'
          memory: 1G
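Measure real usage before committing to limits:
# Stream live CPU and memory usage per container
docker stats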
Load Balancing
Worker Distribution
Kafka distributes tasks across workers automatically: each worker joins a consumer group and is assigned a subset of the topic's partitions, so every task is processed by exactly one worker.
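You can inspect partition assignments and consumer lag with the standard Kafka tooling; list the groups first rather than guessing the name Laddr registers:
# List consumer groups, then describe one to see assignments and lag
kafka-consumer-groups --bootstrap-server localhost:9092 --list
kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group <group-name>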
Partition Strategy
Configure Kafka partitions for better parallelism:
# More partitions = more parallelism
# Create topic with 10 partitions
kafka-topics --create \
  --bootstrap-server localhost:9092 \
  --topic laddr.tasks.researcher \
  --partitions 10 \
  --replication-factor 1
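Partitions can also be added to an existing topic. The count can never be reduced, and adding partitions changes the key-to-partition mapping for keyed messages:
# Raise an existing topic to 10 partitions
kafka-topics --alter \
  --bootstrap-server localhost:9092 \
  --topic laddr.tasks.researcher \
  --partitions 10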
Worker Configuration
Optimize worker settings:
# .env
MAX_CONCURRENT_TASKS=5   # Concurrent tasks per worker
WORKER_PREFETCH=10       # Prefetch count
WORKER_TIMEOUT=300       # Per-task timeout
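As a rough sizing check: 5 workers with MAX_CONCURRENT_TASKS=5 allow at most 25 tasks in flight, so sustained throughput tops out at 25 divided by the average task duration. Raise worker count or per-worker concurrency until queue depth stays flat under peak load.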
Database Connection Pooling
Configure connection pooling:
# .env
DB_POOL_SIZE=20
DB_MAX_OVERFLOW=10
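Assuming SQLAlchemy-style pool semantics, each worker process may open up to DB_POOL_SIZE + DB_MAX_OVERFLOW connections (30 with the values above), so 10 workers can demand up to 300. Make sure the server-side limit covers that:
# Compare against the PostgreSQL connection limit
psql "$DATABASE_URL" -c "SHOW max_connections;"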
Troubleshooting
High Queue Depth
If queue depth is growing:
- Scale up workers:
laddr scale researcher 10
- Check worker logs for errors
- Verify database/storage connectivity
- Check for slow tools or LLM calls
Worker Failures
If workers are failing:
- Check logs:
laddr logs researcher --tail 100
- Verify API keys and credentials
- Check resource limits (CPU/memory)
- Review error messages in dashboard
Slow Performance
If performance is slow:
- Monitor dashboard metrics
- Check database query performance
- Review LLM response times
- Optimize tool implementations
- Consider caching strategies