Learn how to scale Laddr agents horizontally and deploy to production environments.

Horizontal Scaling

Scale Workers

Scale agent workers to handle increased load:
# Scale a single agent type
laddr scale researcher 5

# Scale multiple agents
laddr scale researcher 5
laddr scale coordinator 3
laddr scale writer 2

Docker Compose Scaling

Scale workers using Docker Compose:
# Scale coordinator workers to 3 instances
docker compose up -d --scale coordinator_worker=3

# Scale all workers
docker compose up -d \
  --scale coordinator_worker=3 \
  --scale researcher_worker=3 \
  --scale analyzer_worker=2 \
  --scale writer_worker=2


Queue Backends

Redis (Development)

Fast, lightweight queue backend for development:
# .env
QUEUE_BACKEND=redis
REDIS_URL=redis://localhost:6379/0

Kafka (Production)

Durable, scalable queue backend for production:
# .env
QUEUE_BACKEND=kafka
KAFKA_BOOTSTRAP=kafka:9092

Kafka provides better message persistence and horizontal scaling capabilities for production workloads.

Memory (Testing)

In-memory queue for local testing:
# .env
QUEUE_BACKEND=memory

The memory backend only works within a single process. Use Redis or Kafka for multi-worker deployments.

Database Configuration

PostgreSQL (Production)

Use PostgreSQL for production deployments:
# .env
DB_BACKEND=postgresql
DATABASE_URL=postgresql://user:password@localhost:5432/laddr

SQLite (Development)

SQLite for local development:
# .env
DB_BACKEND=sqlite
DATABASE_URL=sqlite:///./laddr.db


Monitoring

Dashboard

Access the dashboard for real-time monitoring:
# Start dashboard
laddr run dev -d

# Access at http://localhost:5173

Metrics

Monitor key metrics:
  • Queue Depth - Number of pending tasks
  • Worker Utilization - Ratio of active to idle workers
  • Throughput - Tasks processed per second
  • Error Rate - Percentage of failed tasks
  • Latency - Average task completion time
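
As a rough illustration of how these metrics relate, the sketch below derives throughput, error rate, and average latency from a list of task records. The record fields (status, started_at, finished_at) are hypothetical and stand in for whatever the dashboard reports; this is not Laddr's actual task schema.

# metrics_sketch.py -- illustrative only; field names are assumptions.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class TaskRecord:
    status: str            # "completed" or "failed"
    started_at: datetime
    finished_at: datetime

def summarize(tasks: list[TaskRecord], window: timedelta) -> dict:
    """Compute throughput, error rate, and average latency over a window."""
    total = len(tasks)
    failed = sum(1 for t in tasks if t.status == "failed")
    latencies = [(t.finished_at - t.started_at).total_seconds() for t in tasks]
    return {
        "throughput_per_sec": total / window.total_seconds() if total else 0.0,
        "error_rate_pct": 100.0 * failed / total if total else 0.0,
        "avg_latency_sec": sum(latencies) / total if total else 0.0,
    }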

Logs

View and follow logs:
# Follow logs for an agent
laddr logs researcher --follow

# Show last 100 lines
laddr logs researcher --tail 100

# View all service logs
docker compose logs -f


Production Deployment

Environment Variables

Configure production environment:
# .env.production
# Queue
QUEUE_BACKEND=kafka
KAFKA_BOOTSTRAP=kafka-cluster:9092

# Database
DB_BACKEND=postgresql
DATABASE_URL=postgresql://user:pass@db-host:5432/laddr

# Storage
STORAGE_BACKEND=s3
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_REGION=us-east-1

# LLM
LLM_PROVIDER=openai
OPENAI_API_KEY=...

Health Checks

Implement health checks:
# Check system health
laddr check

# API health endpoint
curl http://localhost:8000/api/health

Resource Limits

Set appropriate resource limits:
# docker-compose.yml
services:
  researcher_worker:
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
        reservations:
          cpus: '1'
          memory: 1G


Load Balancing

Worker Distribution

Kafka automatically distributes tasks across workers: each worker in a consumer group is assigned a share of the topic's partitions and processes only those tasks.
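
Laddr's workers manage their own consumers, but the mechanism can be sketched with a plain Kafka consumer: every process that joins the same consumer group receives messages only from the partitions assigned to it. The sketch below assumes the kafka-python client; the group id is an assumption, while the topic name follows the convention shown in the partition example.

# consumer_group_sketch.py -- illustrative only; Laddr manages its own consumers.
# Shows how a consumer group splits partitions across worker processes.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "laddr.tasks.researcher",         # task topic
    bootstrap_servers="kafka:9092",
    group_id="researcher-workers",    # all workers share this group id (assumed name)
    enable_auto_commit=True,
)

# Each process running this loop receives only the messages from the
# partitions Kafka assigned to it, so load spreads across workers.
for message in consumer:
    print(message.partition, message.offset, message.value)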

Partition Strategy

Configure Kafka partitions for better parallelism:
# More partitions = more parallelism
# Create topic with 10 partitions
kafka-topics --create \
  --bootstrap-server localhost:9092 \
  --topic laddr.tasks.researcher \
  --partitions 10 \
  --replication-factor 1


Performance Tuning

Worker Configuration

Optimize worker settings:
# .env
MAX_CONCURRENT_TASKS=5  # Tasks per worker
WORKER_PREFETCH=10      # Prefetch count
WORKER_TIMEOUT=300      # Task timeout

Database Connection Pooling

Configure connection pooling:
# .env
DB_POOL_SIZE=20
DB_MAX_OVERFLOW=10
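
How these two settings interact can be seen with a plain SQLAlchemy engine, assuming Laddr's database layer sits on SQLAlchemy (an assumption): the pool keeps up to DB_POOL_SIZE persistent connections and allows DB_MAX_OVERFLOW extra connections during bursts.

# pooling_sketch.py -- illustrative only; assumes a SQLAlchemy-backed
# database layer, which may not match Laddr's internals exactly.
import os
from sqlalchemy import create_engine

engine = create_engine(
    os.environ["DATABASE_URL"],
    pool_size=int(os.environ.get("DB_POOL_SIZE", 20)),        # persistent connections
    max_overflow=int(os.environ.get("DB_MAX_OVERFLOW", 10)),  # burst headroom
    pool_pre_ping=True,  # drop dead connections before handing them out
)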


Troubleshooting

High Queue Depth

If queue depth is growing:
  1. Scale up workers: laddr scale researcher 10
  2. Check worker logs for errors
  3. Verify database/storage connectivity
  4. Check for slow tools or LLM calls

Worker Failures

If workers are failing:
  1. Check logs: laddr logs researcher --tail 100
  2. Verify API keys and credentials
  3. Check resource limits (CPU/memory)
  4. Review error messages in dashboard

Performance Issues

If performance is slow:
  1. Monitor dashboard metrics
  2. Check database query performance
  3. Review LLM response times
  4. Optimize tool implementations
  5. Consider caching strategies (see the sketch below)
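
One common caching strategy is to memoize expensive tool or LLM calls in Redis, keyed by a hash of the input. The sketch below is a generic pattern, not a Laddr API; call_llm and the TTL are placeholders.

# cache_sketch.py -- generic Redis memoization pattern; call_llm() stands in
# for whatever expensive tool or model call you want to cache.
import hashlib
import os

import redis

r = redis.Redis.from_url(os.environ.get("REDIS_URL", "redis://localhost:6379/0"))

def call_llm(prompt: str) -> str:
    """Placeholder for the real LLM or tool call."""
    return f"response to: {prompt}"

def cached_call(prompt: str, ttl: int = 3600) -> str:
    key = "llm:" + hashlib.sha256(prompt.encode()).hexdigest()
    hit = r.get(key)
    if hit is not None:
        return hit.decode()          # cache hit: skip the expensive call
    result = call_llm(prompt)
    r.setex(key, ttl, result)        # cache the result for `ttl` seconds
    return result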

Next Steps