Model Gateway

How AgentBreeder uses LiteLLM as its AI gateway — provider routing, virtual keys, cost tracking, and guardrails.

AgentBreeder uses LiteLLM as its model gateway — a self-hosted proxy that sits between your agents and every LLM provider. When you set model.gateway: litellm in your agent.yaml, all inference calls route through the gateway instead of calling providers directly.

model:
  primary: claude-sonnet-4
  fallback: gpt-4o
  gateway: litellm        # route through the proxy
  temperature: 0.7

The gateway is optional: omit model.gateway to call providers directly. Routing through the proxy adds roughly 12 ms of overhead per call and in return enables cost tracking, guardrails, caching, and team budget enforcement.


Why Use the Gateway

Without the gateway, each deployed agent holds its own API keys and calls providers directly. With the gateway:

| Without gateway | With gateway |
| --- | --- |
| API keys in each container | One master key; agents get scoped virtual keys |
| No cost visibility | Live spend per agent, per team, per model |
| No guardrails | PII detection + prompt injection blocking on every call |
| No caching | Repeated prompts return cached responses |
| Manual fallbacks | Automatic provider failover on errors |

How It Works

agent.yaml (model.gateway: litellm)
        ↓
AgentBreeder engine
  ├── RBAC check
  ├── Mints per-agent virtual key (sk-agent-<name>)
  └── Injects LITELLM_API_KEY + LITELLM_BASE_URL into container
        ↓
LiteLLM proxy (:4000)
  ├── Validates virtual key
  ├── Enforces team budget
  ├── Runs PII guardrail
  ├── Checks Redis cache
  ├── Routes to provider (with retries + fallback)
  └── Logs OTEL span → AgentBreeder tracing
        ↓
Provider (Anthropic / OpenAI / Google / Ollama / ...)

AgentBreeder owns governance (RBAC, audit, team budgets). LiteLLM handles routing (fallbacks, retries, caching, provider translation). Neither owns the other's domain.
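
Inside the container, the agent never handles provider keys; it only sees the two injected variables. Because the LiteLLM proxy exposes an OpenAI-compatible API, any OpenAI client pointed at LITELLM_BASE_URL works. A minimal sketch of an agent-side call:

import os
from openai import OpenAI

# Both variables are injected by the AgentBreeder engine at deploy time
client = OpenAI(
    base_url=os.environ["LITELLM_BASE_URL"],
    api_key=os.environ["LITELLM_API_KEY"],  # the agent's scoped virtual key
)

# The gateway resolves the alias, applies guardrails/budget/cache, then routes
response = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[{"role": "user", "content": "Summarize the latest deploy."}],
)
print(response.choices[0].message.content)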


Supported Providers

The gateway can route to any LiteLLM-supported provider. Out of the box, the quickstart config includes:

| Model alias | Provider | Notes |
| --- | --- | --- |
| gpt-4o | OpenAI | Requires OPENAI_API_KEY |
| gpt-4o-mini | OpenAI | |
| claude-sonnet-4 | Anthropic | Requires ANTHROPIC_API_KEY |
| claude-haiku-4 | Anthropic | |
| gemini-2.0-flash | Google | Requires GOOGLE_API_KEY |
| openrouter/auto | OpenRouter | 300+ models via OPENROUTER_API_KEY |
| ollama/llama3.2 | Ollama (local) | Requires Ollama running locally |

Add more models by editing deploy/litellm_config.yaml:

model_list:
  - model_name: my-custom-alias
    litellm_params:
      model: anthropic/claude-opus-4-5
      api_key: os.environ/ANTHROPIC_API_KEY

Virtual Keys

Every agent gets a scoped virtual key (sk-agent-<name>) automatically minted at deploy time. The key:

  • Is injected into the deployed container as LITELLM_API_KEY
  • Is scoped to the agent's allowed models (from agent.yaml)
  • Is attributed to the agent's team for cost tracking
  • Can be revoked from the dashboard without redeploying the agent

View and manage keys from the dashboard under Settings → API Keys, or via the CLI:

agentbreeder describe my-agent --keys

Cost Tracking

Every LLM call through the gateway is tracked and attributed to the calling agent and its team. View spend in the dashboard under Costs, or via the API:

# Spend for a specific team
GET /api/v1/costs?team=engineering

# Spend by model
GET /api/v1/costs?group_by=model
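
The same endpoints can be queried from code. Below is a sketch using requests; the base URL and bearer-token auth are assumptions here, so adjust both for your deployment:

import os
import requests

# Hypothetical endpoint and token; substitute your own deployment's values
api = os.environ.get("AGENTBREEDER_API", "http://localhost:8000")
headers = {"Authorization": f"Bearer {os.environ['AGENTBREEDER_TOKEN']}"}

# Spend for a specific team
team_spend = requests.get(f"{api}/api/v1/costs",
                          params={"team": "engineering"}, headers=headers)

# Spend grouped by model
by_model = requests.get(f"{api}/api/v1/costs",
                        params={"group_by": "model"}, headers=headers)

print(team_spend.json(), by_model.json())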

Set team budgets when creating a team:

agentbreeder team create engineering --budget 500 --budget-period 30d

The gateway sends alerts at 85% and 95% of the budget before enforcing the hard limit. For the 500-dollar budget above, that means warnings at $425 and $475, then blocked calls at $500.


Guardrails

Two guardrails are enabled by default when the gateway is active:

Presidio PII detection — scans every request for PII (names, emails, credit cards, SSNs) before the call reaches the LLM. Redacts or blocks based on configuration.

Lakera prompt injection — detects attempts to hijack agent behavior through crafted user inputs.

Configure guardrail behavior in deploy/litellm_config.yaml:

guardrails:
  - guardrail_name: presidio-pii
    litellm_params:
      guardrail: presidio
      mode: pre_call          # scan before the LLM call
      output_parse_pii: true  # also scan LLM output
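
When a guardrail blocks a request in pre_call mode, the prompt never reaches the provider and the proxy rejects the call, which surfaces in the agent as an HTTP error. A sketch of handling it, assuming the proxy responds with a 4xx status (the exact code and message shape depend on your guardrail configuration):

import os
import openai
from openai import OpenAI

client = OpenAI(base_url=os.environ["LITELLM_BASE_URL"],
                api_key=os.environ["LITELLM_API_KEY"])

try:
    client.chat.completions.create(
        model="claude-sonnet-4",
        messages=[{"role": "user", "content": "My SSN is 078-05-1120."}],
    )
except openai.APIStatusError as err:
    # Blocked before the provider call; log and degrade gracefully
    print(f"Guardrail rejected the request: {err.status_code}")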

To disable guardrails for a specific agent (not recommended for production):

# agent.yaml
guardrails: []

Caching

The gateway caches LLM responses in Redis. Identical prompts return the cached response without making a provider call — reducing latency and cost.

# deploy/litellm_config.yaml
litellm_settings:
  cache: true
  cache_params:
    type: redis
    host: redis
    port: 6379
    ttl: 600   # 10 minutes
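
A quick way to see the cache working is to send the same prompt twice and compare latency; within the TTL, the second call should return almost immediately. A sketch (timings are illustrative and vary by provider):

import os
import time
from openai import OpenAI

client = OpenAI(base_url=os.environ["LITELLM_BASE_URL"],
                api_key=os.environ["LITELLM_API_KEY"])

def timed_call() -> float:
    start = time.perf_counter()
    client.chat.completions.create(
        model="claude-sonnet-4",
        messages=[{"role": "user", "content": "What is a model gateway?"}],
    )
    return time.perf_counter() - start

print(f"first call:  {timed_call():.2f}s")   # full provider round trip
print(f"second call: {timed_call():.2f}s")   # served from Redis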

Per-request cache control (pass in the request body from your agent):

# Force a fresh call, don't use the cache (client configured as above)
response = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[...],
    extra_body={"cache": {"no-cache": True}},
)

Fallbacks and Retries

Set a fallback model in agent.yaml and the gateway handles automatic failover:

model:
  primary: claude-sonnet-4
  fallback: gpt-4o
  gateway: litellm

If the primary model returns a rate-limit error or is unavailable, the gateway automatically retries with the fallback. No change is needed in your agent's code: the same LITELLM_BASE_URL endpoint works regardless of which provider responds.
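
Failover is transparent to your code: the call below is identical whether the primary or the fallback answers, and the response's model field typically reports which deployment actually served the request. A sketch:

import os
from openai import OpenAI

client = OpenAI(base_url=os.environ["LITELLM_BASE_URL"],
                api_key=os.environ["LITELLM_API_KEY"])

response = client.chat.completions.create(
    model="claude-sonnet-4",  # gateway retries with gpt-4o on failure
    messages=[{"role": "user", "content": "ping"}],
)
print(response.model)  # which model actually answered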

For more complex routing (load balancing across multiple deployments of the same model), configure routing strategies directly in litellm_config.yaml:

router_settings:
  routing_strategy: latency-based-routing  # route to fastest deployment

Running the Gateway Locally

The gateway is included in the default Docker Compose stack:

docker compose up -d   # starts postgres, redis, API, dashboard, and litellm

The LiteLLM admin UI is available at http://localhost:4000/ui. The default master key is sk-agentbreeder-quickstart (set LITELLM_MASTER_KEY in .env for production).

To verify the gateway is healthy:

curl http://localhost:4000/health
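
Because the proxy is OpenAI-compatible, you can also list the configured model aliases through the standard /v1/models endpoint. A sketch using the quickstart master key (swap in your own LITELLM_MASTER_KEY for production):

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000",
    api_key="sk-agentbreeder-quickstart",  # quickstart default only
)

for model in client.models.list():
    print(model.id)  # e.g. claude-sonnet-4, gpt-4o, ollama/llama3.2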

Observability

Every gateway call emits an OpenTelemetry span with:

  • Model used, provider, latency
  • Token counts (input / output / total)
  • Cost in USD
  • Virtual key alias (agent attribution)
  • x-litellm-call-id — correlated to AgentBreeder audit log entries

View traces in the dashboard under Tracing, or connect your own OTEL collector by updating the endpoint in litellm_config.yaml.
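
To capture the correlation ID from agent code, read the response headers; the OpenAI SDK's with_raw_response interface exposes them. A sketch:

import os
from openai import OpenAI

client = OpenAI(base_url=os.environ["LITELLM_BASE_URL"],
                api_key=os.environ["LITELLM_API_KEY"])

raw = client.chat.completions.with_raw_response.create(
    model="claude-sonnet-4",
    messages=[{"role": "user", "content": "ping"}],
)
call_id = raw.headers.get("x-litellm-call-id")  # matches the audit log entry
completion = raw.parse()  # the regular parsed response object
print(call_id, completion.choices[0].message.content)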


Gateway Dashboard

The Gateway page in the AgentBreeder dashboard shows:

  • Live status of each configured provider
  • Model catalog with pricing
  • Request log with latency, token counts, and cost per call
  • Cost comparison table across providers

Skipping the Gateway

If you need to bypass the gateway for a specific agent (e.g., a local dev agent that calls Ollama directly):

model:
  primary: ollama/llama3.2
  # no gateway field — calls Ollama directly

Direct calls skip virtual key minting, budget enforcement, guardrails, and caching. Use only for local development.
