Deployment

Ship a registered agent to AWS, GCP, or Azure — covers the user-input contract per cloud (BYO existing infra), plus two verified end-to-end references: GCP Cloud Run and AWS ECS Fargate (with auto-provisioned RDS pgvector + ElastiCache Redis).

Once your agent runs locally and is registered (prompts + tools + agent record), the deploy step packages the agent + the AgentBreeder runtime into a container, pushes it to your cloud's image registry, and rolls it out as a managed service with secrets and auth wired in.

AgentBreeder supports three clouds (AWS, GCP, and Azure) plus local Docker Compose for development. This page walks through two verified end-to-end reference deploys: GCP Cloud Run and AWS ECS Fargate (with auto-provisioned RDS pgvector + ElastiCache). The same pattern applies to AWS App Runner and Azure Container Apps, using the per-cloud user-input contract below.

Reference deploy

The microlearning-ebook-agent example is deployed and serving at https://microlearning-ebook-agent-sizukgalta-uc.a.run.app. Every snippet on this page was generated from that deploy.

Greenfield vs. existing account — which surface does what

AgentBreeder supports two starting points for every cloud:

Existing account / BYO infra — you already have a VPC, subnets, an ECS cluster, and an execution role. Both the CLI (agentbreeder deploy) and Studio deploy into them. This is the path the two verified walkthroughs on this page use.
Greenfield — a fresh account with no networking. AgentBreeder provisions the whole footprint for you. On all three clouds this ships from the CLI (agentbreeder deploy --target <cloud> --provision) and the Studio deploy wizard (Step 3 → Provision for me). AWS builds the VPC, subnets, NAT, security groups, ECS cluster, IAM, and — when declared — RDS; GCP builds a VPC + subnet + Cloud NAT + a Private Service Access range (for Cloud SQL private IP) + Artifact Registry + service account; Azure builds the resource group, Log Analytics, Container Apps environment, ACR, and managed identity.

Independently of which path you pick, the CLI and Studio both auto-provision the data tier (managed Postgres pgvector / managed Redis) when an agent declares memory: or a knowledge base without a backend_url. For the at-a-glance matrix, see the deploy-target status table in the CLI reference; for actionable per-target prerequisites, see Prerequisites per target below.

User-input contract per cloud (BYO existing infra)

For each cloud, AgentBreeder needs a minimum set of inputs from you so it can deploy into your existing infrastructure. Two modes are supported:

simple — account + region + credentials. AgentBreeder uses cloud defaults for the rest (cluster names, IAM roles, networking).
full — describe every specific resource (cluster, subnets, IAM role, registry).

Both modes are served by two API endpoints:

Endpoint	Purpose
`GET /api/v1/deployments/cloud-requirements/{cloud}?mode=simple\|full`	Returns the required + optional fields for the cloud + mode. The Studio wizard and the CLI both read this to know what to ask you for.
`POST /api/v1/deployments/validate-infra`	Read-only pre-flight check that all referenced resources actually exist. Rate-limited (10 req/min) and audit-logged. Cross-team access returns 403.

Auth posture

cloud-requirements requires any authenticated user. validate-infra requires the caller to hold the deployer role in the team_id named in the request body — cross-team validation is denied. Every successful or failed call writes to the audit log (AuditService.log_event(action="deployment.validate_infra", ...)).
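As a rough illustration of the first endpoint, the sketch below builds the cloud-requirements URL for a given cloud and mode. The helper name `cloud_requirements_url` is hypothetical, and the `gcp` / `azure` path values are assumptions (only `aws` appears as a `cloud` value elsewhere on this page). It builds the URL only, since the response shape for this endpoint is not documented here.

```python
from urllib.parse import quote, urlencode

# Assumption: these are the accepted {cloud} path values. Only "aws" is
# confirmed by the validate-infra example on this page.
SUPPORTED_CLOUDS = {"aws", "gcp", "azure"}
SUPPORTED_MODES = {"simple", "full"}


def cloud_requirements_url(base_url: str, cloud: str, mode: str = "simple") -> str:
    """Build the GET URL for the cloud-requirements endpoint (hypothetical helper)."""
    if cloud not in SUPPORTED_CLOUDS:
        raise ValueError(f"unsupported cloud: {cloud!r}")
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    path = f"/api/v1/deployments/cloud-requirements/{quote(cloud, safe='')}"
    return f"{base_url.rstrip('/')}{path}?{urlencode({'mode': mode})}"


print(cloud_requirements_url("http://localhost:8000", "aws", "full"))
# http://localhost:8000/api/v1/deployments/cloud-requirements/aws?mode=full
```

Any authenticated caller can issue this GET; send the same `Authorization: Bearer` header used for validate-infra.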

AWS simple mode (mode=simple):

Field	Required	Description
`AWS_ACCOUNT_ID`	✓	AWS Account ID (12 digits)
`AWS_DEFAULT_REGION`	✓	AWS region (default `us-east-1`)
`AWS_ACCESS_KEY_ID` + `AWS_SECRET_ACCESS_KEY`	✓	Credentials (or use `AWS_PROFILE`)
`AWS_SESSION_TOKEN`	—	STS session token (for assumed roles)
`AWS_PROFILE`	—	Named profile from `~/.aws/credentials`
`AWS_ECR_REGISTRY`	—	Override ECR registry host

In simple mode AgentBreeder uses ECS cluster agentbreeder-default, IAM role agentbreeder-ecs-execution-role, the default VPC, and the default security group. Those defaults must already exist for a plain CLI deploy. To have AgentBreeder create them for you on a fresh account, use Greenfield mode (below), available from the Studio wizard and from the CLI via --provision --local.

Full mode adds:

Field	Required	Description
`AWS_ECS_CLUSTER`	✓	ECS cluster name
`AWS_EXECUTION_ROLE_ARN`	✓	IAM execution role ARN
`AWS_VPC_SUBNETS`	✓	Comma-separated subnet IDs (≥2 for HA)
`AWS_SECURITY_GROUPS`	✓	Comma-separated security group IDs
`AWS_ECR_REPOSITORY`	✓	ECR repository name
`AWS_TASK_ROLE_ARN`	—	Task IAM role ARN
`AWS_ALB_TARGET_GROUP_ARN`	—	ALB target group ARN
`AWS_CLOUDWATCH_LOG_GROUP`	—	CloudWatch Logs group
`AWS_SECRETS_MANAGER_PREFIX`	—	ARN prefix for Secrets Manager

What validate-infra checks (read-only): sts:GetCallerIdentity (creds + account match), ec2:DescribeSubnets, ec2:DescribeSecurityGroups, ecs:DescribeClusters, iam:GetRole, ecr:DescribeRepositories.
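Before calling validate-infra, it can help to check locally that the fields from the two AWS tables are present. The sketch below is a hypothetical client-side pre-check, not AgentBreeder's own validation. It skips `AWS_DEFAULT_REGION` because that field has a default, and it treats `AWS_PROFILE` as an alternative to the key pair, as the table suggests.

```python
import re

# Required fields added by full mode, copied from the table above.
AWS_FULL_REQUIRED = (
    "AWS_ECS_CLUSTER",
    "AWS_EXECUTION_ROLE_ARN",
    "AWS_VPC_SUBNETS",
    "AWS_SECURITY_GROUPS",
    "AWS_ECR_REPOSITORY",
)


def check_aws_fields(fields: dict, mode: str = "simple") -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not re.fullmatch(r"\d{12}", fields.get("AWS_ACCOUNT_ID", "")):
        problems.append("AWS_ACCOUNT_ID must be 12 digits")
    has_keys = bool(fields.get("AWS_ACCESS_KEY_ID")) and bool(
        fields.get("AWS_SECRET_ACCESS_KEY")
    )
    if not has_keys and not fields.get("AWS_PROFILE"):
        problems.append(
            "provide AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY or AWS_PROFILE"
        )
    if mode == "full":
        for name in AWS_FULL_REQUIRED:
            if not fields.get(name):
                problems.append(f"{name} is required in full mode")
        subnets = [s for s in fields.get("AWS_VPC_SUBNETS", "").split(",") if s.strip()]
        if 0 < len(subnets) < 2:
            problems.append("AWS_VPC_SUBNETS should list at least 2 subnets for HA")
    return problems
```

An empty result only means the request is well-formed; whether the resources exist is still decided by the read-only checks listed above.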

Greenfield mode — agentbreeder deploy --target ecs-fargate --provision:

For a fresh AWS account with no networking, AgentBreeder creates the minimum-viable ECS Fargate footprint end-to-end, then deploys the agent into it. This ships from both the CLI and the Studio wizard:

# Fresh AWS account: just creds + region in agent.yaml — no BYO env_vars.
agentbreeder deploy agent.yaml --target ecs-fargate --provision --local

The CLI provisions the VPC, subnets, NAT, security groups, ECS cluster, and IAM execution role (opening the agent's port for the task's public IP since there's no ALB in front), maps the resulting IDs into the deploy, and then runs the normal build → deploy → health-check → register pipeline. Declaring memory: or a knowledge base also auto-provisions the data tier (RDS pgvector / ElastiCache Redis) into the new VPC. Everything is tagged AgentBreeder=true and recorded in .agentbreeder/infra-state.json, so agentbreeder teardown <agent> removes the whole footprint with no orphans. Re-running --provision reuses the recorded footprint instead of creating a duplicate.

Only AWS_REGION (or deploy.region) and a credential chain are required — no pre-existing cluster, subnets, or roles.

Studio equivalent + GCP/Azure

The Studio deploy wizard (Step 3 → Provision for me) runs the same provisioner path with live SSE progress and partial-rollback; see Deploying from Studio. CLI --provision ships for AWS, GCP, and Azure — e.g. agentbreeder deploy --target cloud-run --provision --local (GCP) or --target container-apps --provision --local (Azure). --provision is local mode only for now (use --local).

Resource	Notes
VPC `10.0.0.0/16` + 2 public + 2 private subnets across 2 AZs	Tagged `AgentBreeder=true`
Internet Gateway + NAT Gateway (single AZ default, multi-AZ via `AWS_MULTI_AZ_NAT=1`)	Single NAT saves ~$32/mo
Security groups: `agentbreeder-alb-sg` (80/443), `agentbreeder-agent-sg` (8080 from ALB only), `agentbreeder-db-sg` (5432 from agent only)	DB SG never gets 0.0.0.0/0
ECS cluster `agentbreeder-{agent_name}` (FARGATE + FARGATE_SPOT)
IAM execution role `agentbreeder-execution-{agent_name}`	`AmazonECSTaskExecutionRolePolicy` + ECR pull
ECR repository	Per-agent, encrypted at rest
RDS PostgreSQL `db.t3.micro` (only when `memory:` declared)	`publicly_accessible=False`, `storage_encrypted=True`, password in Secrets Manager
ALB + target group + listener (only when `access.visibility: public`)	TLS 1.2+ policy `ELBSecurityPolicy-TLS13-1-2-2021-06`

State is written to .agentbreeder/infra-state.json; agentbreeder teardown --destroy-infra reverses in safe order. destroy() refuses to touch any resource missing the AgentBreeder=true tag, and takes a final RDS snapshot unless --no-final-snapshot is passed.

GCP simple mode (mode=simple):

Field	Required	Description
`GOOGLE_CLOUD_PROJECT`	✓	GCP project ID
`GOOGLE_APPLICATION_CREDENTIALS`	✓	Path to service-account JSON (or use ADC)
`GCP_REGION`	—	Cloud Run region (default `us-central1`)

In simple mode AgentBreeder uses Artifact Registry repo agentbreeder and the default compute service account <project-number>-compute@developer.gserviceaccount.com. You still need the four IAM roles documented in One-time IAM grants below.

Full mode adds:

Field	Required	Description
`GCP_ARTIFACT_REGISTRY_REPO`	✓	Artifact Registry repository name
`GCP_CLOUD_RUN_SERVICE_ACCOUNT`	✓	Service account email for the Cloud Run service
`GCP_VPC_CONNECTOR`	—	Serverless VPC Access connector
`GCP_CLOUD_SQL_INSTANCE`	—	Cloud SQL instance connection name
`GCP_ALLOW_UNAUTHENTICATED`	—	`true`/`false`
`GCP_CUSTOM_DOMAIN`	—	Custom domain mapping

What validate-infra checks (read-only): resourcemanager:GetProject, artifactregistry:GetRepository, iam:GetServiceAccount.

Greenfield mode (Studio → "Provision for me"):

For fresh GCP projects, AgentBreeder can create infrastructure on your behalf in place of the "Full mode" inputs above. This ships from the CLI (agentbreeder deploy --target cloud-run --provision --local), the Studio deploy wizard, and the deployments job API. Only GOOGLE_CLOUD_PROJECT and GCP_REGION are required; everything else is opt-in via environment flags. The CLI --provision path turns on the VPC + connector flags below automatically so the agent and data tier share a private network.

Flag	Effect
default	Artifact Registry repo `agentbreeder` + per-agent Service Account `ab-<agent-name>` + 4 IAM roles
`GCP_PROVISION_VPC=1`	Greenfield-creates a custom-mode VPC `ab-<agent-name>-vpc` + regional subnet (private Google access) + Cloud NAT (egress) + a Private Service Access range (for Cloud SQL private IP). The connector and Cloud SQL then target this network instead of `default`. Set automatically by CLI `--provision`.
`GCP_PROVISION_VPC_CONNECTOR=1`	Adds a Serverless VPC Access connector `ab-<agent-name>` (e2-micro, 2–3 instances) on the provisioned VPC (or `default`). Wire it into Cloud Run by setting `GCP_VPC_CONNECTOR` to the connector name from `.agentbreeder/infra-state.json`.
`GCP_PROVISION_CLOUD_SQL=1`	Adds a private-IP Cloud SQL Postgres 15 instance `{agent_name}-memory` (default `db-f1-micro`), database `agentbreeder_memory`, user `agentbreeder`. Random password is written to Secret Manager — never to disk. Implies the VPC connector unless `GCP_CLOUD_SQL_PRIVATE_IP=0`.

VPC connector tuning (all optional):

Variable	Default	Description
`GCP_VPC_NAME`	`default`	VPC network to attach the connector to
`GCP_VPC_CONNECTOR_IP_CIDR`	`10.8.0.0/28`	`/28` range for the connector (must not overlap existing subnets)
`GCP_VPC_CONNECTOR_MIN_INSTANCES`	`2`	Minimum throughput instances
`GCP_VPC_CONNECTOR_MAX_INSTANCES`	`3`	Maximum throughput instances
`GCP_VPC_CONNECTOR_MACHINE_TYPE`	`e2-micro`	Per-instance machine type

Cloud SQL tuning (all optional, only relevant when GCP_PROVISION_CLOUD_SQL=1):

Variable	Default	Description
`GCP_CLOUD_SQL_TIER`	`db-f1-micro`	Instance tier (`db-g1-small` recommended for prod)
`GCP_CLOUD_SQL_DATABASE`	`agentbreeder_memory`	Default database name
`GCP_CLOUD_SQL_USER`	`agentbreeder`	Default user name
`GCP_CLOUD_SQL_PRIVATE_IP`	`1`	Set `0` to disable the implicit VPC connector (not recommended)
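The way the GCP provisioning flags combine can be summarized in a few lines. This is a sketch of the rules as described in the tables above, not the provisioner's actual code; the function name and the shape of the returned dict are hypothetical.

```python
def gcp_provision_plan(env: dict, cli_provision: bool = False) -> dict:
    """Return which GCP resources would be created for a given set of flags."""

    def on(name: str, default: str = "0") -> bool:
        return env.get(name, default) == "1"

    cloud_sql = on("GCP_PROVISION_CLOUD_SQL")
    # GCP_CLOUD_SQL_PRIVATE_IP defaults to 1; setting it to 0 disables the
    # implicit VPC connector.
    private_ip = on("GCP_CLOUD_SQL_PRIVATE_IP", default="1")
    return {
        # Always created: Artifact Registry repo + per-agent service account.
        "artifact_registry": True,
        "service_account": True,
        # CLI --provision turns the VPC and connector flags on automatically.
        "vpc": cli_provision or on("GCP_PROVISION_VPC"),
        "vpc_connector": (
            cli_provision
            or on("GCP_PROVISION_VPC_CONNECTOR")
            or (cloud_sql and private_ip)
        ),
        "cloud_sql": cloud_sql,
    }
```

For example, setting only `GCP_PROVISION_CLOUD_SQL=1` yields a plan with both `cloud_sql` and `vpc_connector` enabled, because the private-IP default implies the connector.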

Operator-readable outputs (in infra-state.json under resources.cloud_sql): connection_name (e.g. my-proj:us-central1:my-agent-memory) for Cloud Run wiring, and password_secret — the Secret Manager resource where the random DB password lives. Grant your Cloud Run service account roles/secretmanager.secretAccessor on that secret to read it at runtime.
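To pull those two outputs out of the state file in a script, something like the following sketch may be enough. It assumes only what is stated above (a `resources.cloud_sql` object holding `connection_name` and `password_secret`); the rest of the file's schema is not documented here, and the helper names are hypothetical.

```python
import json
from pathlib import Path


def extract_cloud_sql_outputs(state: dict) -> tuple:
    """Return (connection_name, password_secret) from a parsed state file."""
    cloud_sql = state.get("resources", {}).get("cloud_sql")
    if cloud_sql is None:
        raise KeyError("state file has no resources.cloud_sql entry")
    return cloud_sql["connection_name"], cloud_sql["password_secret"]


def load_cloud_sql_outputs(path: str = ".agentbreeder/infra-state.json") -> tuple:
    """Read the state file from disk and extract the Cloud SQL outputs."""
    state = json.loads(Path(path).read_text(encoding="utf-8"))
    return extract_cloud_sql_outputs(state)
```

Note that `password_secret` is a reference to the Secret Manager resource, not the password itself, so it is safe to log.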

State is written to .agentbreeder/infra-state.json; agentbreeder teardown --destroy-infra reverses everything in safe order (Cloud SQL → connector → service account → Artifact Registry repo).

Azure simple mode (mode=simple):

Field	Required	Description
`AZURE_SUBSCRIPTION_ID`	✓	Azure subscription ID
`AZURE_TENANT_ID`	✓	Microsoft Entra (Azure AD) tenant ID
`AZURE_CLIENT_ID` + `AZURE_CLIENT_SECRET`	✓	Service-principal credentials (or `az login`)
`AZURE_LOCATION`	—	Azure region (default `eastus`)

In simple mode AgentBreeder uses Resource Group agentbreeder-rg, auto-creates an ACR (agentbreeder<5-char-hash>), a Log Analytics workspace, and an ACA managed environment on first deploy (coming with #384).

Full mode adds:

Field	Required	Description
`AZURE_RESOURCE_GROUP`	✓	Resource Group name
`AZURE_ACR_LOGIN_SERVER`	✓	ACR login server (e.g. `myacr.azurecr.io`)
`AZURE_ACA_ENVIRONMENT`	✓	Container Apps managed environment name
`AZURE_LOG_ANALYTICS_WORKSPACE_ID`	✓	Log Analytics workspace ID (GUID)
`AZURE_MANAGED_IDENTITY_ID`	✓	User-assigned managed identity ID
`AZURE_KEY_VAULT_NAME`	—	Key Vault name for secrets
`AZURE_POSTGRES_FQDN`	—	Azure Database for PostgreSQL FQDN
`AZURE_VNET_SUBNET_ID`	—	VNet subnet ID for delegation

What validate-infra checks (read-only): SubscriptionClient.subscriptions.get, ResourceGroupsClient.get, ContainerRegistryManagementClient.registries.get, ContainerAppsAPIClient.managed_environments.get.

Greenfield mode (Studio → "Provision for me"):

For fresh Azure subscriptions, AgentBreeder creates the minimum-viable Container Apps footprint end-to-end. Like AWS and GCP, this ships from the CLI (agentbreeder deploy --target container-apps --provision --local), the Studio deploy wizard, and the deployments job API. Only AZURE_SUBSCRIPTION_ID and a credential chain (service-principal env vars or az login) are required.

Resource	Notes
Resource Group `agentbreeder-{agent}-rg`	All other resources nest inside
Log Analytics workspace	Required prerequisite for Container Apps
Container Apps Environment	Internal-only unless `access.visibility: public`
Azure Container Registry `agentbreeder{suffix}`	Basic SKU, `admin_user_enabled=False`
Per-agent user-assigned Managed Identity	Holds `AcrPull` on the specific registry resource — never the subscription
VNet + delegated subnet (only when `memory:` declared)	For private PostgreSQL access
PostgreSQL Flexible Server `agentbreeder-{agent}-db` (only when `memory:` declared)	B1ms, `public_network_access=Disabled`, private DNS
Key Vault `agentbreeder-{agent}-kv` (only when DB is created)	Stores random DB password; state file references the secret URI only

Every resource is tagged AgentBreeder=true, plus AgentName and Version. State is written to .agentbreeder/infra-state.json; agentbreeder teardown --destroy-infra refuses untagged resources and deletes the resource group last so anything not explicitly tracked is still cleaned up.

Calling `validate-infra` from your terminal

TOKEN=$AGENTBREEDER_API_TOKEN

curl -X POST http://localhost:8000/api/v1/deployments/validate-infra \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "team_id": "engineering",
    "cloud": "aws",
    "region": "us-east-1",
    "mode": "simple",
    "fields": {
      "AWS_ACCOUNT_ID": "123456789012",
      "AWS_ACCESS_KEY_ID": "AKIA...",
      "AWS_SECRET_ACCESS_KEY": "..."
    }
  }'
# {"data":{"valid":true,"cloud":"aws","region":"us-east-1","checks":[...]},"errors":[]}

If a referenced resource is missing or you lack permission, the response carries a populated errors array and the individual check status (missing / forbidden / error) for each resource.
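A script that consumes the response might reduce it to a pass/fail plus a list of problems. The sketch below relies on the envelope shown in the example output (`data.valid`, `data.checks`, `errors`). The per-check keys `name` and `status` are assumptions, since the example elides the contents of `checks`.

```python
FAILING_STATUSES = {"missing", "forbidden", "error"}


def summarize_validation(response: dict) -> tuple:
    """Return (ok, problems) for a validate-infra response body."""
    data = response.get("data") or {}
    problems = [
        # Assumption: each check carries "name" and "status" keys.
        f"{check.get('name', '<unnamed>')}: {check.get('status')}"
        for check in data.get("checks", [])
        if check.get("status") in FAILING_STATUSES
    ]
    problems += [str(err) for err in response.get("errors", [])]
    ok = bool(data.get("valid")) and not problems
    return ok, problems
```

Treating a non-empty `errors` array as a failure even when `valid` is true is a deliberately conservative choice for a pre-flight gate.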

What the deploy script does

microlearning-ebook-agent/scripts/deploy_gcp.sh automates five stages:

Stage	Command	Time
1. Enable APIs	`gcloud services enable run.googleapis.com artifactregistry.googleapis.com cloudbuild.googleapis.com secretmanager.googleapis.com`	~5 sec (first run only)
2. Ensure Artifact Registry repo	`gcloud artifacts repositories create` (idempotent)	<1 sec
3. Push secrets to Secret Manager	`gcloud secrets create / versions add`	<1 sec
4. Build + push image	`gcloud builds submit . --config <generated cloudbuild.yaml>`	~3 min (first build)
5. Deploy to Cloud Run	`gcloud run deploy --update-secrets ...`	~1 min

Total: ~5 minutes from scratch.

Prerequisites

# 1. Authenticate
gcloud auth login
gcloud auth application-default login

# 2. Set the target project
gcloud config set project <PROJECT_ID>

# 3. .env in your agent project with the runtime secrets
cat > microlearning-ebook-agent/.env <<EOF
GOOGLE_API_KEY=AIza...
TAVILY_API_KEY=tvly-...
AGENT_AUTH_TOKEN=$(openssl rand -hex 16)
EOF

One-time IAM grants

Cloud Build and Cloud Run both default to the same service account (<project-number>-compute@developer.gserviceaccount.com). You need four roles on it the first time you deploy from a project:

PROJECT=$(gcloud config get-value project)
SA="$(gcloud projects describe $PROJECT --format='value(projectNumber)')-compute@developer.gserviceaccount.com"

for role in \
  roles/storage.objectViewer \
  roles/cloudbuild.builds.builder \
  roles/logging.logWriter \
  roles/secretmanager.secretAccessor
do
  gcloud projects add-iam-policy-binding "$PROJECT" \
    --member="serviceAccount:$SA" --role="$role" --condition=None
done

Role	Why
`storage.objectViewer`	Cloud Build reads the source tarball it just uploaded
`cloudbuild.builds.builder`	Run builds + write logs
`logging.logWriter`	Write build logs to Cloud Logging
`secretmanager.secretAccessor`	Cloud Run mounts secrets as env vars

Skip these and you'll see errors that are hard to map

Missing storage.objectViewer shows up as 403: ... does not have storage.objects.get access during gcloud builds submit. Missing secretmanager.secretAccessor shows up during gcloud run deploy as Permission denied on secret: .... Both are one-time fixes — grant once per project.

CLI: `--remote` vs `--local`

agentbreeder deploy runs in one of two modes:

Remote (production) — POSTs /api/v1/deploys against $AGENTBREEDER_URL. The bearer token comes from agentbreeder login (OS keychain) or $AGENTBREEDER_API_TOKEN. Team-scoped RBAC, audit logging, and team-scoped cloud credentials are all enforced.
Local (dev / offline) — runs the deploy engine in-process. Bypasses every server-side gate; intended for laptop development and the local Docker Compose target.

# Production — talks to the API
export AGENTBREEDER_URL=https://api.example.com
agentbreeder login
agentbreeder deploy ./agent.yaml --target cloud-run    # remote mode (auto)

# Force remote even without the env var
agentbreeder deploy ./agent.yaml --target cloud-run --remote

# Force local even when AGENTBREEDER_URL is set
agentbreeder deploy ./agent.yaml --target local --local

Mode resolution: --local wins over --remote wins over the env var. Without flags or env, the CLI defaults to local mode.

--remote is what makes the team-scope RBAC gate fire — without it, a developer with shell access + cloud creds can run the deploy engine in-process and skip the gate entirely. Production environments should set $AGENTBREEDER_URL so that agentbreeder deploy defaults to the gated path.
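The resolution order described above is small enough to write down directly. This is a sketch of the stated precedence, not the CLI's source; the function name is hypothetical.

```python
def resolve_deploy_mode(local_flag: bool, remote_flag: bool, env: dict) -> str:
    """Return "local" or "remote" using the documented precedence."""
    if local_flag:  # --local always wins
        return "local"
    if remote_flag:  # --remote beats the environment
        return "remote"
    if env.get("AGENTBREEDER_URL"):  # env var selects remote mode
        return "remote"
    return "local"  # no flags and no env: default to local
```

The last branch is the reason production environments should export `AGENTBREEDER_URL`: without it, a bare `agentbreeder deploy` falls through to the ungated local path.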

Run the deploy

cd microlearning-ebook-agent
bash scripts/deploy_gcp.sh

The first run will print the public service URL when it's done:

════════════════════════════════════════════════════════════════
  Deployed: https://microlearning-ebook-agent-sizukgalta-uc.a.run.app
════════════════════════════════════════════════════════════════

What gets deployed

The container is built from microlearning-ebook-agent/Dockerfile.cloudrun, which bundles:

The AgentBreeder engine (engine/, registry/, api/) — so engine.prompt_resolver and engine.tool_resolver can read the agent's registered prompts + tools at startup.
The agent project (microlearning-ebook-agent/) including its agent.py, agent.yaml, local tools/ and prompts/ directories.
The runtime wrapper (engine/runtimes/templates/google_adk_server.py) which serves /health, /invoke, /stream with bearer-token auth.

# Excerpt from Dockerfile.cloudrun
COPY engine /app/engine
COPY registry /app/registry
COPY api /app/api
COPY microlearning-ebook-agent /app/agent

ENV PYTHONPATH="/app:/app/agent"
WORKDIR /app/agent
CMD exec uvicorn engine.runtimes.templates.google_adk_server:app \
    --host 0.0.0.0 --port "${PORT:-8080}"

The .gcloudignore at the repo root keeps the build context lean — without it the source tarball is 578 MB (from venvs + worktrees + node_modules); with it the upload is 22 MB.

Cloud Run defaults

The deploy script picks production-friendly defaults you can override via env:

Setting	Value	Override
Region	`us-central1`	`REGION=us-east1 bash scripts/deploy_gcp.sh`
Memory	2 GiB	edit script
CPU	2 vCPU	edit script
Min instances	0 (scale-to-zero)	—
Max instances	5	edit script
Concurrency	10	edit script
Request timeout	300 s	edit script
Public access	`--allow-unauthenticated`	auth happens at the bearer-token layer

Verify the deploy

The deploy is bearer-gated at the agent layer. Use the same AGENT_AUTH_TOKEN you put in .env (now stored in Secret Manager):

URL=$(gcloud run services describe microlearning-ebook-agent \
  --region us-central1 --format='value(status.url)')
TOKEN=$(grep '^AGENT_AUTH_TOKEN=' microlearning-ebook-agent/.env | cut -d= -f2)

# /health is open by design (Cloud Run liveness probe)
curl $URL/health
# {"status":"healthy","agent_name":"microlearning-ebook-agent","version":"0.1.0"}

# /invoke is gated
curl -X POST $URL/invoke \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"input":"What is your job?"}'
# {"output":"My job is to turn any topic the user supplies into a polished
#           microlearning ebook a learner can complete in 20-40 minutes.",
#  "session_id":"...","metadata":null}

# Without the token → 401
curl -X POST $URL/invoke -d '{"input":"x"}' -H "Content-Type: application/json"
# {"detail":"Missing bearer token"}
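The same checks can be made from Python using only the standard library. The sketch below mirrors the curl calls above; the helper names are hypothetical, and the 300-second timeout is borrowed from the Cloud Run request timeout listed earlier.

```python
import json
import urllib.request


def build_invoke_request(base_url: str, token: str, text: str) -> urllib.request.Request:
    """Build the POST /invoke request with the bearer token attached."""
    body = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/invoke",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def invoke(base_url: str, token: str, text: str, timeout: float = 300.0) -> dict:
    """Send the request and return the parsed JSON body."""
    request = build_invoke_request(base_url, token, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A missing or wrong token surfaces as `urllib.error.HTTPError` with code 401 rather than a return value, so wrap `invoke` in a `try` block if you want to handle that case.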

Production checklist

Concern	What to do
Don't use the default compute SA	Create a dedicated agent SA with the 4 roles, pass `--service-account` to `gcloud run deploy`.
Rotate `AGENT_AUTH_TOKEN`	`gcloud secrets versions add AGENT_AUTH_TOKEN --data-file=-` then redeploy (Cloud Run picks up the new version on the next revision).
Lock down `--allow-unauthenticated`	Replace with `--no-allow-unauthenticated` and put a Cloud Load Balancer + IAP in front, or call from a peer service via the metadata-server identity token.
Persistent sessions	Switch the agent's `google_adk.session_backend` from `memory` to `database` and point at Cloud SQL. The runtime wrapper already supports it.
Outputs to GCS	Set `EBOOK_OUTPUT_DIR=/tmp` in env and have a tool upload to GCS. The container filesystem is an in-memory, per-instance scratch space: files are lost when the instance shuts down and are not shared between instances.
Cold-start latency	Bump `--min-instances 1`. Cost trade-off: ~$5–10/month per always-on instance vs. ~3 s warm-up on first request.

AWS ECS Fargate — end-to-end (verified)

The GCP walkthrough above is mirrored here for AWS ECS Fargate, the full-governance AWS target (sidecar + cost/tracing). This path is verified against live AWS: a feature-rich agent — NVIDIA Llama-3.1-8B, a pgvector knowledge base, Redis-backed memory, and a web-search tool — deploys, serves, answers a multi-turn chat, and tears down with zero orphaned resources.

This is the existing-account (BYO) path

This walkthrough uses the CLI deploying into an existing VPC, subnets, cluster, and execution role (named in deploy.env_vars below); the data tier (RDS + ElastiCache) is the only thing auto-provisioned. If you're starting from a fresh AWS account with no networking, use greenfield provisioning instead (the Studio wizard or CLI --provision), which creates the VPC, NAT, cluster, and IAM for you.

What makes ECS Fargate special

Declare a knowledge base or memory: without a backend_url and the deployer auto-provisions the managed data stores (RDS pgvector, ElastiCache Redis) into your existing VPC at deploy time, wires the connection env in, and removes them again on teardown. You write the agent; the data tier is a side effect of deploying.

The agent

# agent.yaml
name: nvidia-support-agent
version: 0.1.0
framework: langgraph

model:
  primary: nvidia/meta/llama-3.1-8b-instruct   # called directly from AWS
  temperature: 0.3
  max_tokens: 1024

prompts:
  system: prompts/support-system               # registry-resolved at build

knowledge_bases:
  - ref: kb/validation-docs                     # → auto-provisions RDS pgvector
memory:
  backend: redis                                # → auto-provisions ElastiCache
  stores: [session-buffer]
tools:
  - ref: tools/web_search                       # first-party tool

deploy:
  cloud: aws
  runtime: ecs-fargate
  region: us-east-1
  scaling: { min: 1, max: 1, target_cpu: 70 }
  resources:
    cpu: "512"        # Fargate task CPU units — see note below
    memory: "1024"    # Fargate task memory, MiB
  env_vars:
    # BYO network the agent + its data stores join (existing infra):
    AWS_ACCOUNT_ID: "<account-id>"
    AWS_REGION: us-east-1
    AWS_VPC_SUBNETS: "subnet-aaaa,subnet-bbbb"
    AWS_SECURITY_GROUPS: "sg-agent"
    AWS_ECS_CLUSTER: agentbreeder-validation
    AWS_EXECUTION_ROLE_ARN: "arn:aws:iam::<account-id>:role/<exec-role>"
    AWS_TASK_ROLE_ARN: "arn:aws:iam::<account-id>:role/<task-role>"
    NVIDIA_API_KEY: "<nvapi-...>"               # injected at deploy
    LOG_LEVEL: info

CPU/memory notation is normalized for you

resources.cpu / resources.memory accept either vCPU/Gi notation (cpu: "1", memory: "2Gi" — what the rest of the docs use) or raw Fargate task-size units (cpu: "512", memory: "1024" MiB). The deployer converts vCPU→CPU units (1 vCPU = 1024) and Gi→MiB before registering the task definition. The resulting pair must still be a valid Fargate combination (e.g. cpu: "0.5" + memory: "1Gi").
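A minimal sketch of that normalization follows. It is not the deployer's code: the rule for telling vCPU from CPU units (values of 16 or less are vCPU) is an assumption made for this example, as is treating a bare memory number as MiB. The table of valid pairs follows AWS's published Fargate task sizes.

```python
# Valid Fargate task sizes: CPU units -> allowed memory values in MiB.
FARGATE_MEMORY_MIB = {
    256: (512, 1024, 2048),
    512: tuple(range(1024, 4096 + 1, 1024)),
    1024: tuple(range(2048, 8192 + 1, 1024)),
    2048: tuple(range(4096, 16384 + 1, 1024)),
    4096: tuple(range(8192, 30720 + 1, 1024)),
    8192: tuple(range(16384, 61440 + 1, 4096)),
    16384: tuple(range(32768, 122880 + 1, 8192)),
}


def normalize_cpu(value: str) -> int:
    """Convert "0.5" / "1" (vCPU) or "512" / "1024" (units) to CPU units."""
    number = float(value)
    # Assumption: Fargate tops out at 16 vCPU, so anything <= 16 is a vCPU count.
    return int(number * 1024) if number <= 16 else int(number)


def normalize_memory(value: str) -> int:
    """Convert "2Gi", "512Mi", or a bare MiB number to MiB."""
    text = value.strip()
    if text.endswith("Gi"):
        return int(float(text[:-2]) * 1024)
    if text.endswith("Mi"):
        return int(float(text[:-2]))
    return int(text)


def normalize_resources(cpu: str, memory: str) -> tuple:
    """Return (cpu_units, memory_mib) or raise if the pair is not valid."""
    cpu_units, memory_mib = normalize_cpu(cpu), normalize_memory(memory)
    allowed = FARGATE_MEMORY_MIB.get(cpu_units)
    if allowed is None or memory_mib not in allowed:
        raise ValueError(
            f"cpu={cpu_units} units with memory={memory_mib} MiB "
            "is not a valid Fargate combination"
        )
    return cpu_units, memory_mib
```

Under these assumptions both `normalize_resources("0.5", "1Gi")` and `normalize_resources("512", "1024")` give `(512, 1024)`, while `normalize_resources("1", "1Gi")` raises because 1 vCPU requires at least 2 GiB.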

Prerequisites

# 1. AWS credentials (default chain — env, ~/.aws/credentials, or instance profile)
aws sts get-caller-identity

# 2. A running local Docker daemon — the deployer builds + pushes the image
docker info >/dev/null    # must succeed

# 3. NVIDIA_API_KEY (or your model provider's key) available to the deploy

Plus the BYO AWS infra referenced in env_vars:

Resource	Requirement
VPC + subnets	≥2 subnets across AZs; public IP is assigned to the task ENI so the agent is reachable.
Agent security group	Allows the agent's port (8080) from your caller / load balancer.
ECS cluster	An existing cluster named in `AWS_ECS_CLUSTER`.
Execution role	Has `AmazonECSTaskExecutionRolePolicy` (ECR pull + CloudWatch Logs `CreateLogStream`/`PutLogEvents`). No extra `logs:CreateLogGroup` permission is needed.
CloudWatch log group	The deployer pre-creates `/agentbreeder/<agent>` with your operator credentials, so the stock execution role above is sufficient — the task itself never calls `CreateLogGroup`.

Run the deploy

cd nvidia-support-agent
agentbreeder deploy agent.yaml --target ecs-fargate

The deploy runs the standard 8-stage pipeline and prints the live endpoint:

╭────────────────────────────── Deployed ──────────────────────────────╮
│ Deploy successful!                                                    │
│   Agent:    nvidia-support-agent                                      │
│   Version:  0.1.0                                                     │
│   Endpoint: http://13.220.147.220:8080                               │
╰───────────────────────────────────────────────────────────────────────╯

The endpoint is the public IP of the running task's ENI (ECS Fargate has no stable DNS without an ALB, so the deployer reads the ENI's public IP and returns http://<public-ip>:8080). For a stable hostname + TLS, front the service with an Application Load Balancer.

Stage	Asset created	Notes
Build container	Local Docker image	Bundles `engine/` + the agent so registry-ref prompts/tools/RAG resolve at runtime.
Provision infra	ECR repository	`agentbreeder-<agent>`; image pushed here.
Provision infra	RDS pgvector (`db.t3.micro`)	Auto-provisioned for the knowledge base; private, encrypted; `KB_PGVECTOR_DSN` injected. DB password lives only in Secrets Manager.
Provision infra	ElastiCache Redis (`cache.t3.micro`)	Auto-provisioned for `memory.backend: redis`; `REDIS_URL` injected.
Provision infra	Dedicated security groups	One each for Postgres (5432) and Redis (6379), reachable from the agent SG only — never the internet.
Deploy	ECS task definition	Fargate; awslogs → `/agentbreeder/<agent>`; secrets + env wired in.
Deploy	ECS service	`assignPublicIp=ENABLED`; waits for the task to reach steady state.
Return endpoint	`http://<public-ip>:8080`	Resolved from the task ENI.

Each data store is tagged AgentBreeder=true so teardown can remove exactly what the deploy created and never touch your VPC, cluster, or roles.

Verify + chat

URL=http://13.220.147.220:8080

# /health is open (ECS health probe)
curl $URL/health
# {"status":"healthy","agent_name":"nvidia-support-agent","version":"0.1.0"}

# Chat — input is a message object; the response carries the model + tokens
curl -X POST $URL/invoke -H 'Content-Type: application/json' \
  -d '{"input":{"message":"Hi! What can you help me with?"}}'
# {"output":{"messages":[{"content":"What can I assist you with today?",
#   "response_metadata":{"model_name":"meta/llama-3.1-8b-instruct", ...}}]},
#  "thread_id":"6662..."}

# Multi-turn — pass the returned thread_id to use the Redis-backed memory
curl -X POST $URL/invoke -H 'Content-Type: application/json' \
  -d '{"input":{"message":"and what was my last question?"},"thread_id":"6662..."}'

Teardown (no orphans, no leaks)

Teardown is two complementary moves — compute/registry, then the auto-provisioned data tier:

# 1. Remove the ECS service, task definitions, and ECR repository
agentbreeder teardown --cloud aws --region us-east-1 --agent nvidia-support-agent

# 2. Remove the auto-provisioned data backends recorded in .agentbreeder/infra-state.json
#    (RDS + ElastiCache + their dedicated SGs, subnet groups, and the DB-password
#    secret) — ephemeral stores are destroyed fully, with no final snapshot to bill.
agentbreeder teardown nvidia-support-agent

Both are tag-gated: teardown refuses to delete anything not carrying AgentBreeder=true, so a drifted state file can never widen the blast radius to your own VPC or cluster.
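The tag gate amounts to a filter applied before any delete call. The sketch below shows the idea on plain dictionaries; the record shape (`id` plus a `tags` mapping) is hypothetical and chosen only for illustration.

```python
def split_by_tag(resources: list) -> tuple:
    """Partition resources into (deletable, refused) by the AgentBreeder tag."""
    deletable, refused = [], []
    for resource in resources:
        # Only resources explicitly tagged AgentBreeder=true may be deleted.
        if resource.get("tags", {}).get("AgentBreeder") == "true":
            deletable.append(resource)
        else:
            refused.append(resource)
    return deletable, refused
```

Because the check runs against the live resource's tags rather than the state file alone, an entry that drifted into the state file by mistake lands in the refused list instead of being deleted.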

Troubleshooting

Symptom	Cause + fix
`Connection refused` fetching Docker API version	Local Docker daemon isn't running. Start Docker Desktop (`open -a Docker`) and retry. On multi-user Macs the deployer prefers your own `~/.docker/run/docker.sock` over a root-owned `/var/run/docker.sock` symlink.
`Credentials store docker-credential-gcloud exited`	A registry cred-helper in `~/.docker/config.json` is interfering. The deployer authenticates to ECR explicitly; run with an isolated `DOCKER_CONFIG` pointing at a dir whose `config.json` is `{"auths":{}}`.
`Invalid 'cpu' setting for task`	The normalized `cpu`/`memory` pair isn't a valid Fargate combination. Both `cpu: "1"` (vCPU) and `cpu: "1024"` (units) are accepted — pick a memory value the chosen CPU size supports.
Task stuck `PENDING` → `STOPPED`, `ResourceInitializationError … logs:CreateLogGroup … AccessDenied`	Pre-2.x deployers relied on the task creating its own log group. Current deployers pre-create `/agentbreeder/<agent>` for you — upgrade `agentbreeder`, or as a stopgap pre-create the group manually.
`ServiceNotActiveException: Service was not ACTIVE`	A prior failed deploy left the service `DRAINING`/`INACTIVE`. Delete it (`aws ecs delete-service --force`) and redeploy.

Other deploy targets

The same agent.yaml deploys to other clouds with one-line changes:

# Cloud Run (default for GCP)
deploy:
  cloud: gcp
  runtime: cloud-run

# AWS ECS Fargate
deploy:
  cloud: aws
  runtime: ecs-fargate
  region: us-east-1

# AWS App Runner (zero-config alternative)
deploy:
  cloud: aws
  runtime: app-runner

# Azure Container Apps
deploy:
  cloud: azure
  runtime: container-apps

# Kubernetes (any conformant cluster)
deploy:
  cloud: kubernetes
  runtime: deployment

# Claude Managed Agents (Anthropic-hosted)
deploy:
  cloud: claude-managed

agentbreeder deploy reads the cloud + runtime fields and dispatches to the matching deployer in engine/deployers/. Each deployer has the same contract (build → push → provision → register), so the rest of the lifecycle is identical across targets.
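Conceptually, the dispatch is a lookup keyed on the two fields. The sketch below uses the `cloud` / `runtime` pairs from the YAML snippets above; the deployer labels on the right-hand side are hypothetical and do not claim to match the module names in engine/deployers/.

```python
# (cloud, runtime) pairs from the snippets above. The values are
# hypothetical labels, not real module names.
DEPLOYER_KEYS = {
    ("gcp", "cloud-run"): "cloud_run",
    ("aws", "ecs-fargate"): "ecs_fargate",
    ("aws", "app-runner"): "app_runner",
    ("azure", "container-apps"): "container_apps",
    ("kubernetes", "deployment"): "kubernetes",
    ("claude-managed", None): "claude_managed",
}


def pick_deployer(deploy: dict) -> str:
    """Map the deploy: block of agent.yaml to a deployer label."""
    key = (deploy.get("cloud"), deploy.get("runtime"))
    try:
        return DEPLOYER_KEYS[key]
    except KeyError:
        raise ValueError(
            f"no deployer for cloud={key[0]!r}, runtime={key[1]!r}"
        ) from None
```

Failing loudly on an unknown pair keeps a typo such as `runtime: ecs_fargate` from silently falling back to some other target.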

See Architecture → Deploy Pipeline for the full flow.

Prerequisites per target (as of 2026-05-18)

Status reminder (v2.6.0): Local, AWS ECS Fargate, GCP Cloud Run, and Azure Container Apps deploy with full governance parity (sidecar + secret mirroring). Greenfield provisioning (creating the VPC/cluster/IAM from scratch) ships through the Studio deploy wizard today; the CLI --provision flag is on the roadmap, so CLI deploys currently require existing infra (plus auto-provisioned data backends). AWS App Runner is single-container — it deploys without a sidecar and rejects guardrails: / secrets: at validation. Kubernetes and Claude-managed are in flight (Kubernetes needs an existing cluster). This page is the canonical home of the per-target prerequisites; for the at-a-glance feature matrix see the deploy-target status table.

AWS ECS Fargate (`--target ecs-fargate`)

Full governance (sidecar + cost/tracing). For a fresh account, greenfield provisioning creates the ECS footprint from scratch via the Studio wizard (see the Greenfield mode AWS tab above; CLI --provision is on the roadmap). To deploy into existing infra from the CLI:

AWS account ID and access credentials (Secrets Manager–backed AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY or an instance profile).
VPC with at least two subnets across different AZs (private subnets recommended).
Security group permitting the agent's listening port from your front-end / load balancer.
ECS task execution IAM role with AmazonECSTaskExecutionRolePolicy.
(Optional) ECR repository for image storage; AgentBreeder will use a public registry if none is configured.
Region matches your deploy.region in agent.yaml.
Secrets: AWS auto-mirror is on the roadmap — pre-create entries in AWS Secrets Manager and reference them by name in deploy.secrets:.
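
Two of the prerequisites above (subnets spread over at least two AZs, and a region that matches deploy.region) are easy to check before a deploy. The sketch below is a hypothetical local pre-flight, not the validate-infra endpoint: `preflight_ecs` and the subnet dictionaries (`id`, `az`) are illustrative, and it assumes the subnet list was already fetched by whatever means you prefer.

```python
from typing import Dict, List


def preflight_ecs(deploy_region: str, subnets: List[Dict[str, str]]) -> List[str]:
    """Return a list of problems with the subnet inputs (empty means OK)."""
    problems = []
    if len(subnets) < 2:
        problems.append("need at least two subnets")
    # Distinct availability zones covered by the supplied subnets.
    zones = {subnet["az"] for subnet in subnets}
    if len(zones) < 2:
        problems.append("subnets must span at least two availability zones")
    # AZ names begin with their region name, e.g. us-east-1a is in us-east-1.
    stray = [s["id"] for s in subnets if not s["az"].startswith(deploy_region)]
    if stray:
        problems.append(f"subnets outside {deploy_region}: {', '.join(stray)}")
    return problems
```

An empty result means the subnet inputs satisfy these two prerequisites; it says nothing about the security group or the execution role, which still need their own checks.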

AWS App Runner (`--target app-runner`)

Single-container target. App Runner deploys without a sidecar, so it rejects guardrails: and secrets: at validate-infra. Use it for simple, zero-governance services; pick ECS Fargate when you need the sidecar.

AWS account + credentials (same as ECS Fargate above).
App Runner-permitted IAM role (AWSAppRunnerServicePolicyForECRAccess if pulling from ECR).
Region selected in agent.yaml must support App Runner.
No VPC required if using public access; for VPC connector mode, an existing VPC connector resource.
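
The single-container restriction can be illustrated with a small check that mirrors the rule stated above. This is a sketch under assumptions: `validate_app_runner` is a hypothetical function, and because this page does not pin down where guardrails: and secrets: sit in agent.yaml, the check looks both at the top level and inside the deploy: block.

```python
from typing import List

# Keys that need the governance sidecar, per the App Runner note above.
SIDECAR_ONLY_KEYS = ("guardrails", "secrets")


def validate_app_runner(config: dict) -> List[str]:
    """Return validation errors for an App Runner deploy (empty means OK)."""
    deploy = config.get("deploy", {})
    if deploy.get("runtime") != "app-runner":
        return []  # The restriction only applies to App Runner.
    errors = []
    for key in SIDECAR_ONLY_KEYS:
        # Assumption: the key may appear at the top level or under deploy:.
        if config.get(key) or deploy.get(key):
            errors.append(
                f"'{key}:' is not supported on app-runner; use ecs-fargate"
            )
    return errors
```

Running a check like this before submitting keeps the failure close to the config that caused it instead of surfacing later in the deploy.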

Azure Container Apps (`--target container-apps`)

Full governance (sidecar + cost/tracing) and secret auto-mirror to Azure Key Vault. For a fresh subscription, greenfield provisioning creates the Container Apps footprint from scratch via the Studio wizard (see the Greenfield mode Azure tab above; CLI --provision is on the roadmap). To deploy into existing infra from the CLI:

Azure subscription ID and service-principal credentials (AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_TENANT_ID).
Resource group, pre-created — AgentBreeder will not create it for you.
Container Apps Environment within that resource group.
Container Apps-permitted role assignment for the service principal.
(Optional) managed identity if the agent needs cross-resource access.

Kubernetes (`--target kubernetes`, EKS/GKE/AKS/self-hosted) — in flight

Existing cluster (no provisioning) with kubectl access from your dev or CI environment.
~/.kube/config or KUBECONFIG pointing to the cluster.
Namespace pre-created (default: agentbreeder).
RBAC: a service account with permission to create Deployments, Services, ConfigMaps, and Secrets in that namespace.
(Optional) an ingress controller and DNS records if you want external access.
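
The RBAC prerequisite above could be satisfied with a namespaced Role along these lines. The sketch builds the manifest as a Python dictionary and prints it as JSON, which kubectl accepts. The role name `agentbreeder-deployer` and the verbs beyond `create` are assumptions, since this page only states that the service account must be able to create the four resource kinds.

```python
import json


def build_role(namespace: str = "agentbreeder",
               name: str = "agentbreeder-deployer") -> dict:
    """Build a Role granting access to the resources the deployer manages."""
    # Assumption: updates and cleanup need more than "create".
    verbs = ["create", "get", "list", "update", "patch", "delete"]
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            # Deployments live in the "apps" API group.
            {"apiGroups": ["apps"], "resources": ["deployments"], "verbs": verbs},
            # Services, ConfigMaps, and Secrets live in the core ("") group.
            {
                "apiGroups": [""],
                "resources": ["services", "configmaps", "secrets"],
                "verbs": verbs,
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_role(), indent=2))
```

The Role alone grants nothing until a RoleBinding in the same namespace ties it to the service account; that binding is left out here for brevity.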

Claude Managed Agents (`--target claude-managed`) — in flight

Anthropic API key with managed-agents access enabled on your account.
No infrastructure to provision — Anthropic hosts the runtime.
Region and network constraints are dictated by Anthropic, not AgentBreeder.

Deploying from Studio

Studio's deploy wizard (Studio → Deploys → New deploy, route /deploy-wizard) is the graphical home of greenfield provisioning — and it handles existing accounts too. It is a 5-step flow:

Agent — pick a registered agent. The wizard auto-detects whether the agent requires approval (access.require_approval: true).
Target — choose cloud (AWS · ECS Fargate, GCP · Cloud Run, Azure · Container Apps) and region. A per-region cost estimate is shown inline (static client-side table, ±10%).
Infra — this is where you choose your starting point:
- Bring your own infrastructure — enter your existing account fields (mode=simple or full from the contract above) and click Validate infrastructure. The wizard calls POST /api/v1/deployments/validate-infra (read-only) and shows a per-resource ✅/❌ checklist. You can't advance until validation passes.
- Provision for me (greenfield, BETA) — a ResourcePreviewTree lists every resource AgentBreeder will create with a per-line and total monthly cost estimate. You must tick "I understand this creates cloud resources billable to my account" before advancing.
Config — env vars, secret references, scaling (min / max / target CPU %), and DB tier (shown only when the agent declares memory:).
Live deploy — submits POST /api/v1/deployments (with infra_mode set to byo or provision) and opens an SSE stream (GET /api/v1/deployments/{job_id}/stream) with six phase indicators: provisioning → building → pushing → deploying → health_checking → registering. On a greenfield run the provisioning phase streams each resource (VPC → subnets → NAT → cluster → IAM → RDS/ALB) as it is created.
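
To follow the live-deploy stream outside the wizard, a client needs to parse server-sent events and track the six phases. The sketch below covers only the parsing and ordering logic. The payload shape (a JSON object with a `phase` field) is an assumption, as this page does not document the event body, and `iter_sse_data` and `track_phases` are hypothetical helpers.

```python
import json
from typing import Iterable, Iterator, List

PHASES = ("provisioning", "building", "pushing",
          "deploying", "health_checking", "registering")


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event; a blank line ends an event."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips one leading space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and buffer:
            yield "\n".join(buffer)
            buffer = []
    # An event without its closing blank line is incomplete and is dropped.


def track_phases(lines: Iterable[str]) -> List[str]:
    """Return the phases seen, in order, rejecting any that go backwards."""
    seen: List[str] = []
    for payload in iter_sse_data(lines):
        phase = json.loads(payload).get("phase")  # Assumed payload field.
        if phase not in PHASES:
            continue
        if seen and PHASES.index(phase) < PHASES.index(seen[-1]):
            raise ValueError(f"phase {phase!r} arrived after {seen[-1]!r}")
        if not seen or phase != seen[-1]:
            seen.append(phase)
    return seen
```

In practice the `lines` argument would come from a streaming HTTP response on GET /api/v1/deployments/{job_id}/stream; any iterable of text lines works, which keeps the logic easy to test.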

Provision-for-me is BETA

The greenfield path reliably provisions the infrastructure (VPC, subnets, NAT, cluster, IAM, and — when declared — RDS/ALB) and records it for rollback. The hand-off that then builds and serves the agent into that fresh footprint is still being finalized. For a guaranteed end-to-end deploy today, provision with the wizard (or pre-create infra yourself) and run the verified BYO CLI walkthrough against it.

What greenfield provisioning creates is exactly the per-cloud footprint in the Greenfield mode tables above — all tagged AgentBreeder=true so teardown can safely target only what it made.

If a greenfield deploy fails mid-provision, the wizard exposes a Roll back action that calls POST /api/v1/deployments/{job_id}/destroy-partial, which runs the provisioner's destroy() against whatever was recorded in the job's InfraState — no orphaned VPC, NAT, or RDS left billing.

Drafts auto-save to localStorage every 250ms (refresh-safe). Agents with access.require_approval: true route through the /approvals queue; the wizard polls until an admin approves, then switches to the SSE stream.

Tearing down a Studio greenfield deploy

Greenfield resources are recorded in the job's InfraState. Remove them with the wizard's Roll back action, or from the terminal with agentbreeder teardown <agent> --cloud aws --region <region> — both refuse to touch any resource missing the AgentBreeder=true tag.
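
The tag guard described above amounts to filtering the recorded resources before anything is deleted. This is an illustrative sketch, not the provisioner's destroy(): `plan_teardown` and the resource dictionaries (`id`, `tags`) are hypothetical, and deleting in reverse creation order is an assumption made so that dependents go before the resources they depend on.

```python
from typing import Dict, List, Tuple

OWNERSHIP_TAG = ("AgentBreeder", "true")


def plan_teardown(resources: List[Dict]) -> Tuple[List[str], List[str]]:
    """Split recorded resources into (to_delete, refused) by ownership tag.

    `resources` is assumed to be listed in creation order.
    """
    key, expected = OWNERSHIP_TAG
    to_delete: List[str] = []
    refused: List[str] = []
    # Walk newest-first so dependents are removed before what they depend on.
    for resource in reversed(resources):
        if resource.get("tags", {}).get(key) == expected:
            to_delete.append(resource["id"])
        else:
            refused.append(resource["id"])
    return to_delete, refused
```

Anything that lands in the refused list is left untouched, which is what makes it safe to run teardown in an account that also holds resources you created by hand.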

How-To → Deploy to GCP Cloud Run — the recipe-style version of this page
Authentication — the auth model end-to-end (management API JWT + agent runtime bearer)
CLI Reference → agentbreeder deploy — flags + alternatives
Local Development — get to a working local build before you deploy
