name: myapp
runtime: python3.11
build:
  commands:
    - python3 -m venv .venv
    - . .venv/bin/activate && python -m pip install -r requirements.txt
run:
  command: . .venv/bin/activate && uvicorn app:main --host 0.0.0.0 --port $PORT
  port: 8000
resources:
  memory: 256M
  cpu: 0.5
health:
  path: /health
  interval: 10s
  timeout: 3s
  retries: 3
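The health block in this file maps naturally onto a small polling loop. The sketch below is illustrative Python, not Forge's agent code: `http_probe` and `wait_healthy` are hypothetical names, and it assumes that `retries` means the total number of attempts and that `interval` is the pause between them.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float) -> bool:
    """Return True if a GET to url answers with a 2xx status within timeout seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, 4xx/5xx, or a malformed response.
        return False


def wait_healthy(
    probe: Callable[[], bool],
    retries: int = 3,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe up to `retries` times, sleeping `interval` seconds between attempts."""
    for attempt in range(retries):
        if probe():
            return True
        if attempt < retries - 1:
            sleep(interval)  # no pause after the final failed attempt
    return False
```

With the values from the file, a caller would run something like `wait_healthy(lambda: http_probe("http://127.0.0.1:%d/health" % port, timeout=3.0), retries=3, interval=10.0)`, where `port` is whatever was injected as `$PORT`. Passing the probe and the sleep function in as arguments keeps the loop testable without a network.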
resource "aws_security_group" "worker" {
  name   = "forge-worker"
  vpc_id = aws_vpc.forge.id

  ingress {
    description     = "Application ports from Caddy/control plane"
    from_port       = 20000
    to_port         = 39999
    protocol        = "tcp"
    security_groups = [aws_security_group.control_plane.id]
  }
}
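Whatever port the agent injects as `$PORT` has to fall inside the 20000 to 39999 range this rule opens, or Caddy on the control plane cannot reach the app. As a rough illustration (hypothetical Python, not how Forge necessarily allocates ports), one way to pick a port is to try binding within the range:

```python
import socket

# Range taken from the security group rule; the constant names are hypothetical.
APP_PORT_MIN = 20000
APP_PORT_MAX = 39999


def in_app_port_range(port: int) -> bool:
    """True if port is inside the range the worker security group allows."""
    return APP_PORT_MIN <= port <= APP_PORT_MAX


def pick_free_port(start: int = APP_PORT_MIN, end: int = APP_PORT_MAX,
                   host: str = "127.0.0.1") -> int:
    """Return the first port in [start, end] that can currently be bound on host."""
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is in use or not permitted; try the next one
            return port
    raise RuntimeError("no free port in the application range")
```

The probe socket is closed before the app starts, so another process could take the port in between. A real allocator would also need to track the ports it has already handed out.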
POST /api/v1/webhook/github
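The first checks on this endpoint are the X-Hub-Signature-256 verification and the commit SHA format check. A minimal Python sketch of both follows. The function names are hypothetical, and the SHA check assumes full 40-character hexadecimal SHA-1 commit IDs, which is an assumption rather than something Forge's handler is known to enforce.

```python
import hashlib
import hmac
import re
from typing import Optional

# Assumption: commits are identified by full 40-character hex SHA-1 IDs.
_SHA_RE = re.compile(r"[0-9a-f]{40}")


def verify_github_signature(secret: bytes, body: bytes, header: Optional[str]) -> bool:
    """Check an X-Hub-Signature-256 header against the raw request body."""
    prefix = "sha256="
    if not header or not header.startswith(prefix):
        return False
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    received = header[len(prefix):]
    # compare_digest runs in constant time, so timing does not leak the digest.
    return hmac.compare_digest(expected.encode("ascii"), received.encode("utf-8"))


def is_full_commit_sha(value: str) -> bool:
    """True if value looks like a full lowercase hex SHA-1 commit ID."""
    return _SHA_RE.fullmatch(value) is not None
```

The signature must be computed over the raw request bytes exactly as received. Re-serializing the parsed JSON before hashing will usually produce a different digest.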
- alert: ForgeNoOnlineAgents
  expr: forge_agents_online == 0
  for: 2m
  labels:
    severity: page
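This alert pages when forge_agents_online stays at 0 for two minutes. As a sketch of how such a gauge could be derived from heartbeats (illustrative Python; the 30-second cutoff and both function names are assumptions, not Forge's actual values):

```python
import time
from typing import Mapping, Optional

# Assumption: an agent counts as online if it sent a heartbeat in the last 30 s.
HEARTBEAT_TTL_SECONDS = 30.0


def count_online_agents(last_heartbeat: Mapping[str, float],
                        now: Optional[float] = None,
                        ttl: float = HEARTBEAT_TTL_SECONDS) -> int:
    """Count agents whose last heartbeat (Unix seconds) is at most ttl seconds old."""
    if now is None:
        now = time.time()
    return sum(1 for ts in last_heartbeat.values() if now - ts <= ttl)


def render_gauge(name: str, value: float, help_text: str) -> str:
    """Render one gauge in the Prometheus text exposition format."""
    return f"# HELP {name} {help_text}\n# TYPE {name} gauge\n{name} {value}\n"
```

The `for: 2m` clause matters here: a single missed scrape or a brief agent restart does not page, only a sustained zero does.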

- You add a forge.yaml file to your repository describing how to build and run your app.
- You configure a GitHub webhook pointing to your Forge instance.
- Every push to your allowed branch triggers a deployment:
  - Forge clones your repository.
  - Builds it using the commands you defined.
  - Starts the application process.
  - Runs health checks to confirm it is live.
  - Updates the reverse proxy so traffic reaches it at your-app.yourdomain.com.
- Logs stream in real time. If the health checks fail, Forge…

- What Forge is
- Infrastructure
- The deploy lifecycle
- The problems that shaped Forge
- Observability
- What I would do differently

- The control plane accepts 80/443 from the internet, and SSH, Prometheus, Alertmanager, and Grafana only from my admin CIDR.
- The control plane API on port 8080 is only open inside the VPC.
- The worker accepts nothing from the internet: SSH, the metrics exporter on 9108, and app ports 20000–39999 are open only from the control-plane security group.

- Verifies X-Hub-Signature-256 with HMAC-SHA256.
- Checks the repo and branch against allowlists.
- Validates the commit SHA format.
- Clones the repo and parses forge.yaml. Unknown fields are rejected via gopkg.in/yaml.v3 with KnownFields(true), so misconfigured deploys fail before any worker task is created.
- Creates a pending deployment in SQLite.
- The scheduler picks an online worker based on available CPU and memory headroom.
- forge-agent claims a build task, and the build runner executes the build commands inside cgroups v2 limits and Linux namespaces. It is not Docker or Firecracker, but it exercises the primitives directly. In production mode, if namespace isolation is unavailable, the build fails closed.
- The agent starts the app process with $PORT injected, then health-checks http://127.0.0.1:$PORT/health.
- Once health passes, the control plane updates Caddy's route via the Admin API, without restarting the Caddy service.

- forge_deployments_total{status}: deployment rows grouped by current status.
- forge_tasks_total{status}: task rows grouped by current status, across build, run, and stop tasks.
- forge_agents_online: agents with a recent heartbeat.
- forge_agent_cpu_used, forge_agent_memory_used_bytes, forge_agent_memory_capacity_bytes, forge_agent_processes, and forge_agent_last_heartbeat_seconds: worker-level metrics exported by the agent.
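The same worker-level numbers are what a headroom-based scheduler needs. The sketch below is illustrative Python rather than Forge's scheduler: the `Agent` fields and `pick_worker` are hypothetical, `cpu_capacity` is an assumed field that the metrics above do not list, and the tie-breaking rule (most memory left after placement, then most CPU) is one possible policy among several.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Agent:
    name: str
    online: bool
    cpu_capacity: float          # assumed field, in cores
    cpu_used: float              # cores currently reserved
    memory_capacity_bytes: int
    memory_used_bytes: int

    @property
    def cpu_free(self) -> float:
        return self.cpu_capacity - self.cpu_used

    @property
    def memory_free(self) -> int:
        return self.memory_capacity_bytes - self.memory_used_bytes


def pick_worker(agents: Iterable[Agent], cpu_request: float,
                memory_request_bytes: int) -> Optional[Agent]:
    """Pick the online agent with the most headroom that still fits the request."""
    candidates = [
        a for a in agents
        if a.online
        and a.cpu_free >= cpu_request
        and a.memory_free >= memory_request_bytes
    ]
    if not candidates:
        return None  # nothing fits; the deployment would stay pending
    return max(candidates, key=lambda a: (a.memory_free, a.cpu_free))
```

For the forge.yaml shown earlier, the request would be 0.5 CPU and 256M of memory, for example `pick_worker(agents, 0.5, 256 * 1024 * 1024)` if 256M is read as mebibytes.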