Tools: How We Ran 28 AI Agents on a Single Server (And What Broke) (2026)


Posted by the Corellis team — we're the folks behind CorellisOrg/corellis, an open-source multi-agent coordination layer. This post documents 8 weeks of running it in production.

We started an experiment back in February 2026: what if every single human in the company had their own AI assistant, and those assistants could talk to each other? Eight weeks in, we have 28 agents running on a single commodity hardware box. They handle operational toil, marketing coordination, release approvals, and weekly reports. So far, they've processed 50,000+ Slack messages and made 500+ self-corrections. Here's what actually worked, what blew up, and why we eventually open-sourced the whole mess.

The Setup

Each agent is isolated in its own Docker container:

┌─────────────────────────────────────────────┐
│  Single Server (64GB RAM, no GPU)           │
│                                             │
│  🎛️ Controller                              │
│  ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐        │
│  │Alice │ │Bob   │ │Carol │ │...×24│        │
│  │Mktg  │ │Ops   │ │Fin   │ │      │        │
│  └──┬───┘ └──┬───┘ └──┬───┘ └──┬───┘        │
│     └────────┴────────┴────────┘            │
│      Shared knowledge + team memory         │
└─────────────────────────────────────────────┘

- Private memory: Conversation history persists locally.
- Shared context: Read-only access to a company Knowledge Base.
- Interface: Slack channels are the primary UI.
- Tasking: Connected to Notion for structured input/output.

A "Controller" agent acts as the brain — assigning goals, watching the fleet, and synthesizing lessons across agents.

The Disaster Timeline

1. Memory Overflow — Week 2

Every agent keeps a MEMORY.md file as long-term storage. By week 2, some files had grown to 20KB+, stuffed with duplicate task notes, outdated fragments, and random chatter.

The symptom: Agents started hallucinating about completed tasks. They'd reference decisions that had been overturned a month ago as gospel.

The fix: A strict memory hierarchy with aggressive pruning. The core insight: not everything deserves to be remembered. Context is expensive, and context decay is real. We split memory into four tiers:

- Personal (per agent): Capped at 5KB. Auto-pruned weekly.
- Member (per human): Correction history and preferences for each team member the agent works with.
- Channel (per topic): Indexed with embeddings for semantic search.
- Company (shared): Vetted knowledge that never expires.

2. The Correction Loop Problem — Week 3

We built a self-improvement system: when an agent gets corrected, it logs the lesson and adjusts its behavior. In practice, agents were recording every correction, including contradictory ones. Agent A would get corrected by the CEO to "use formal tone," then an hour later get corrected by the CTO to "be more casual." Result: a confused agent outputting garbage.

The fix: Corrections go through a promotion pipeline:

- Agent records locally in .learnings/corrections.md.
- Same correction happens twice → promote to the agent's "Core Rules."
- Applies fleet-wide → promote to shared rules.
- Contradictions are flagged for a human to resolve.

We call this Fleet Learning. Once one agent gets corrected on a specific mistake, the rest get smarter immediately.

3. The Coordination Deadlock — Week 5

We gave the controller a simple goal: "Launch a user acquisition campaign." It broke it down into 6 sub-tasks and assigned them to 6 agents. Two of those agents needed to coordinate on an API design. Neither would start. Both were politely waiting for the other to finish their spec. A classic deadlock — but it happened in natural language, which makes it way harder to detect from the outside.

The symptom: Two agents trading "I'm waiting for your specs" messages back and forth for 3 days straight.

The fix: We built GoalOps — a structured protocol for goal decomposition:

- Goals have explicit dependencies.
- Agents must declare what they're blocked on.
- The controller monitors for stalls (no progress log in 24h → escalate).
- P2P handoffs have timeouts.

4. The Context Window Tax — Week 6

With 28 agents hitting LLM APIs constantly, we were burning through tokens. Each agent's system prompt was 3–4K tokens — personality, hardcoded rules, memory snippets, and company knowledge loaded every single call. The cost: ~$2,400/month, a huge chunk wasted on re-sending the same context.

The fix:

- Moved to on-demand loading of shared knowledge.
- Built a semantic search tool (we call it Teamind) instead of context stuffing. Agents search an index of past conversations rather than loading everything into the prompt.
- Pruned system prompts to essentials (<2K tokens).

Result: API costs dropped to ~$800/month. Still not cheap, but sustainable.

5. The Trust Problem — Ongoing

The hardest part isn't code; it's defining who's in charge. Some tasks need human approval (deployments, financial decisions). Some don't (formatting a report, finding docs). We run a 3-tier system:

- Auto-execute: Low-risk, reversible.
- Notify-and-proceed: Medium-risk, human gets a ping.
- Wait-for-approval: High-risk, blocks until human clicks "Yes."

The challenge? Every human has a different risk tolerance. Our marketing lead wants full autonomy for social posts. The ops lead wants approval on every deployment. We're still tuning this friction.

What Actually Works Well

Semantic Team Memory (Teamind)

Every Slack message is indexed with embeddings. Any agent can ask "what did we decide about pricing last month?" and get an accurate answer with source links. This alone justifies the whole setup. No more "I think someone mentioned this in a meeting I wasn't at."

Fleet Learning

When one agent gets burned by a common mistake — like using a deprecated API — the correction propagates instantly. Over 8 weeks, we went from 500 individual corrections to 47 fleet-wide rules. New agents get these rules for free, so they start out smarter on day one than our first agents ever were.

Goal Decomposition

You give the controller: "Prepare the quarterly report." It figures out which agents have relevant data, creates sub-tasks with dependencies, monitors progress, and merges the outputs. This used to take a human PM 2–3 hours of coordination. Now it's one sentence and ~45 minutes of agent-toil. Not instant, but consistent.

The Numbers

- 28 agents on one 64GB server, no GPU.
- 50,000+ Slack messages processed.
- 500+ individual self-corrections, distilled into 47 fleet-wide rules.
- API costs: ~$2,400/month down to ~$800/month.
- Quarterly-report coordination: 2–3 hours of PM time down to ~45 minutes of agent time.

Should You Do This?

Probably not yet. Unless you have:

- A team that already uses AI assistants.
- Repetitive coordination overhead you want to eliminate.
- A high tolerance for debugging agents that misunderstand each other.

If you're curious, we open-sourced everything: CorellisOrg/corellis on GitHub. MIT license, runs on Docker, works with any OpenClaw setup.

The future isn't just one AI assistant. It's a team of them that learns together.

Questions? The lobsters are listening. 🦞
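To make the tier split concrete, here's a minimal sketch of a capped, auto-pruned personal tier. Everything here (the `TieredMemory` class, oldest-first pruning) is our illustration of the idea, not the actual corellis implementation:

```python
import time

PERSONAL_CAP_BYTES = 5 * 1024  # personal tier: hard 5KB cap, per the post

class TieredMemory:
    """Four memory tiers; only 'personal' is aggressively pruned."""

    def __init__(self):
        # tier name -> list of (timestamp, text) entries
        self.tiers = {"personal": [], "member": [], "channel": [], "company": []}

    def remember(self, tier, text):
        self.tiers[tier].append((time.time(), text))
        if tier == "personal":
            self._prune_personal()

    def _size(self, tier):
        return sum(len(text.encode()) for _, text in self.tiers[tier])

    def _prune_personal(self):
        # Drop the oldest notes until the personal tier fits its cap.
        # 'company' is never pruned ("vetted knowledge that never expires").
        entries = self.tiers["personal"]
        entries.sort(key=lambda e: e[0])
        while entries and self._size("personal") > PERSONAL_CAP_BYTES:
            entries.pop(0)
```

The real system prunes weekly rather than on every write, but the invariant is the same: the personal tier can never outgrow its cap.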
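The correction promotion pipeline can be sketched in a few lines. Class and attribute names below are hypothetical; the rules (record locally, promote after a repeat, flag contradictions for a human) come from the post:

```python
from collections import Counter

class CorrectionPipeline:
    """Local correction log -> agent core rules -> fleet-wide shared rules."""

    def __init__(self):
        self.local_log = Counter()  # stand-in for .learnings/corrections.md
        self.core_rules = set()     # promoted after the same correction repeats
        self.shared_rules = set()   # fleet-wide rules every agent inherits
        self.flagged = []           # contradictions awaiting human resolution

    def record(self, rule, contradicts=None):
        # A correction that contradicts an adopted rule is never auto-applied;
        # this is the week-3 fix for the CEO-vs-CTO tone whiplash.
        if contradicts and contradicts in (self.core_rules | self.shared_rules):
            self.flagged.append((rule, contradicts))
            return
        self.local_log[rule] += 1
        if self.local_log[rule] >= 2:  # same correction twice -> core rule
            self.core_rules.add(rule)

    def promote_fleet_wide(self, rule):
        # A human (or the controller) decides a core rule applies to everyone.
        if rule in self.core_rules:
            self.shared_rules.add(rule)
```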
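Once blocked-on declarations are structured data instead of polite Slack messages, the week-5 deadlock becomes mechanically detectable. A sketch of the two checks (wait-cycle detection and the 24-hour stall rule); function names and the map format are our invention:

```python
import time

STALL_SECONDS = 24 * 3600  # no progress log in 24h -> escalate

def find_deadlocks(blocked_on):
    """Detect wait cycles in a {task: task_it_waits_on} map.

    Returns one cycle per entry point; 'A waits on B, B waits on A'
    is reported from both ends, which is fine for escalation.
    """
    cycles = []
    for start in blocked_on:
        seen, node = [], start
        while node in blocked_on and node not in seen:
            seen.append(node)
            node = blocked_on[node]
        if node in seen:  # the walk returned to its own path: a cycle
            cycles.append(seen[seen.index(node):])
    return cycles

def find_stalls(last_progress, now=None):
    """Tasks whose last progress log is older than the stall window."""
    now = now or time.time()
    return [task for task, ts in last_progress.items()
            if now - ts > STALL_SECONDS]
```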
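The shape of the Teamind idea, retrieve on demand instead of stuffing context, fits in a small class. The bag-of-words "embedding" below is a toy stand-in for a real embedding model, and the class is our sketch, not the actual tool:

```python
import math
from collections import Counter

def embed(text):
    # Stand-in for a real embedding model: term counts as a sparse vector.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class Teamind:
    """Index messages once; agents query per call instead of carrying
    all history in a 3-4K-token system prompt."""

    def __init__(self):
        self.docs = []  # (source_link, text, vector)

    def index(self, source, text):
        self.docs.append((source, text, embed(text)))

    def search(self, query, k=3):
        q = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(q, d[2]), reverse=True)
        return [(src, text) for src, text, _ in ranked[:k]]
```

Answers come back with source links because the link is stored alongside the text at index time, which is what kills "I think someone mentioned this in a meeting."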
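The 3-tier trust system plus per-human overrides is essentially a lookup table in front of an execute call. Action names, the override table, and the callback signatures here are illustrative:

```python
from enum import Enum

class Tier(Enum):
    AUTO = "auto-execute"          # low-risk, reversible
    NOTIFY = "notify-and-proceed"  # medium-risk, human gets a ping
    APPROVE = "wait-for-approval"  # high-risk, blocks until a human says yes

DEFAULT_TIERS = {
    "format_report": Tier.AUTO,
    "social_post": Tier.NOTIFY,
    "deploy": Tier.APPROVE,
}

# Per-human overrides capture differing risk tolerance.
OVERRIDES = {
    "marketing_lead": {"social_post": Tier.AUTO},  # full autonomy for posts
    "ops_lead": {"deploy": Tier.APPROVE},          # explicit: always approve
}

def tier_for(action, human):
    # Unknown actions fail closed: they require approval.
    return OVERRIDES.get(human, {}).get(action,
           DEFAULT_TIERS.get(action, Tier.APPROVE))

def run(action, human, notify, approve, execute):
    tier = tier_for(action, human)
    if tier is Tier.APPROVE and not approve(action):
        return "blocked"
    if tier is Tier.NOTIFY:
        notify(action)
    execute(action)
    return "done"
```

Failing closed on unknown actions is the design choice that makes adding new agent capabilities safe by default.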
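With dependencies explicit, the controller's "create sub-tasks with dependencies" step reduces to a topological sort, and a dependency cycle surfaces immediately instead of after three days of polite waiting. A sketch using Kahn's algorithm; the function and its input format are assumptions:

```python
from collections import deque

def execution_order(deps):
    """deps: {task: set_of_prerequisite_tasks}.
    Returns tasks in an order where every prerequisite runs first,
    or raises on a cycle (the natural-language deadlock, made loud)."""
    deps = {task: set(prereqs) for task, prereqs in deps.items()}
    ready = deque(task for task, prereqs in deps.items() if not prereqs)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for other, prereqs in deps.items():
            if task in prereqs:
                prereqs.remove(task)
                if not prereqs:
                    ready.append(other)
    if len(order) != len(deps):
        raise ValueError("dependency cycle: escalate to a human")
    return order
```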