Claude Code: New Tasks Persisting Between Sessions, and Swarms of Agents Against Context Rot
2026-02-06
admin
Contents:

- The 1st Problem: Context Rot
- The 2nd Problem: Context Persistence
- Previous State of the Art: the "Ralph-Wiggum Loop"
- Most Recent Approaches: Agent Swarm
- Real-World Use Case
- Recent Developments (February 2026)
- Anthropic Claude Opus 4.6 (February 4, 2026)
- OpenAI GPT-5.3-Codex (February 5, 2026)
- OpenAI Codex App for macOS (February 2, 2026)
- Comparison
- Conclusion

## The 1st Problem: Context Rot

Abstract: "Increasing input tokens impacts LLM performance", with performance degrading by 20-50% across models as the context grows from 10k tokens to 100k+ (still less than half the nominal context window of many models). See: Context Rot — Chroma Research.

*Figure: performance degradation of Claude Sonnet 4, GPT-4.1, Qwen3-32B, and Gemini 2.5 Flash on the Repeated Words task across increasing context lengths.*

## The 2nd Problem: Context Persistence

Abstract: Not only does performance deteriorate as the context window fills up, but the context, and especially the To-dos, is typically not preserved from one session to the next. This creates friction and overhead every time you resume work after closing a session. It is particularly problematic when a session is suddenly interrupted, leaving some tasks completed, some in progress, and others still pending to finish the requested change or feature.

## Previous State of the Art: the "Ralph-Wiggum Loop"

One of the most interesting recent solutions was the family of Ralph Loop plugins, often implemented as a Skill. The basic idea: instead of solving the tasks in succession within the main chat context, split the feature into tasks, write them to a file, and then use a script that works through them one by one in fresh, ephemeral LLM sessions (the loop steps are listed further below).

However, this is a sequential brute-force approach: it is slow, consumes huge numbers of tokens, and is stateless. Since each session is killed, there is no "learning" during the process. Each session is context-blind; if an agent fails, the next one starts from scratch without knowing why the previous one failed, unless that information is manually written back to a file.
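A minimal sketch of such a loop, assuming the tasks live in a `tasks.txt` file with one task per line. `run_llm_session` and `verify_solution` stand in for spawning and checking a real ephemeral LLM session; they are stubbed here so the sketch is runnable (the verifier "fails" once per task to show the retry path):

```shell
#!/bin/sh
# Sketch of a Ralph-Wiggum loop over a task file (stubs, not a real harness).
run_llm_session() { echo "attempt: $1" >> ralph.log; }  # stub: fresh session per call
verify_solution() {
  [ -f ".done_$1" ] && return 0   # stub: pass on the second attempt
  touch ".done_$1"; return 1
}

: > ralph.log
printf 'task-A\ntask-B\n' > tasks.txt
while IFS= read -r task; do
  until run_llm_session "$task" && verify_solution "$task"; do
    :   # incorrect: retry the same task in a brand-new ephemeral session
  done  # correct: move on to the next task
done < tasks.txt
rm -f tasks.txt .done_task-A .done_task-B
```

Note how every retry pays the full cost of re-reading the project context from scratch, which is exactly the inefficiency the swarm approach below addresses.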
The Ralph Loop approach is available in several tools, listed further below. Another advanced implementation is found in the Codex App (macOS only), which elevates the loop from a simple terminal process to managed infrastructure, using Git Worktrees to isolate the loops in parallel environments.

## Most Recent Approaches: Agent Swarm

With release 2.1.16 (January 22, 2026) and subsequent 2.1.x updates, Claude Code made a sophisticated transition from sequential logic (such as the Ralph Loop) to a parallel swarm architecture (the Agent Swarm). The three main changes:

- The old To-dos (which were lost when the session closed) are replaced by Tasks saved to files, with status, dependencies, and broadcasts. This decouples the work from the single chat: if the session crashes, the work plan survives and can be resumed, or managed by multiple sessions in parallel.
- The main session no longer has to be the brute-force executor. Instead, it acts as an Orchestrator: it analyzes the problem, defines the action plan, and delegates the execution.
- The solution to Context Rot: instead of saturating a single context window with endless attempts, errors, and logs, the Orchestrator spawns auto-generated specialized agents (via forked sub-agents). The advantages of each agent are listed further below.

Furthermore, thanks to the ability to share the same Tasks across multiple sessions (via the CLAUDE_CODE_TASK_LIST_ID environment variable), multiple sessions can interact with each other. For example, while one session coordinates the implementation of parallel tasks, respecting their dependencies, another session can test completed tasks or evaluate the quality of the results. If problems are found, a session can add new tasks with targeted corrections, which the orchestrator folds into its agent-based execution plan, while yet another session documents each completed feature. The benefits are listed further below.

## Real-World Use Case

The following is a real example from a PowerShell-based device management tool, showing Task management and the Agent Swarm in action (single session).
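As a sketch of the cross-session sharing described above: the CLAUDE_CODE_TASK_LIST_ID variable is from this article, but the ID value and the on-disk task format below are illustrative assumptions, not Claude Code's real schema.

```shell
# Terminal 1 and Terminal 2 both export the same (illustrative) task-list ID,
# so they operate on the same persistent task list.
export CLAUDE_CODE_TASK_LIST_ID="device-mgmt-feature"

# What a persisted task entry could look like (hypothetical schema):
cat > task-001.json <<'EOF'
{
  "id": "task-001",
  "title": "Fix dialog layout",
  "status": "in_progress",
  "dependsOn": []
}
EOF

echo "shared task list: $CLAUDE_CODE_TASK_LIST_ID"
```

One session can then implement tasks while another reviews or documents them, each seeing the same statuses and dependencies.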
*Figure: Claude Code terminal output showing 4 parallel debugger agents being spawned to analyze different problems.*

## Recent Developments (February 2026)

The first week of February 2026 saw a remarkable convergence of major releases in the agentic coding space, with both leading AI labs delivering concrete solutions to the Context Rot and persistence challenges outlined in this document.

## Anthropic Claude Opus 4.6 (February 4, 2026)

Anthropic released Claude Opus 4.6; its long-context features and additional capabilities (Agent Teams, Adaptive Thinking) are listed further below.

## OpenAI GPT-5.3-Codex (February 5, 2026)

OpenAI released GPT-5.3-Codex with a focus on real-time agent control and performance. Its key innovation (mid-turn steering) and its performance improvements are listed further below.

## OpenAI Codex App for macOS (February 2, 2026)

OpenAI also released a dedicated Codex App (macOS only), moving from a CLI to an application and taking a different architectural approach than Claude's autonomous swarm. Unlike Claude Code's autonomous Agent Swarm (where an AI Orchestrator decides when to spawn specialized sub-agents), the Codex App provides a "command center" for developers to manually supervise and coordinate multiple parallel AI sessions (like different sessions in different tabs). It represents a human-in-the-loop approach rather than fully autonomous orchestration.

## Comparison

The progression from the Ralph Loop (sequential, stateless, brute-force) through the Agent Swarm (parallel, persistent, autonomously orchestrated) to the latest enterprise implementations reflects a concrete architectural evolution.

## Conclusion

Even if expanded context windows (e.g., 1M tokens for Opus 4.6) reduce the urgency, LLM accuracy still degrades as the context fills up (Context Rot), and session state is still lost between runs (Context Persistence). The solutions that have emerged (task lists backed by dependency-graph files, disposable sub-agents with limited context, task sharing between sessions via environment variables), despite significantly different architectural approaches across vendors, have already enabled agents to take a new leap forward.
A key architectural divergence is in orchestration: Anthropic pushes toward fully autonomous swarm management (the AI decides when and what to delegate), while OpenAI's Codex App for macOS still favors developer-controlled parallelism with different tooling (Git Worktrees, review UI, scheduled automations).
The prompt:

> *Last week we worked on managing... [9 lines]*
> *I'm attaching a screenshot @"newimage".*
> *Other issues... [6 lines].*
> *If useful, use Tasks and multiple sub-agents to solve this.*
```
● I understand, there are several issues to address. From the screenshot, I can see the current UI. I will read and analyse the code.
● I have analyzed the code. I see several issues that need to be resolved:
```
```
● I create a plan to solve these problems.
● Entered plan mode
  Claude is now exploring and designing an implementation approach.
● Running 2 Explore agents... (ctrl+o to expand)
  ├─ Analyze box dialogs and layout · 10 tool uses · 29.6k tokens
  │  ⎿ Search: Install-Firmware.*function|function Install-Firmw...
  └─ Analyze installation flow and ESC · 10 tool uses · 21.4k tokens
     ⎿ Read: dlota.ps1
● Ok. I create a plan to solve these problems.
● Launch agents in parallel to analyze each problem.
● Running 4 debugging-toolkit:debugger agents... (ctrl+o to expand)
```

The Ralph Loop script, step by step:

- takes the first task
- launches a new ephemeral session of the LLM to execute that precise task (with a fresh context window each time, but having to re-analyze the whole project context and instructions every time)
- kills the LLM session
- checks whether the solution was correct:
  - If incorrect: restarts a new ephemeral LLM session to retry the same task. Since each request to an LLM yields a different solution, it retries until the solution meets all the requirements.
  - If correct: begins a new LLM session for the next task, continuing until all the tasks are complete.

The Ralph Loop approach is available in:

- Claude Code as a Skill (official plugin ralph-wiggum@claude-plugins-official)
- If correct: begin a new LLM session for the next task. Continue this process until all the tasks are complete. - Claude Code as a Skill (official plugin ralph-wiggum@claude-plugins-official)
- GitHub Copilot CLI (as a pattern/workflow implementation)
- OpenAI Codex CLI (supports agent loops and can implement Ralph-style patterns)
- Several community implementations (Goose, ralph-claude-code, Oh-My-Claude, etc.)

Every spawned agent:

- Is born into a clean but task-specific context (or a forked context with minimal overhead)
- Only loads the files necessary for its specific task
- Can use only the most appropriate skill(s) for that specific task (Architecture, UI, debugging, testing, etc.)
- Can independently use the most suitable model (e.g., Haiku for speed and simple tasks, Sonnet or Opus for complex tasks and reasoning)
- Dies at the end of the task, returning only the clean result to the Orchestrator (or a notification that its task is complete)

The benefits:

- The main chat remains "light" and does not lose clarity
- Development can be faster, with multiple parallel agents and possibly multiple parallel sessions
- Complex details are resolved in isolated, disposable environments, with context windows that are always clean, where LLMs give their best results

Claude Opus 4.6 core features:

- 1M-token context window (beta): a first for Opus-class models; can process ~750,000 words (roughly 10-15 scientific papers) without performance degradation
- 76% score on the MRCR v2 benchmark (8-needle retrieval across 1M tokens): a remarkable qualitative shift in very-long-context reliability
- Context Compaction API: automatic summarization of older conversation segments when approaching context limits, enabling effectively infinite conversations without performance collapse

Additional capabilities:

- Agent Teams (TeammateTool): 13 dedicated operations for autonomous swarm management
- Automatic task delegation, inter-agent messaging, and specialized agent spawning
- Full integration with Task Management System (introduced in Claude Code 2.1.16)
- The AI Orchestrator autonomously decides when to spawn specialized sub-agents based on task analysis
- Adaptive Thinking: 4 effort levels (low/medium/high/max); Claude dynamically adjusts reasoning depth based on task complexity
- 128K output tokens: can generate substantially longer outputs

GPT-5.3-Codex key innovation, mid-turn steering:

- Users can intervene while the agent is working without losing context, like redirecting a colleague mid-task rather than waiting for completion; this enables iterative refinement during long-running tasks

Performance improvements:

- 25% faster than GPT-5.2-Codex
- State-of-the-art on SWE-Bench Pro and Terminal-Bench 2.0

Codex App for macOS:

- Manual parallel session management: the developer explicitly assigns tasks to different threads
- Git Worktrees isolation: Each session operates on an isolated code copy
- Automations & Skills: Background scheduling and reusable capabilities. For example, each morning a session can pull all the bugs reported during the night and make a plan for solving them.
- Review tools: a UI for managing diffs and staging changes across parallel sessions
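The Git Worktrees isolation above can be reproduced with plain git (real git subcommands; the repository and branch names are illustrative):

```shell
# One repository, one checkout per parallel session: each session edits its
# own working directory while sharing the same object store and history.
git init -q demo
git -C demo -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "init"
git -C demo worktree add -q -b session-a ../demo-session-a   # session A's sandbox
git -C demo worktree add -q -b session-b ../demo-session-b   # session B's sandbox
git -C demo worktree list   # lists the main checkout plus the two sandboxes
```

Merging each session's branch back into the mainline is then an ordinary git merge or pull request.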
Tags: how-to, tutorial, guide, dev.to, ai, openai, llm, gpt, shell, git, github