```
Layer 1: core (system prompt + tools)      → never changes
Layer 2: memory (retrieved memories)       → rarely changes
Layer 3: conversation (three-zone history) → changes every turn
```

```bash
git clone https://github.com/terminus-labs-ai/sr2.git
cd sr2
pip install -e .
```
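The three layers above can be sketched as an assembly function that appends parts in order of volatility, so the stable prefix (layers 1 and 2) stays byte-identical across turns and remains eligible for provider-side prompt caching. This is a minimal illustration, not sr2's actual API; all names here are hypothetical.

```python
# Hypothetical sketch: assemble context in order of volatility so the
# stable prefix stays identical turn over turn. Not sr2's real API.

def build_context(system_prompt, tools, memories, conversation):
    parts = []
    # Layer 1: never changes -> always first, always identical.
    parts.append(system_prompt)
    parts.extend(f"[tool] {t}" for t in tools)
    # Layer 2: rarely changes -> only invalidates the cache on a memory update.
    parts.extend(f"[memory] {m}" for m in memories)
    # Layer 3: changes every turn -> kept last so the layers above stay cached.
    parts.extend(f"[{role}] {text}" for role, text in conversation)
    return "\n".join(parts)

ctx = build_context(
    system_prompt="You are a helpful agent.",
    tools=["read_file", "search"],
    memories=["User prefers concise answers."],
    conversation=[("user", "hi"), ("assistant", "hello")],
)
```

The ordering is the whole point: anything that changes invalidates everything after it, so the most volatile layer goes last.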
1. Your agent gets a user message (or a heartbeat, or whatever trigger)
2. The framework builds a context: system prompt + tools + conversation history + retrieved memories
3. That context gets sent to the LLM
4. LLM responds, maybe calls a tool
5. Tool result gets appended to the conversation
6. Go to step 1
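The loop above can be sketched in a few lines. This is a runnable toy, not the framework's implementation: `llm_step` and `run_tool` are stubs standing in for a real model call and tool runtime.

```python
# Minimal agent-loop sketch. llm_step and run_tool are stand-ins,
# stubbed so the loop runs end to end without a real model.

def llm_step(context):
    # Pretend the model asks for one tool call, then finishes.
    if not any(m.startswith("[tool-result]") for m in context):
        return {"tool": "echo", "args": "hello"}
    return {"text": "done"}

def run_tool(name, args):
    return f"{name} -> {args}"

def agent_turn(context, user_message):
    context.append(f"[user] {user_message}")           # 1. trigger arrives
    while True:
        reply = llm_step(context)                      # 2-3. build + send context
        if "tool" in reply:                            # 4. model calls a tool
            result = run_tool(reply["tool"], reply["args"])
            context.append(f"[tool-result] {result}")  # 5. append tool result
            continue                                   # 6. back to step 1
        context.append(f"[assistant] {reply['text']}")
        return context

history = agent_turn([], "hi")
```

Note that the context list only ever grows inside this loop, which is exactly where the problems below come from.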
- Context grows unbounded and eventually gets truncated destructively
- Stale tool outputs waste tokens every turn
- Naive context assembly destroys your cache efficiency
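The cache-efficiency point is easy to demonstrate concretely. With a naive sliding window, dropping the oldest message shifts every byte after the system prompt, so the shared cacheable prefix between two consecutive turns collapses to almost nothing. A small self-contained demo (not sr2 code):

```python
# Why a sliding window kills prompt caching: dropping the oldest message
# changes the prompt right after the system prompt, so nothing beyond it
# matches the previously cached prefix.

def naive_prompt(system, messages, max_msgs=3):
    # Keep only the last max_msgs messages -> window slides every turn.
    return "\n".join([system] + messages[-max_msgs:])

def common_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

msgs = ["m1", "m2", "m3"]
turn_a = naive_prompt("sys", msgs)            # "sys\nm1\nm2\nm3"
turn_b = naive_prompt("sys", msgs + ["m4"])   # "sys\nm2\nm3\nm4" -- m1 dropped
shared = common_prefix_len(turn_a, turn_b)    # only "sys\n" plus one char survives
```

An append-only history keeps the entire previous prompt as a shared prefix; a sliding window keeps only the system prompt.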
- Raw — the last N turns, kept completely verbatim. Recent context needs to be exact.
- Compacted — older turns where the big stuff (tool outputs, file reads, search results) gets compressed down to a reference. "→ 200 lines. Sample: [first 3 lines]... Recovery: Re-fetch with read_file." The agent can always get the original back if it needs it. Nothing is lost permanently.
- Summarized — the oldest context, where an LLM digests everything down to what actually matters: decisions that were made, things that are still unresolved, preferences the user expressed. Routine "ok done" confirmations and dead-end explorations get dropped.
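The three zones can be sketched as a single splitting function. This is a hypothetical illustration of the description above, not sr2's implementation: the cut points are fixed counts for simplicity, and `summarize` is a stub standing in for the LLM digestion pass.

```python
# Hypothetical three-zone split. Cut points and the compaction format
# ("-> N lines. Sample: ...") follow the prose above; summarize() stands
# in for an LLM call that keeps decisions, open questions, and preferences.

def compact(entry):
    lines = entry.splitlines()
    sample = " / ".join(lines[:3])
    return f"-> {len(lines)} lines. Sample: {sample}... Recovery: re-fetch."

def summarize(entries):
    # Stub: a real pass would digest these turns down to what matters.
    return f"[summary of {len(entries)} oldest turns]"

def three_zone(history, raw_n=4, compact_n=4):
    raw = history[-raw_n:]                              # zone 1: verbatim
    older = history[:-raw_n]
    compacted = [compact(e) for e in older[-compact_n:]]  # zone 2: references
    summarized = (summarize(older[:-compact_n])           # zone 3: digest
                  if len(older) > compact_n else None)
    return ([summarized] if summarized else []) + compacted + raw

history = [f"turn {i}\nline2\nline3\nline4" for i in range(10)]
zones = three_zone(history)
```

The key property is that only zone 3 is lossy: zone 2 entries carry enough of a reference (line count, sample, recovery hint) for the agent to re-fetch the original on demand.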