Sessions as Trees, Code as Clay

In my last post, I described mom — Mario’s self-managing Slack bot that installs its own tools and writes its own skills. But there’s a question I glossed over: what happens when a self-written skill breaks?
If you’re extending an agent at runtime, mistakes are inevitable. A skill with a bug. A command that hangs. A change that corrupts state. Traditional agents either crash or carry corrupted context forward. Pi does something different.
Sessions aren’t logs. They’re trees.
The Problem with Logs
Most chat systems store conversation as a flat log. Message 1, message 2, message 3, appended forever. When you need to recover from a mistake, your options are limited: clear the whole session, or manually edit the log file and hope you don’t corrupt it.
Flat logs assume conversations are linear. But agent workflows aren’t linear. You try something, it fails, you backtrack. You explore a tangent, learn something, return to the main thread. You want to test a risky operation without committing to it.
Pi’s SessionManager stores conversations as trees.
Tree Structure
Pi sessions are JSONL files where every entry has an id and parentId:
{"type":"session","version":3,"id":"abc123","cwd":"/workspace"}
{"id":"e1","parentId":null,"type":"message","role":"user","content":"..."}
{"id":"e2","parentId":"e1","type":"message","role":"assistant","content":"..."}
{"id":"e3","parentId":"e2","type":"message","role":"user","content":"..."}
The parentId links create a tree. Most of the time, the tree is just a linear chain — normal conversation. But when you branch, the tree structure enables recovery.
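To make the structure concrete, here is a minimal sketch (TypeScript, not Pi's actual code) of reconstructing the tree from a session file; only the id and parentId fields matter for this purpose:

// Minimal sketch: rebuild the tree from a session JSONL file.
// Real Pi entries carry more fields; only id and parentId matter here.
import { readFileSync } from "node:fs";

interface Entry { id: string; parentId: string | null; [key: string]: unknown }

function loadTree(path: string): Map<string | null, Entry[]> {
  const children = new Map<string | null, Entry[]>();
  for (const line of readFileSync(path, "utf8").split("\n")) {
    if (!line.trim()) continue;
    const raw = JSON.parse(line);
    if (raw.type === "session") continue; // header line, not part of the tree
    const entry = raw as Entry;
    const siblings = children.get(entry.parentId) ?? [];
    siblings.push(entry);
    children.set(entry.parentId, siblings);
  }
  return children; // parentId -> children; a plain conversation is a single chain
}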
Branching
Pi’s SessionManager exposes createBranchedSession(leafId). You pick any point in the conversation history, and Pi forks a new branch from there.
The branch inherits everything up to the fork point. After that, it diverges. You can experiment in the branch — test a dangerous command, try a different approach, debug a broken skill. The main session remains untouched.
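Conceptually, creating a branch only needs the path from the root to the chosen leaf. A rough sketch of that walk, reusing the Entry shape from the earlier snippet (the real SessionManager internals will differ):

// Rough sketch of what createBranchedSession(leafId) has to gather:
// walk parent links from the chosen leaf back to the root, then seed a
// new session file with that path. Names here are illustrative, not Pi's API.
function pathToLeaf(byId: Map<string, Entry>, leafId: string): Entry[] {
  const path: Entry[] = [];
  let current = byId.get(leafId);
  while (current) {
    path.unshift(current);
    current = current.parentId ? byId.get(current.parentId) : undefined;
  }
  return path; // everything up to the fork point, oldest first
}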
When the branch work is done, you have options:
- Discard — the experiment failed, throw away the branch
- Summarize — extract learnings into a summary, apply to main session
- Continue — the branch becomes the new main line
Branch summaries get persisted as a special entry type:
{"id":"bs1","parentId":"e47","type":"branch_summary","summary":"Debugged deploy skill: config template had unescaped quotes, added YAML validation step before write"}
The summary is dense — just the learnings, not the debugging conversation. When you return to the main session, the agent sees the summary without the noise of trial and error.
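Persisting that summary is, mechanically, just one more appended line on the main session, pointing at the entry you branched from. Something like this hypothetical helper:

// Hypothetical helper: a branch summary lands on the main session as one
// appended JSONL line whose parentId is the entry you branched from.
import { appendFileSync } from "node:fs";
import { randomUUID } from "node:crypto";

function appendBranchSummary(sessionPath: string, forkParentId: string, summary: string): void {
  const entry = { id: randomUUID(), parentId: forkParentId, type: "branch_summary", summary };
  appendFileSync(sessionPath, JSON.stringify(entry) + "\n");
}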
This is what I called “intentional compaction” but with actual architecture supporting it. You’re not manually copying context between sessions. The tree structure makes branching a first-class operation.
Compaction as Tree Pruning
Context windows fill up. When they do, you’re in the dumb zone — the model drifts and forgets instructions. You need to drop old content. Most agents just truncate — keep the last N messages, drop everything else.
Pi’s compaction is smarter. It summarizes older conversation into a persistent compaction entry:
{"id":"c1","parentId":"e50","type":"compaction","summary":"...","firstKeptEntryId":"e51","tokensBefore":45000}
The summary isn’t just dropped context — it’s compressed knowledge that remains in the tree. Future turns see the compaction summary plus messages after firstKeptEntryId. The information is preserved, just in denser form — compression as journey, not just mechanism.
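So the context a later turn actually sees is roughly the compaction summary followed by everything from firstKeptEntryId onward. A sketch of that assembly, again with the simplified Entry shape (the shape of the synthetic summary entry is illustrative; the compaction field names match the example above):

// Sketch: assemble what later turns see after compaction. The compaction
// entry's summary stands in for everything before firstKeptEntryId.
function effectiveContext(entries: Entry[], compaction: { id: string; summary: string; firstKeptEntryId: string }): Entry[] {
  const keptFrom = entries.findIndex(e => e.id === compaction.firstKeptEntryId);
  const summaryEntry: Entry = { id: compaction.id, parentId: null, summary: compaction.summary };
  return [summaryEntry, ...entries.slice(keptFrom)];
}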
Auto-compaction triggers when:
contextTokens > contextWindow - reserveTokens
The settings are configurable:
{
  "compaction": {
    "enabled": true,
    "reserveTokens": 20000,
    "keepRecentTokens": 20000
  }
}
reserveTokens is headroom for prompts and the next response. keepRecentTokens controls how much recent context survives compaction. OpenClaw enforces a floor of 20,000 tokens for reserveTokens — enough room for multi-turn housekeeping before compaction becomes unavoidable. Set it lower and OpenClaw bumps it up.
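Putting the trigger condition and the floor together, the check amounts to something like this (an illustration, not OpenClaw's code):

// Illustration of the auto-compaction check described above.
const RESERVE_FLOOR = 20_000; // OpenClaw raises lower reserveTokens values to this

function shouldCompact(contextTokens: number, contextWindow: number, reserveTokens: number): boolean {
  const reserve = Math.max(reserveTokens, RESERVE_FLOOR);
  return contextTokens > contextWindow - reserve;
}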
Pre-Compaction Memory Flush
OpenClaw adds a safety net: before compaction triggers, the system runs a silent agentic turn — tackling the same persistent memory problem that Beads tried to solve with dependency graphs, but with a simpler approach.
The flush runs when context crosses a “soft threshold” — below Pi’s compaction threshold but close enough to warrant concern. The system injects a message asking the agent to persist critical state to disk. The agent writes to memory/YYYY-MM-DD.md in the workspace.
The turn is silent — NO_REPLY at the start suppresses delivery to the user. You don’t see the housekeeping. But when compaction runs, the durable state has already been saved.
Configuration:
{
  "compaction": {
    "memoryFlush": {
      "enabled": true,
      "softThresholdTokens": 4000
    }
  }
}
The flush runs once per compaction cycle. If the session stays below threshold, no flush. If it crosses threshold multiple times, one flush per cycle. The workspace file survives compaction — you don’t lose critical context just because the window filled up.
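Here is roughly how that flush could be wired up. The memory/YYYY-MM-DD.md path and the once-per-cycle behavior come from the description above; the runAgentTurn helper, and the assumption that softThresholdTokens is a fixed margin below the compaction threshold, are mine:

// Sketch of the pre-compaction flush. runAgentTurn is a hypothetical hook;
// how softThresholdTokens relates to the compaction threshold is assumed
// here to be a fixed margin below it.
declare function runAgentTurn(prompt: string): void;

let flushedThisCycle = false; // reset whenever compaction actually runs

function maybeFlushMemory(contextTokens: number, compactionThreshold: number, softThresholdTokens: number): void {
  const softThreshold = compactionThreshold - softThresholdTokens;
  if (flushedThisCycle || contextTokens < softThreshold) return;
  flushedThisCycle = true;
  // The NO_REPLY convention described above keeps this turn silent; exactly
  // where the marker is attached is glossed over in this sketch.
  const today = new Date().toISOString().slice(0, 10); // YYYY-MM-DD
  runAgentTurn(`Persist any critical working state to memory/${today}.md before compaction runs.`);
}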
Hot-Reload
Trees make self-extension recoverable. But OpenClaw adds another layer: hot-reload for the infrastructure around Pi.
OpenClaw runs two watch systems in parallel:
Skills watcher — monitors skills/ directories using chokidar (a cross-platform file watcher), debounces changes (250ms default), bumps snapshot versions when skills change. When you edit a skill, it’s available immediately. No restart.
Config watcher — rule-based reload for configuration. Some changes can hot-reload; others require gateway restart:
- Hooks and cron jobs hot-reload instantly
- Browser control settings hot-reload
- Gateway and plugin changes require restart
- Skills have their own dedicated watcher
This separation prevents unsafe reloads while enabling fast iteration on safe changes.
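For a feel of the skills watcher, a minimal sketch with chokidar (the 250ms debounce is from above; the snapshot-bump hook is hypothetical):

// Minimal sketch of a debounced skills watcher with chokidar; the snapshot
// bump is a hypothetical hook standing in for OpenClaw's real callback.
import { watch } from "chokidar";

declare function bumpSkillSnapshotVersion(): void;

let debounce: ReturnType<typeof setTimeout> | undefined;
watch("skills/", { ignoreInitial: true }).on("all", () => {
  if (debounce) clearTimeout(debounce);
  debounce = setTimeout(bumpSkillSnapshotVersion, 250); // 250ms default debounce
});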
Recovery Workflow
Here’s a concrete scenario. You’re working with mom on a deployment task. mom writes a new deploy skill to automate your release process. You invoke it, and the skill has a bug — it writes corrupted config to your staging environment.
With a flat-log agent, your options are bad: start over and lose all context, or try to have the agent fix its own mess while carrying the corrupted state forward. Both paths hurt.
With Pi’s tree structure, you branch from before the damage:
- /branch e47 — fork from the message before the deploy
- Debug in the branch — the main session stays untouched
- Fix the skill, test it, verify it works
- /summary — extract the learnings into a dense summary
- Return to main — the summary persists, the debugging noise doesn’t
The skill is now fixed and hot-reloaded (no restart needed). You can return to main and run the deploy again. The branch gave you a safe space to recover without contaminating your primary context.
Trees give you actual recovery. Flat logs give you “start over” or “hope for the best.”
The JSONL Format
Pi’s choice of JSONL is deliberate. It’s append-only: new entries never rewrite earlier ones, so a failed write can at worst lose the last line, never corrupt existing history. It’s human-readable, so you can inspect sessions with standard tools. It’s line-based, so you can grep for specific content.
The tree structure lives in the parentId links, not in nested JSON. This means you can still process the file line-by-line. Tools that don’t understand trees just see a flat log. Tools that do understand trees can reconstruct the full structure.
What This Enables
Tree-structured sessions enable workflows that flat logs can’t support:
Parallel exploration. Fork multiple branches to try different approaches simultaneously. Compare results, pick the best one.
Safe experimentation. Test risky operations in branches. If they work, merge. If they fail, discard.
Debugging isolation. When something breaks, branch to debug without polluting the main context with error messages and failed attempts.
Recoverable self-extension. Agents can extend themselves knowing that mistakes are recoverable. This is what makes mom’s self-writing skills viable in practice.
The Architecture
Malleable code — agents that write and modify their own capabilities — requires architecture that makes mistakes recoverable. Pi’s session trees provide that architecture.
Flat logs assume perfect execution. Trees assume failure is normal and recovery is essential.
For self-extending agents, the second assumption is obviously correct. And once you have trees, you stop fearing experimentation. Branch, try something risky, recover if it fails. The architecture makes courage cheap.