<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator><link href="https://random.qmx.me/feed.xml" rel="self" type="application/atom+xml" /><link href="https://random.qmx.me/" rel="alternate" type="text/html" /><updated>2026-05-13T07:30:50+00:00</updated><id>https://random.qmx.me/feed.xml</id><title type="html">random::qmx</title><subtitle>thoughts by Doug Campos</subtitle><author><name>Doug Campos</name></author><entry><title type="html">Radix Trees Are Everywhere</title><link href="https://random.qmx.me/posts/2026/03/19/radix-trees-everywhere/" rel="alternate" type="text/html" title="Radix Trees Are Everywhere" /><published>2026-03-19T00:00:00+00:00</published><updated>2026-03-19T00:00:00+00:00</updated><id>https://random.qmx.me/posts/2026/03/19/radix-trees-everywhere</id><content type="html" xml:base="https://random.qmx.me/posts/2026/03/19/radix-trees-everywhere/"><![CDATA[<p>I had no clue what a radix tree was until I started digging into <a href="https://github.com/sgl-project/sglang">SGLang</a>, an inference server everyone claimed was “memory-efficient.” I ran it, read the docs, saw them throwing around “chunks” and “cache chunks” — nothing clicked.</p>

<p>Then I found <a href="https://github.com/sgl-project/mini-sglang">mini-SGLang</a>, the stripped-down version. And there, right in the code, was <strong>radix trees</strong>.</p>

<p>You know that moment when you learn something new and suddenly spot it everywhere? Like buying a specific car model and then seeing it on every street? That’s exactly what happened. Once I saw radix trees in SGLang, I started finding them in places that had absolutely nothing to do with LLMs. Turns out <strong>radix trees solve a specific class of problems so well that they keep showing up across decades and completely unrelated domains.</strong></p>

<!--more-->

<h2 id="what-is-a-radix-tree-the-basics">What Is a Radix Tree? (The Basics)</h2>

<p>A radix tree (sometimes called a compressed trie or Patricia tree) is what you get when you take a trie and stop wasting space.</p>

<p>In a regular trie, each node represents one character. “CAST” and “CASH” share the C→A→S prefix, but still need separate nodes for T and H. A standard trie uses 5 nodes total. Still wasteful.</p>

<p>A radix tree compresses single-child chains. Those two words share “CAS”, so you get one node for “CAS” and two children for “T” and “H”. Three nodes instead of five.</p>

<picture><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/radix-trees-trie-comparison-400-ea9ba743d.webp 400w, /assets/images/generated/radix-trees-trie-comparison-800-ea9ba743d.webp 800w, /assets/images/generated/radix-trees-trie-comparison-1200-ea9ba743d.webp 1200w" type="image/webp" /><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/radix-trees-trie-comparison-400-18d7ab7c6.png 400w, /assets/images/generated/radix-trees-trie-comparison-800-18d7ab7c6.png 800w, /assets/images/generated/radix-trees-trie-comparison-1200-18d7ab7c6.png 1200w" type="image/png" /><img loading="lazy" decoding="async" src="/assets/images/generated/radix-trees-trie-comparison-800-18d7ab7c6.png" alt="Side-by-side comparison of a regular trie with eight nodes versus a compressed radix tree with three nodes" /></picture>

<p><strong>What makes them useful:</strong></p>

<ul>
  <li><strong>O(k) operations</strong> where k is key length (not O(n) where n is number of keys)</li>
  <li><strong>Space compression</strong> — single-child edges get merged</li>
  <li><strong>Built-in hierarchy</strong> — prefix relationships are just there</li>
</ul>

<p>This isn’t academic. These are the data structures powering systems you use every day.</p>

<h2 id="ip-routing-where-it-all-started">IP Routing: Where It All Started</h2>

<p>The original use case: <strong>longest prefix match</strong> for packet forwarding.</p>

<p>Your routing table has entries like:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">10.0.0.0/8</code> → interface A</li>
  <li><code class="language-plaintext highlighter-rouge">10.1.0.0/16</code> → interface B</li>
  <li><code class="language-plaintext highlighter-rouge">10.1.2.0/24</code> → interface C</li>
</ul>

<p>When a packet arrives for <code class="language-plaintext highlighter-rouge">10.1.2.5</code>, you need the <em>most specific</em> match. Hash tables can’t do this. You need a tree.</p>

<p>Network cards implement radix trees in hardware for line-rate performance. We’re talking about a data structure from 1960, published by Fredkin <a href="https://dl.acm.org/doi/10.1145/367390.367400">before the internet existed</a>, now routing the internet’s traffic.</p>

<p><strong>I never connected the dots before.</strong> I’ve configured routing tables, worked with network equipment, and had no idea what data structure was hiding underneath. That’s the thing about good abstractions: they work so well you forget they’re there.</p>

<h2 id="redis-streams">Redis: Streams</h2>

<p>But radix trees didn’t stay in networking. They migrated into places you’d never expect.</p>

<p>Redis uses radix trees to power <a href="https://redis.io/docs/latest/develop/data-types/streams/">Streams</a>: its append-only log data structure introduced in Redis 5.0.</p>

<p>The problem Streams solve: you need an ordered, time-indexed log that supports <strong>range queries</strong> by ID and efficient consumer group tracking. Hash tables can’t give you ordered iteration. Skip lists (what sorted sets use) would work but waste memory for append-mostly workloads.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>XRANGE mystream 1526985054069-0 1526985054079-0  # Give me entries in this ID range
</code></pre></div></div>

<p>antirez wrote <a href="https://github.com/redis/redis/blob/unstable/src/rax.c">Rax</a>, a radix tree implementation, specifically for this. Stream entries are delta-compressed into listpack macro nodes, and those nodes are indexed by a radix tree keyed on entry IDs. Because IDs are time-based and stored in big-endian order, nearby entries share long prefixes. Exactly what radix trees compress well.</p>

<p><strong>The results are dramatic.</strong> Storing one million entries in a Stream uses ~17 MB. The same data in a sorted set + hash takes ~220 MB. That’s 13x less memory, thanks to the radix tree plus listpack combination.</p>

<p><strong>I’ve used Redis in production for years.</strong> Streams were just “another data type” to me. I never looked under the hood to see that the same data structure routing my packets was also indexing my event logs.</p>

<h2 id="ethereum-the-cryptographic-twist">Ethereum: The Cryptographic Twist</h2>

<p>Ethereum uses a <strong>Merkle Patricia Trie</strong>: a radix tree where every node is a cryptographic hash of its children.</p>

<p>This isn’t for storage. It’s for <em>proofs</em>.</p>

<p>When you want to verify that an account exists in Ethereum’s state, you don’t download the whole database. You download the path from root to leaf, and the hashes along the way. Each node’s hash is computed from its children, so if any data is tampered with, the root hash changes. The tree structure lets you prove membership without revealing everything. Light clients can verify state without running a full node.</p>

<p>The <a href="https://ethereum.github.io/yellowpaper/paper.pdf">Ethereum yellow paper</a> (section 4.3) describes this in detail. Every account, every contract storage slot, organized as a trie.</p>

<picture><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/radix-trees-ethereum-merkle-400-cb46081c7.webp 400w, /assets/images/generated/radix-trees-ethereum-merkle-800-cb46081c7.webp 800w, /assets/images/generated/radix-trees-ethereum-merkle-1200-cb46081c7.webp 1200w" type="image/webp" /><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/radix-trees-ethereum-merkle-400-c32cbc871.png 400w, /assets/images/generated/radix-trees-ethereum-merkle-800-c32cbc871.png 800w, /assets/images/generated/radix-trees-ethereum-merkle-1200-c32cbc871.png 1200w" type="image/png" /><img loading="lazy" decoding="async" src="/assets/images/generated/radix-trees-ethereum-merkle-800-c32cbc871.png" alt="Four consecutive blockchain state tries showing radix compression and hash chain inheritance" /></picture>

<p><strong>Same data structure</strong> that’s routing your packets and sorting your Redis keys, now securing billions in cryptocurrency.</p>

<h2 id="sglang-llm-inference">SGLang: LLM Inference</h2>

<p>Now we’re back to where I started.</p>

<p>SGLang uses radix trees for <strong>KV cache prefix caching</strong>. Here’s the problem:</p>

<p>When you run LLM inference, the attention mechanism needs to keep track of all previous tokens (the KV cache). If you have 100 requests with the same system prompt, you’re computing the same KV cache 100 times. Wasteful.</p>

<p>SGLang stores token sequences as keys in a radix tree:</p>
<ul>
  <li>Key: <code class="language-plaintext highlighter-rouge">[101, 203, 77, 55]</code> (the system prompt tokens)</li>
  <li>Value: pointer to the GPU memory holding the KV cache</li>
</ul>

<p>When request #42 arrives with the same system prompt, the tree finds the matching path. <strong>Zero recomputation.</strong> But here’s the kicker: radix trees also handle <strong>partial overlaps</strong>. If request #43 has <code class="language-plaintext highlighter-rouge">[101, 203, 77, 55, 88]</code> and request #44 has <code class="language-plaintext highlighter-rouge">[101, 203, 77, 55, 99]</code>, the tree splits after the fourth token. Both share the first four tokens, but diverge after.</p>

<p>For long prompts, recomputing KV cache can dominate inference time. SGLang’s prefix caching eliminates redundant work across requests.</p>

<picture><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/radix-trees-sglang-cache-400-78ba0c01c.webp 400w, /assets/images/generated/radix-trees-sglang-cache-800-78ba0c01c.webp 800w, /assets/images/generated/radix-trees-sglang-cache-1200-78ba0c01c.webp 1200w" type="image/webp" /><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/radix-trees-sglang-cache-400-f4798eec2.png 400w, /assets/images/generated/radix-trees-sglang-cache-800-f4798eec2.png 800w, /assets/images/generated/radix-trees-sglang-cache-1200-f4798eec2.png 1200w" type="image/png" /><img loading="lazy" decoding="async" src="/assets/images/generated/radix-trees-sglang-cache-800-f4798eec2.png" alt="Left-to-right flow showing documents transforming into tokens flowing into a shared radix tree trunk" /></picture>

<p>Hash tables can’t do this. Earlier approaches like vLLM’s block-level hashing (16-token blocks) require exact block alignment. SGLang’s token-level radix tree catches sharing opportunities that block hashing misses.</p>

<p><strong>I saw this firsthand.</strong> Running SGLang with multiple concurrent requests, all sharing common prompts. The throughput jumped. GPU memory usage dropped. The radix tree was doing the heavy lifting. The <a href="https://arxiv.org/abs/2312.07104">RadixAttention paper</a> has the benchmarks.</p>

<h2 id="the-common-thread-hierarchical-prefix-matching">The Common Thread: Hierarchical Prefix Matching</h2>

<p>So what do IP routing, Redis Streams, Ethereum state, and LLM inference have in common?</p>

<ul>
  <li><strong>Hierarchical data</strong> (IP prefixes, time-based entry IDs, account states, token sequences)</li>
  <li><strong>Variable-length prefixes</strong> (not fixed blocks)</li>
  <li><strong>Need for prefix operations</strong> (longest match, range queries, membership proofs, cache sharing)</li>
</ul>

<p>Radix trees solve all of these naturally. The structure <em>is</em> the solution.</p>

<p><strong>This is why the pattern keeps winning.</strong> Engineers aren’t lazy and keep copying code. Radix trees are just the right tool for this specific class of problems. When you have hierarchical data with shared prefixes, you’ll find yourself reaching for a radix tree.</p>

<h2 id="the-takeaway">The Takeaway</h2>

<p>Don’t take away “learn radix trees.” Take away <strong>learn to recognize patterns</strong>.</p>

<p>I didn’t connect SGLang to IP routing. I didn’t see the connection between Redis Streams and Ethereum state. These were separate problems in my head until I learned the underlying data structure.</p>

<p><strong>Pattern matching is one of the best skills you can develop in software engineering.</strong></p>

<p>Bloom filters show up in distributed databases, web crawlers, and blockchain light clients. B-trees power databases, filesystems, and even Git. Skip lists are in Redis, LevelDB, and concurrency libraries.</p>

<p>When you see the same pattern in unrelated systems, you’re not seeing coincidence. You’re seeing how certain solutions just fit certain problems.</p>

<p>I was reading about “cache chunks” in SGLang docs, completely lost. Then I found what was under the hood. That’s the thing about patterns — once you see them, you can’t unsee them. Next time you’re evaluating an inference server and someone mentions “prefix caching,” you’ll know to ask what data structure they’re using. Next time you’re designing a system with hierarchical data and shared prefixes, a radix tree might be waiting for you.</p>

<p>This is what happened to me with routing tables and Redis. I used them for years without understanding what made them tick. Now I see the pattern everywhere. The data structure you just learned, powering half the systems you use — that’s not a bug. It’s just good engineering.</p>]]></content><author><name>Doug Campos</name></author><category term="data-structures" /><category term="algorithms" /><category term="sglang" /><category term="redis" /><category term="ethereum" /><category term="networking" /><category term="ai" /><summary type="html"><![CDATA[I had no clue what a radix tree was until I started digging into SGLang, an inference server everyone claimed was “memory-efficient.” I ran it, read the docs, saw them throwing around “chunks” and “cache chunks” — nothing clicked. Then I found mini-SGLang, the stripped-down version. And there, right in the code, was radix trees. You know that moment when you learn something new and suddenly spot it everywhere? Like buying a specific car model and then seeing it on every street? That’s exactly what happened. Once I saw radix trees in SGLang, I started finding them in places that had absolutely nothing to do with LLMs. Turns out radix trees solve a specific class of problems so well that they keep showing up across decades and completely unrelated domains.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://random.qmx.me/assets/images/radix-trees-hero.png" /><media:content medium="image" url="https://random.qmx.me/assets/images/radix-trees-hero.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">The Second Opinion</title><link href="https://random.qmx.me/posts/2026/03/10/the-second-opinion/" rel="alternate" type="text/html" title="The Second Opinion" /><published>2026-03-10T00:00:00+00:00</published><updated>2026-03-10T00:00:00+00:00</updated><id>https://random.qmx.me/posts/2026/03/10/the-second-opinion</id><content type="html" xml:base="https://random.qmx.me/posts/2026/03/10/the-second-opinion/"><![CDATA[<p><a href="/posts/2026/02/09/how-computers-find-words/">BM25</a> matches keywords. <a href="/posts/2026/02/16/how-computers-understand-meaning/">Vector search</a> matches meaning. <a href="/posts/2026/02/26/the-art-of-combining-opinions/">Reciprocal Rank Fusion</a> combines them. But all three share a limitation: they work from preprocessed representations, not the actual content.</p>

<p>Re-ranking is different. <strong>It’s where a language model actually reads the results and judges whether they’re relevant.</strong></p>

<p>This is the piece that makes modern search feel almost magical. <strong>Retrieval finds candidates quickly but makes mistakes. Re-ranking catches those mistakes.</strong></p>

<!--more-->

<h2 id="fast-and-shallow-vs-slow-and-deep">Fast and Shallow vs Slow and Deep</h2>

<p>BM25 and vector search are fast because they use precomputed indexes. BM25 looks up terms in an inverted index. Vector search compares embeddings that were computed during indexing. Neither method reads the actual document at query time - they work from preprocessed representations.</p>

<p>This is a tradeoff. <strong>Speed requires simplification.</strong> The indexes capture something about each document, but not everything. A vector embedding compresses a document into a few hundred numbers. Nuance gets lost.</p>

<p>The result: retrieval methods sometimes return documents that seem relevant but aren’t. The embedding for “coffee machine maintenance” might be close to “coffee brewing techniques” because both involve coffee. But if you’re searching for maintenance guides, brewing techniques aren’t helpful.</p>

<p>This is where re-ranking comes in. A re-ranker actually reads the candidate documents and the query, then makes a judgment: is this document actually relevant to what the user asked?</p>

<h2 id="how-re-ranking-works">How Re-ranking Works</h2>

<p>A re-ranker is a small language model trained specifically to judge relevance. It’s not a general chatbot - it’s a specialist. Models like <code class="language-plaintext highlighter-rouge">Qwen3-Reranker</code> or <code class="language-plaintext highlighter-rouge">BGE-Reranker</code> are typically 500MB-1GB.</p>

<p>For each candidate document, the re-ranker receives:</p>
<ul>
  <li>The original query</li>
  <li>The document content (or the matching chunk)</li>
</ul>

<p>It outputs a simple judgment: yes or no, with a confidence score. “Yes, this document answers the query” or “No, it doesn’t.”</p>

<p><strong>The confidence scores let the system adjust rankings.</strong> A high-confidence “yes” boosts a document up. A high-confidence “no” pushes it down. Uncertain judgments have less effect.</p>

<p>This is fundamentally different from retrieval. BM25 and vectors compute similarity metrics - statistical measures that correlate with relevance. The re-ranker makes a semantic judgment - it understands language well enough to assess whether the content actually addresses the query.</p>

<h2 id="two-stage-retrieval">Two-Stage Retrieval</h2>

<p>You might wonder: if the re-ranker is so good at judging relevance, why not use it for everything?</p>

<p><strong>Because it’s slow.</strong></p>

<p>Running a language model takes time - even a small one. If you have 10,000 documents, running the re-ranker on all of them would take far too long. BM25 can search 10,000 documents in milliseconds. The re-ranker might take seconds per document.</p>

<p>The solution is two-stage retrieval:</p>

<ol>
  <li><strong>Stage 1 (retrieval)</strong>: Use fast methods (BM25, vectors, RRF) to find the top 50-100 candidates</li>
  <li><strong>Stage 2 (re-ranking)</strong>: Use the slow but accurate re-ranker to judge those candidates</li>
</ol>

<p>This is the classic speed-accuracy tradeoff. Stage 1 casts a wide net quickly, accepting some false positives. Stage 2 filters carefully, using more expensive computation on a much smaller set.</p>

<p><strong>The key assumption: if a document is truly relevant, stage 1 will probably find it.</strong> The retrieval methods don’t need to be perfect - they just need to not miss good results. <strong>Stage 1 handles recall; stage 2 handles precision.</strong></p>

<h2 id="position-aware-blending">Position-Aware Blending</h2>

<p>Sophisticated systems don’t simply replace retrieval scores with re-ranker scores. They blend them, with the blend ratio depending on position.</p>

<table>
  <thead>
    <tr>
      <th>Position</th>
      <th>Retrieval Weight</th>
      <th>Re-ranker Weight</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1-3</td>
      <td>75%</td>
      <td>25%</td>
    </tr>
    <tr>
      <td>4-10</td>
      <td>60%</td>
      <td>40%</td>
    </tr>
    <tr>
      <td>11+</td>
      <td>40%</td>
      <td>60%</td>
    </tr>
  </tbody>
</table>

<p>Why different weights at different positions?</p>

<p>The top results from RRF are strong signals - documents that ranked highly in multiple retrieval methods. If both BM25 and vectors agree a document is relevant, it probably is. The re-ranker might disagree, but retrieval has a strong track record at the top. So the blend favors retrieval.</p>

<p>Lower positions are less certain. Maybe a document ranked #50 in BM25 but #5 in vectors. RRF puts it somewhere in the middle, but we’re not confident. Here, the re-ranker’s judgment matters more. <strong>It can rescue genuinely relevant documents that retrieval undervalued, or demote false positives that slipped through.</strong></p>

<p>This is a hedge. We don’t fully trust either signal, so we blend them. The blend shifts based on how confident we are in the retrieval signal at each position.</p>

<h2 id="the-full-pipeline">The Full Pipeline</h2>

<p>A complete hybrid search pipeline looks like this:</p>

<ol>
  <li><strong>Query expansion</strong>: Generate variations of the query using a local model</li>
  <li><strong>Parallel retrieval</strong>: Run all queries against both BM25 and vector indexes</li>
  <li><strong>RRF fusion</strong>: Combine all ranked lists into unified scores</li>
  <li><strong>Re-ranking</strong>: Run the re-ranker on top candidates</li>
  <li><strong>Position-aware blending</strong>: Combine retrieval and re-ranker scores</li>
  <li><strong>Return results</strong>: Final ranked list</li>
</ol>

<p>Each stage serves a purpose:</p>

<ul>
  <li>Query expansion catches terminology mismatches</li>
  <li>BM25 catches exact keyword matches</li>
  <li>Vectors catch semantic similarity</li>
  <li>RRF combines signals from different methods</li>
  <li>Re-ranking filters false positives and promotes true relevance</li>
  <li>Position-aware blending hedges between retrieval and re-ranker confidence</li>
</ul>

<p><strong>It’s a lot of machinery for a search command.</strong> But the result is search that handles both “ECONNREFUSED error” (exact match) and “why is networking broken” (semantic query) gracefully.</p>

<h2 id="running-locally">Running Locally</h2>

<p>All of this can run on your machine. A typical setup uses three models:</p>

<table>
  <thead>
    <tr>
      <th>Model</th>
      <th>Size</th>
      <th>Purpose</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Embedding model</td>
      <td>~300MB</td>
      <td>Creates vector embeddings</td>
    </tr>
    <tr>
      <td>Re-ranker</td>
      <td>~640MB</td>
      <td>Judges relevance</td>
    </tr>
    <tr>
      <td>Query expansion</td>
      <td>~1.1GB</td>
      <td>Generates query variations</td>
    </tr>
  </tbody>
</table>

<p>Total: about 2GB. Not tiny, but manageable. I’ve <a href="/posts/2025/10/08/small-models-big-future/">written before</a> about small models and what they can do. <strong>These are specialized tools - not general chatbots, but experts at their narrow tasks.</strong> They run on CPU if needed, though GPU acceleration helps.</p>

<p>The local-first approach matters. Your documents don’t leave your machine. There’s no API to pay for, no rate limits, no service to go down. The search is as reliable as your laptop.</p>

<p><a href="https://github.com/tobi/qmd">QMD</a> is one implementation of this pipeline. It assembles BM25, vectors, RRF, and re-ranking into a coherent tool that runs entirely locally. But the concepts apply broadly - any modern search system worth its salt uses some combination of these techniques.</p>

<h2 id="the-takeaway">The Takeaway</h2>

<p>Modern search is layered. Fast methods find candidates. Slow methods judge them. Fusion combines multiple signals. <strong>Each layer compensates for the limitations of the others.</strong></p>

<p>BM25 is fast but only matches words. Vectors understand meaning but can be fooled by surface similarity. RRF combines them but can’t tell if a document actually answers the question. Re-ranking can, but it’s too slow to run on everything.</p>

<p>Together, they form a pipeline that’s greater than the sum of its parts. You ask a question in natural language, and documents appear ranked by genuine relevance - not just keyword density or embedding distance, but something closer to “does this actually help?”</p>

<p>The techniques aren’t new. BM25 is from 1994. Vector search has been around for years. RRF was published in 2009. Re-ranking with language models is newer but well-established. <strong>The insight is that these old ideas, thoughtfully assembled, produce something that feels like magic.</strong></p>

<p>That’s often how useful software gets made. The components exist; someone just needs to put them together. Understanding what each piece does - and why - helps you appreciate the engineering, and maybe build something of your own.</p>]]></content><author><name>Doug Campos</name></author><category term="ai" /><category term="search" /><category term="llm" /><category term="reranking" /><category term="qmd" /><summary type="html"><![CDATA[BM25 matches keywords. Vector search matches meaning. Reciprocal Rank Fusion combines them. But all three share a limitation: they work from preprocessed representations, not the actual content. Re-ranking is different. It’s where a language model actually reads the results and judges whether they’re relevant. This is the piece that makes modern search feel almost magical. Retrieval finds candidates quickly but makes mistakes. Re-ranking catches those mistakes.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://random.qmx.me/assets/images/reranker-hero.png" /><media:content medium="image" url="https://random.qmx.me/assets/images/reranker-hero.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">One Brain, Multiple Eyes</title><link href="https://random.qmx.me/posts/2026/03/03/one-brain-multiple-eyes/" rel="alternate" type="text/html" title="One Brain, Multiple Eyes" /><published>2026-03-03T00:00:00+00:00</published><updated>2026-03-03T00:00:00+00:00</updated><id>https://random.qmx.me/posts/2026/03/03/one-brain-multiple-eyes</id><content type="html" xml:base="https://random.qmx.me/posts/2026/03/03/one-brain-multiple-eyes/"><![CDATA[<p>Pi is a single-session agent. It doesn’t care where the conversation comes from — terminal, Slack, wherever. One session, one context. <strong>That’s the whole point of keeping it minimal.</strong></p>

<p>But what if you want the same brain answering you on Telegram while you’re commuting, on Slack while you’re working, and on iMessage when your laptop is closed?</p>

<p>That’s the problem <a href="https://github.com/openclaw/openclaw">OpenClaw</a> solves. One agent, multiplexed across every channel you use.</p>

<!--more-->

<h2 id="the-multiplexer-problem">The Multiplexer Problem</h2>

<p>Pi is stateless between sessions. It reads a conversation file, generates a response, writes the updated file. The conversation could come from anywhere — a terminal, a Slack webhook, a Telegram bot.</p>

<p>But Pi doesn’t handle the routing. Something else has to receive that Telegram message, figure out which conversation it belongs to, load the right session file, run Pi, take the response, and send it back to Telegram. That’s not agent work — it’s plumbing.</p>

<p><strong>OpenClaw is the plumbing.</strong></p>

<h2 id="routing-which-agent-which-session">Routing: Which Agent, Which Session</h2>

<p>When a message arrives from any channel, OpenClaw makes two decisions: which agent handles it, and which session it continues.</p>

<p>The binding system handles the first question. You configure rules:</p>

<ul>
  <li>Telegram DMs → personal agent</li>
  <li>Work Slack → work agent</li>
  <li>Discord server → coding agent</li>
  <li>Everything else → default agent</li>
</ul>

<p>Bindings match on channel, peer (who’s talking), account (which bot account received it), guild (Discord server), or team (Slack workspace). Specific matches override general ones. A binding for a specific Telegram chat overrides the catch-all Telegram binding.</p>

<p>Threads inherit their parent’s binding. If you’re in a Slack channel bound to your work agent and someone starts a thread, that thread routes to the same agent — even though technically it has a different peer ID. The binding system checks the parent context first, then falls back to direct matching.</p>

<p>A concrete config:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">bindings</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">match</span><span class="pi">:</span>
      <span class="na">channel</span><span class="pi">:</span> <span class="s">telegram</span>
      <span class="na">peer</span><span class="pi">:</span>
        <span class="na">kind</span><span class="pi">:</span> <span class="s">dm</span>
        <span class="na">id</span><span class="pi">:</span> <span class="s2">"</span><span class="s">12345678"</span>  <span class="c1"># Specific contact</span>
    <span class="na">agentId</span><span class="pi">:</span> <span class="s">personal</span>
  <span class="pi">-</span> <span class="na">match</span><span class="pi">:</span>
      <span class="na">channel</span><span class="pi">:</span> <span class="s">slack</span>
      <span class="na">teamId</span><span class="pi">:</span> <span class="s2">"</span><span class="s">T0123WORK"</span>
    <span class="na">agentId</span><span class="pi">:</span> <span class="s">work</span>
  <span class="pi">-</span> <span class="na">match</span><span class="pi">:</span>
      <span class="na">channel</span><span class="pi">:</span> <span class="s">discord</span>
      <span class="na">guildId</span><span class="pi">:</span> <span class="s2">"</span><span class="s">coding-server"</span>
    <span class="na">agentId</span><span class="pi">:</span> <span class="s">coding</span>
</code></pre></div></div>

<p>The hierarchy is evaluated top to bottom. First match wins. The final implicit rule is “everything else → default agent.”</p>

<picture><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/openclaw-routing-400-4fa05e2c1.webp 400w, /assets/images/generated/openclaw-routing-800-4fa05e2c1.webp 800w, /assets/images/generated/openclaw-routing-1200-4fa05e2c1.webp 1200w" type="image/webp" /><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/openclaw-routing-400-5b00b888b.png 400w, /assets/images/generated/openclaw-routing-800-5b00b888b.png 800w, /assets/images/generated/openclaw-routing-1200-5b00b888b.png 1200w" type="image/png" /><img loading="lazy" decoding="async" src="/assets/images/generated/openclaw-routing-800-5b00b888b.png" alt="Channels on the left flowing through a golden gateway orb to different agents on the right, with crossing lines showing intelligent routing" /></picture>

<h2 id="session-isolation">Session Isolation</h2>

<p>Once OpenClaw knows which agent handles a message, it needs to know which session to continue. This determines what context the agent sees.</p>

<p>OpenClaw constructs session keys from agent ID, channel, and peer. The <code class="language-plaintext highlighter-rouge">dmScope</code> setting controls how conversations group together:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">main</code> — all DMs collapse into one session (unified context)</li>
  <li><code class="language-plaintext highlighter-rouge">per-peer</code> — each contact gets their own session (isolated conversations)</li>
  <li><code class="language-plaintext highlighter-rouge">per-channel-peer</code> — same person on Telegram vs WhatsApp gets separate sessions (platform-isolated)</li>
</ul>

<p>The trade-off is context bleed vs context isolation. With <code class="language-plaintext highlighter-rouge">main</code>, your agent knows you mentioned a project to your coworker and can reference it when your boss asks — useful for a unified assistant, awkward if you wanted those conversations separate. With <code class="language-plaintext highlighter-rouge">per-peer</code>, what you discussed with one person stays in that context.</p>

<p>Neither is universally right. It depends on how you want your assistant to behave.</p>

<h2 id="multi-agent">Multi-Agent</h2>

<p>OpenClaw supports multiple agents in one gateway. Each agent has:</p>

<ul>
  <li>Its own model configuration (Opus for complex work, Sonnet for routine tasks)</li>
  <li>Its own sandbox settings (Docker container, filesystem access)</li>
  <li>Its own skills directory</li>
  <li>Its own sessions</li>
</ul>

<p>The work agent doesn’t see personal conversations. The coding agent’s filesystem access doesn’t extend to your documents. <strong>Isolation isn’t just convenience — it’s the architecture.</strong></p>

<p>A practical setup might look like:</p>

<ul>
  <li><strong>Personal agent</strong> on Opus for open-ended conversation</li>
  <li><strong>Work agent</strong> on Sonnet with company Slack integration</li>
  <li><strong>Coding agent</strong> on Sonnet with Docker sandbox and workspace access</li>
</ul>

<p>Each agent is a Pi instance with different configuration. OpenClaw routes traffic to the right one.</p>

<h2 id="provider-failover">Provider Failover</h2>

<p>Cloud providers fail. Rate limits hit. Bills exceed quotas. A production assistant can’t just stop working when Anthropic returns 429.</p>

<p>OpenClaw maintains auth profiles — ordered lists of providers with automatic rotation. When one fails, it marks that profile as in cooldown and tries the next. Your config might list:</p>

<ol>
  <li>Anthropic Claude (primary)</li>
  <li>OpenAI GPT-4 (fallback)</li>
  <li>Local Ollama (emergency)</li>
</ol>

<p>The failover is transparent. The agent doesn’t know which provider answered. The conversation continues on whatever’s available.</p>

<p>This enables cost optimization too. Route simple messages to the cheap model, complex ones to the expensive model. Use cloud providers for capability, local models for privacy. The routing layer makes these policies configurable.</p>

<h2 id="command-gating">Command Gating</h2>

<p>Not everyone who can message your agent should be able to reset its memory or switch models. OpenClaw implements command gating — certain commands require approval.</p>

<p>Gated commands include:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">/reset</code>, <code class="language-plaintext highlighter-rouge">/new</code> — clear session state</li>
  <li><code class="language-plaintext highlighter-rouge">/compact</code> — force context compaction</li>
  <li><code class="language-plaintext highlighter-rouge">/model</code> — switch the underlying model</li>
  <li><code class="language-plaintext highlighter-rouge">/think</code> — adjust reasoning level</li>
</ul>

<p>Who can run these commands? You configure an allowlist. For personal use, that’s just your user IDs. For team deployments, it might be specific Slack users or Discord roles. Everyone else can talk to the agent but can’t touch its configuration.</p>

<h2 id="media-staging">Media Staging</h2>

<p>When someone sends an image or file, it has to reach the agent somehow. OpenClaw stages media into the sandbox workspace before Pi runs.</p>

<p>An image sent via Telegram:</p>
<ol>
  <li>OpenClaw downloads it from Telegram’s servers</li>
  <li>Writes it to a temporary path in the sandbox workspace</li>
  <li>Includes the path in the message context</li>
  <li>Pi can read the file using its normal <code class="language-plaintext highlighter-rouge">read</code> tool</li>
</ol>

<p>This keeps Pi’s tools minimal — it doesn’t need platform-specific media APIs. <strong>It just reads files.</strong> The infrastructure handles the translation from “Telegram photo” to “file at /workspace/tmp/image-123.jpg.”</p>

<h2 id="what-happens-to-a-message">What Happens to a Message</h2>

<p>Trace a Telegram message through the system:</p>

<ol>
  <li><strong>Receive</strong> — Telegram bot API delivers the message to OpenClaw</li>
  <li><strong>Route</strong> — Binding system matches channel + peer → agent</li>
  <li><strong>Session</strong> — Session key constructed, session file located</li>
  <li><strong>Stage</strong> — Attachments staged into sandbox workspace</li>
  <li><strong>Gate</strong> — Slash commands checked against allowlist</li>
  <li><strong>Directives</strong> — Inline directives extracted (<code class="language-plaintext highlighter-rouge">/think high</code>, <code class="language-plaintext highlighter-rouge">/model gpt-4</code>)</li>
  <li><strong>Run</strong> — Pi executes with the session and message</li>
  <li><strong>Chunk</strong> — Response split to fit platform limits (Slack: 40KB)</li>
  <li><strong>Send</strong> — Response delivered back to Telegram</li>
</ol>

<p>The agent — Pi — handles step 7. Everything else is infrastructure. <strong>The infrastructure exists so the agent can be simple.</strong></p>

<picture><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/openclaw-message-trace-400-4eb4191c6.webp 400w, /assets/images/generated/openclaw-message-trace-800-4eb4191c6.webp 800w, /assets/images/generated/openclaw-message-trace-1200-4eb4191c6.webp 1200w" type="image/webp" /><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/openclaw-message-trace-400-a969773b8.png 400w, /assets/images/generated/openclaw-message-trace-800-a969773b8.png 800w, /assets/images/generated/openclaw-message-trace-1200-a969773b8.png 1200w" type="image/png" /><img loading="lazy" decoding="async" src="/assets/images/generated/openclaw-message-trace-800-a969773b8.png" alt="A path of gray stepping stones arcing left to right with one golden glowing stone near the end, representing the single agent step among many infrastructure steps" /></picture>

<h2 id="the-gateway">The Gateway</h2>

<p>All this runs in a single process: the gateway. It maintains channel connections, routes messages, manages sessions, handles hot-reload.</p>

<p>The gateway runs on your hardware. Sessions live on your disk. Credentials stay on your machine. When you turn it off, it’s off.</p>

<p>This is different from agent-as-a-service. With hosted agents, the provider holds your sessions, sees your conversations, decides when to update the model. With OpenClaw, you hold everything. The trade-off is operational overhead — you’re running infrastructure. The benefit is control.</p>

<h2 id="pi-in-production">Pi in Production</h2>

<p>Pi’s bet is that <a href="/posts/2026/02/05/four-tools-and-a-lobster/">minimal agents are better agents</a> — less bloat, more focus, clearer failures.</p>

<p>OpenClaw doesn’t contradict that bet. It validates it. Pi stays minimal. The infrastructure stays separate. When you’re debugging why the agent gave a bad response, you’re looking at Pi. When you’re debugging why a message didn’t arrive, you’re looking at OpenClaw. <strong>The concerns don’t mix.</strong></p>

<p>This is what happens when you take a minimal agent and ask “how do I use this everywhere?” <strong>The answer isn’t making the agent bigger. It’s building infrastructure around the agent.</strong></p>

<p>One brain, multiple eyes. The brain stays simple. The eyes multiply.</p>]]></content><author><name>Doug Campos</name></author><category term="ai" /><category term="agents" /><category term="infrastructure" /><category term="open-source" /><category term="pi" /><category term="openclaw" /><summary type="html"><![CDATA[Pi is a single-session agent. It doesn’t care where the conversation comes from — terminal, Slack, wherever. One session, one context. That’s the whole point of keeping it minimal. But what if you want the same brain answering you on Telegram while you’re commuting, on Slack while you’re working, and on iMessage when your laptop is closed? That’s the problem OpenClaw solves. One agent, multiplexed across every channel you use.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://random.qmx.me/assets/images/openclaw-hero.png" /><media:content medium="image" url="https://random.qmx.me/assets/images/openclaw-hero.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">The Art of Combining Opinions</title><link href="https://random.qmx.me/posts/2026/02/26/the-art-of-combining-opinions/" rel="alternate" type="text/html" title="The Art of Combining Opinions" /><published>2026-02-26T00:00:00+00:00</published><updated>2026-02-26T00:00:00+00:00</updated><id>https://random.qmx.me/posts/2026/02/26/the-art-of-combining-opinions</id><content type="html" xml:base="https://random.qmx.me/posts/2026/02/26/the-art-of-combining-opinions/"><![CDATA[<p><a href="/posts/2026/02/09/how-computers-find-words/">BM25</a> finds exact word matches. <a href="/posts/2026/02/16/how-computers-understand-meaning/">Vector search</a> finds semantic similarity. Each has blind spots the other covers.</p>

<p>The obvious next question: why not use both?</p>

<p>Modern search systems do. They run BM25 and vector search in parallel, then combine the results. But combining ranked lists is harder than it sounds. <strong>The technique that makes it work - Reciprocal Rank Fusion - is elegant enough to be worth understanding on its own.</strong></p>

<!--more-->

<h2 id="the-problem-with-scores">The Problem With Scores</h2>

<p>Let’s say you search for “authentication flow” and get these results:</p>

<p><strong>BM25 Results:</strong></p>
<ol>
  <li>meeting-notes.md (score: 12.4)</li>
  <li>auth-design.md (score: 8.7)</li>
  <li>api-spec.md (score: 6.2)</li>
</ol>

<p><strong>Vector Results:</strong></p>
<ol>
  <li>auth-design.md (score: 0.89)</li>
  <li>login-flow.md (score: 0.84)</li>
  <li>meeting-notes.md (score: 0.71)</li>
</ol>

<p>Which document is most relevant overall?</p>

<p>The naive approach is averaging scores. But that doesn’t work - BM25 scores and cosine similarity scores are completely different things. A BM25 score of 12.4 doesn’t mean the same thing as a vector score of 0.89. <strong>They’re not on the same scale, they’re not measuring the same thing, and adding them together is meaningless.</strong></p>

<p>You could try normalizing - convert both to 0-1 ranges and then combine. But normalization is tricky. What range do you normalize to? How do you handle documents that only appear in one list? The choices are arbitrary and affect results in ways that are hard to predict.</p>

<p>Reciprocal Rank Fusion sidesteps all of this by ignoring scores entirely. <strong>It only looks at ranks.</strong></p>

<picture><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/rrf-scores-vs-ranks-400-7108728a6.webp 400w, /assets/images/generated/rrf-scores-vs-ranks-800-7108728a6.webp 800w, /assets/images/generated/rrf-scores-vs-ranks-1200-7108728a6.webp 1200w" type="image/webp" /><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/rrf-scores-vs-ranks-400-935f61302.png 400w, /assets/images/generated/rrf-scores-vs-ranks-800-935f61302.png 800w, /assets/images/generated/rrf-scores-vs-ranks-1200-935f61302.png 1200w" type="image/png" /><img loading="lazy" decoding="async" src="/assets/images/generated/rrf-scores-vs-ranks-800-935f61302.png" alt="Mismatched measurement scales giving way to a clean positional podium" /></picture>

<h2 id="ranks-not-scores">Ranks, Not Scores</h2>

<p>RRF’s insight: <strong>position in a ranked list tells you something, regardless of the scoring function that produced it.</strong></p>

<p>If a document is ranked #1 by BM25, that means BM25 thinks it’s the most relevant. If the same document is ranked #2 by vector search, that means vector search thinks it’s nearly the most relevant. We don’t need to understand or compare the scores - the ranks carry the signal.</p>

<p>RRF computes a combined score based purely on positions:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>RRF(document) = sum of 1/(k + rank) for each list
</code></pre></div></div>

<p>Where <code class="language-plaintext highlighter-rouge">k</code> is a constant (typically 60) and <code class="language-plaintext highlighter-rouge">rank</code> is the document’s position in each list.</p>

<p>Let’s work through the example:</p>

<p><strong>meeting-notes.md:</strong></p>
<ul>
  <li>BM25 rank 1: 1/(60+1) = 0.0164</li>
  <li>Vector rank 3: 1/(60+3) = 0.0159</li>
  <li><strong>RRF: 0.0323</strong></li>
</ul>

<p><strong>auth-design.md:</strong></p>
<ul>
  <li>BM25 rank 2: 1/(60+2) = 0.0161</li>
  <li>Vector rank 1: 1/(60+1) = 0.0164</li>
  <li><strong>RRF: 0.0325</strong></li>
</ul>

<p><strong>api-spec.md:</strong></p>
<ul>
  <li>BM25 rank 3: 1/(60+3) = 0.0159</li>
  <li>Vector rank: not present = 0</li>
  <li><strong>RRF: 0.0159</strong></li>
</ul>

<p><strong>login-flow.md:</strong></p>
<ul>
  <li>BM25 rank: not present = 0</li>
  <li>Vector rank 2: 1/(60+2) = 0.0161</li>
  <li><strong>RRF: 0.0161</strong></li>
</ul>

<p><strong>Combined ranking:</strong></p>
<ol>
  <li>auth-design.md (0.0325)</li>
  <li>meeting-notes.md (0.0323)</li>
  <li>login-flow.md (0.0161)</li>
  <li>api-spec.md (0.0159)</li>
</ol>

<p>auth-design.md wins because it ranked highly in both lists. meeting-notes.md ranked #1 in BM25 but only #3 in vectors, so it comes second. <strong>Documents appearing in only one list score lowest.</strong></p>

<h2 id="why-k60">Why k=60?</h2>

<p>The constant k controls how much rank position matters.</p>

<p>With a small k (say, 1):</p>
<ul>
  <li>Rank 1: 1/(1+1) = 0.50</li>
  <li>Rank 2: 1/(1+2) = 0.33</li>
  <li>The gap between #1 and #2 is huge</li>
</ul>

<p>With a large k (say, 60):</p>
<ul>
  <li>Rank 1: 1/(60+1) = 0.0164</li>
  <li>Rank 2: 1/(60+2) = 0.0161</li>
  <li>The gap between #1 and #2 is small</li>
</ul>

<p>A larger k means positions matter less relative to appearing in multiple lists. Being #1 in one list isn’t much better than being #2. <strong>But appearing in both lists, even at mediocre ranks, beats appearing at the top of only one.</strong></p>

<p>The <a href="https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf">original RRF paper</a> (2009) found k=60 worked well empirically across various datasets. It’s become the standard default. The intuition: it rewards consensus across methods while still giving some credit to top positions.</p>

<h2 id="query-expansion-casting-a-wider-net">Query Expansion: Casting a Wider Net</h2>

<p>RRF combines results from different search methods. But you can go further by generating variations of the query itself.</p>

<p>Instead of just searching for “authentication flow,” you use a small language model to generate alternative phrasings. Maybe “user login process” or “identity verification steps.”</p>

<p>Then you run four searches, not two:</p>
<ol>
  <li>Original query → BM25</li>
  <li>Original query → vectors</li>
  <li>Expanded query → BM25</li>
  <li>Expanded query → vectors</li>
</ol>

<p>All four result lists feed into RRF. The original query gets weighted higher to keep it dominant, but the expansion helps catch documents using different terminology.</p>

<p>This addresses a limitation I mentioned in the <a href="/posts/2026/02/09/how-computers-find-words/">BM25 post</a>: you might not remember the exact words used. If your notes say “login” but you search for “authentication,” the expansion might generate “login” and catch those documents.</p>

<p><strong>Query expansion is essentially automated synonym matching, powered by a language model that understands how concepts can be rephrased.</strong></p>

<h2 id="why-hybrid-beats-pure">Why Hybrid Beats Pure</h2>

<p>You might wonder: if vector search understands meaning, why bother with BM25 at all?</p>

<p>Because they have different failure modes.</p>

<p><strong>BM25 fails when:</strong></p>
<ul>
  <li>You use different words than the document (“auth” vs “authentication”)</li>
  <li>You’re searching for concepts, not terms</li>
  <li>You can’t remember exact phrasing</li>
</ul>

<p><strong>Vector search fails when:</strong></p>
<ul>
  <li>You want exact matches (error messages, function names)</li>
  <li>The terms are rare or domain-specific</li>
  <li>The document is tangentially related but not actually relevant (high similarity, low relevance)</li>
</ul>

<p>Consider searching for “ECONNREFUSED timeout.” BM25 will find documents containing that exact error string. Vector search might return documents about network errors generally - related, but not specifically about ECONNREFUSED.</p>

<p>Or consider searching for “how we handle user sessions.” Vector search will find documents about session management even if they never use the word “session.” BM25 would miss them if the terminology differs.</p>

<p><strong>Hybrid search gets both.</strong> RRF ensures documents matching both methods rise to the top, while documents matching only one method still appear - just ranked lower.</p>

<h2 id="putting-it-together">Putting It Together</h2>

<p>A hybrid retrieval pipeline looks like this:</p>

<ol>
  <li><strong>Query expansion</strong>: Generate variations of the original query</li>
  <li><strong>Parallel retrieval</strong>: Run all queries against both BM25 and vector indexes</li>
  <li><strong>RRF fusion</strong>: Combine all ranked lists into one</li>
  <li><strong>Return results</strong>: Ranked by combined RRF score</li>
</ol>

<p>The whole thing can run locally. The expansion model, the embedding model, the BM25 index - all on your machine. No API calls, no network latency, no data leaving your control.</p>

<p>For most searches, this is what you want. It’s slightly slower than pure BM25 (model inference takes time), but the results are substantially better for anything beyond exact-match queries.</p>

<h2 id="rrf-beyond-search">RRF Beyond Search</h2>

<p>RRF is useful anywhere you need to combine ranked lists from different sources. Election aggregation, recommendation systems, meta-analysis of studies. The principle is the same: <strong>ranks carry information even when scores don’t compare.</strong></p>

<p>The elegance is in what RRF ignores. It doesn’t try to understand the scoring functions. It doesn’t normalize or calibrate. It just observes: this document ranked highly according to multiple independent methods. That’s a signal worth trusting.</p>

<p>It’s a reminder that sophisticated problems sometimes have simple solutions. Combining search rankings could involve complex machine learning, learned weighting schemes, elaborate normalization procedures. Or it could involve adding up reciprocals of positions. <strong>The simple approach often wins.</strong></p>

<h2 id="the-takeaway">The Takeaway</h2>

<p>BM25 and vector search answer different questions. BM25 asks “does this document contain these words?” Vector search asks “is this document about the same thing?” Hybrid search asks both.</p>

<p>RRF combines their answers without needing to understand or compare their scores. Documents that both methods like rise to the top. Documents that only one method likes still appear, just lower.</p>

<p>Query expansion catches terminology mismatches by searching for variations you didn’t think of.</p>

<p>Together, these techniques turn keyword search and semantic search from competing approaches into complementary ones. <strong>The result is search that handles both precise technical queries and vague conceptual ones - the best of both worlds.</strong></p>]]></content><author><name>Doug Campos</name></author><category term="ai" /><category term="search" /><category term="information-retrieval" /><category term="rrf" /><category term="qmd" /><summary type="html"><![CDATA[BM25 finds exact word matches. Vector search finds semantic similarity. Each has blind spots the other covers. The obvious next question: why not use both? Modern search systems do. They run BM25 and vector search in parallel, then combine the results. But combining ranked lists is harder than it sounds. The technique that makes it work - Reciprocal Rank Fusion - is elegant enough to be worth understanding on its own.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://random.qmx.me/assets/images/rrf-hero.png" /><media:content medium="image" url="https://random.qmx.me/assets/images/rrf-hero.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Sessions as Trees, Code as Clay</title><link href="https://random.qmx.me/posts/2026/02/19/sessions-as-trees/" rel="alternate" type="text/html" title="Sessions as Trees, Code as Clay" /><published>2026-02-19T00:00:00+00:00</published><updated>2026-02-19T00:00:00+00:00</updated><id>https://random.qmx.me/posts/2026/02/19/sessions-as-trees</id><content type="html" xml:base="https://random.qmx.me/posts/2026/02/19/sessions-as-trees/"><![CDATA[<p>In <a href="/posts/2026/02/12/skills-that-write-themselves/">my last post</a>, I described <a href="https://github.com/badlogic/pi-mono/tree/main/packages/mom"><code class="language-plaintext highlighter-rouge">mom</code></a> — Mario’s self-managing Slack bot that installs its own tools and writes its own skills. But there’s a question I glossed over: what happens when a self-written skill breaks?</p>

<p>If you’re extending an agent at runtime, mistakes are inevitable. A skill with a bug. A command that hangs. A change that corrupts state. Traditional agents either crash or carry corrupted context forward. <a href="/posts/2026/02/04/four-tools-and-a-lobster/">Pi</a> does something different.</p>

<p><strong>Sessions aren’t logs. They’re trees.</strong></p>

<!--more-->

<h2 id="the-problem-with-logs">The Problem with Logs</h2>

<p>Most chat systems store conversation as a flat log. Message 1, message 2, message 3, appended forever. When you need to recover from a mistake, your options are limited: clear the whole session, or manually edit the log file and hope you don’t corrupt it.</p>

<p><strong>Flat logs assume conversations are linear. But agent workflows aren’t linear.</strong> You try something, it fails, you backtrack. You explore a tangent, learn something, return to the main thread. You want to test a risky operation without committing to it.</p>

<p><a href="https://github.com/badlogic/pi-mono/">Pi’s <code class="language-plaintext highlighter-rouge">SessionManager</code></a> stores conversations as trees.</p>

<h2 id="tree-structure">Tree Structure</h2>

<p>Pi sessions are JSONL files where every entry has an <code class="language-plaintext highlighter-rouge">id</code> and <code class="language-plaintext highlighter-rouge">parentId</code>:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="nl">"type"</span><span class="p">:</span><span class="s2">"session"</span><span class="p">,</span><span class="nl">"version"</span><span class="p">:</span><span class="mi">3</span><span class="p">,</span><span class="nl">"id"</span><span class="p">:</span><span class="s2">"abc123"</span><span class="p">,</span><span class="nl">"cwd"</span><span class="p">:</span><span class="s2">"/workspace"</span><span class="p">}</span><span class="w">
</span><span class="p">{</span><span class="nl">"id"</span><span class="p">:</span><span class="s2">"e1"</span><span class="p">,</span><span class="nl">"parentId"</span><span class="p">:</span><span class="kc">null</span><span class="p">,</span><span class="nl">"type"</span><span class="p">:</span><span class="s2">"message"</span><span class="p">,</span><span class="nl">"role"</span><span class="p">:</span><span class="s2">"user"</span><span class="p">,</span><span class="nl">"content"</span><span class="p">:</span><span class="s2">"..."</span><span class="p">}</span><span class="w">
</span><span class="p">{</span><span class="nl">"id"</span><span class="p">:</span><span class="s2">"e2"</span><span class="p">,</span><span class="nl">"parentId"</span><span class="p">:</span><span class="s2">"e1"</span><span class="p">,</span><span class="nl">"type"</span><span class="p">:</span><span class="s2">"message"</span><span class="p">,</span><span class="nl">"role"</span><span class="p">:</span><span class="s2">"assistant"</span><span class="p">,</span><span class="nl">"content"</span><span class="p">:</span><span class="s2">"..."</span><span class="p">}</span><span class="w">
</span><span class="p">{</span><span class="nl">"id"</span><span class="p">:</span><span class="s2">"e3"</span><span class="p">,</span><span class="nl">"parentId"</span><span class="p">:</span><span class="s2">"e2"</span><span class="p">,</span><span class="nl">"type"</span><span class="p">:</span><span class="s2">"message"</span><span class="p">,</span><span class="nl">"role"</span><span class="p">:</span><span class="s2">"user"</span><span class="p">,</span><span class="nl">"content"</span><span class="p">:</span><span class="s2">"..."</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">parentId</code> links create a tree. Most of the time, the tree is just a linear chain — normal conversation. But when you branch, the tree structure enables recovery.</p>

<h2 id="branching">Branching</h2>

<p>Pi’s <code class="language-plaintext highlighter-rouge">SessionManager</code> exposes <code class="language-plaintext highlighter-rouge">createBranchedSession(leafId)</code>. You pick any point in the conversation history, and Pi forks a new branch from there.</p>

<p>The branch inherits everything up to the fork point. After that, it diverges. You can experiment in the branch — test a dangerous command, try a different approach, debug a broken skill. The main session remains untouched.</p>

<p>When the branch work is done, you have options:</p>

<ul>
  <li><strong>Discard</strong> — the experiment failed, throw away the branch</li>
  <li><strong>Summarize</strong> — extract learnings into a summary, apply to main session</li>
  <li><strong>Continue</strong> — the branch becomes the new main line</li>
</ul>

<p>Branch summaries get persisted as a special entry type:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="nl">"id"</span><span class="p">:</span><span class="s2">"bs1"</span><span class="p">,</span><span class="nl">"parentId"</span><span class="p">:</span><span class="s2">"e47"</span><span class="p">,</span><span class="nl">"type"</span><span class="p">:</span><span class="s2">"branch_summary"</span><span class="p">,</span><span class="nl">"summary"</span><span class="p">:</span><span class="s2">"Debugged deploy skill: config template had unescaped quotes, added YAML validation step before write"</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>The summary is dense — just the learnings, not the debugging conversation. When you return to the main session, the agent sees the summary without the noise of trial and error.</p>

<p>This is what <a href="/posts/2025/12/08/the-dog-ate-my-ai-generated-code/">I called “intentional compaction”</a> but with actual architecture supporting it. You’re not manually copying context between sessions. The tree structure makes branching a first-class operation.</p>

<h2 id="compaction-as-tree-pruning">Compaction as Tree Pruning</h2>

<p>Context windows fill up. When they do, you’re in <a href="/posts/2026/01/14/escaping-the-dumb-zone-with-rlms/">the dumb zone</a> — the model drifts and forgets instructions. You need to drop old content. Most agents just truncate — keep the last N messages, drop everything else.</p>

<p>Pi’s compaction is smarter. It summarizes older conversation into a persistent <code class="language-plaintext highlighter-rouge">compaction</code> entry:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="nl">"id"</span><span class="p">:</span><span class="s2">"c1"</span><span class="p">,</span><span class="nl">"parentId"</span><span class="p">:</span><span class="s2">"e50"</span><span class="p">,</span><span class="nl">"type"</span><span class="p">:</span><span class="s2">"compaction"</span><span class="p">,</span><span class="nl">"summary"</span><span class="p">:</span><span class="s2">"..."</span><span class="p">,</span><span class="nl">"firstKeptEntryId"</span><span class="p">:</span><span class="s2">"e51"</span><span class="p">,</span><span class="nl">"tokensBefore"</span><span class="p">:</span><span class="mi">45000</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>The summary isn’t just dropped context — it’s compressed knowledge that remains in the tree. Future turns see the compaction summary plus messages after <code class="language-plaintext highlighter-rouge">firstKeptEntryId</code>. The information is preserved, just in denser form — <a href="/posts/2026/01/12/compression-is-not-enough-the-journey-matters/">compression as journey, not just mechanism</a>.</p>

<p>Auto-compaction triggers when:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>contextTokens &gt; contextWindow - reserveTokens
</code></pre></div></div>

<p>The settings are configurable:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"compaction"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"enabled"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span><span class="w">
    </span><span class="nl">"reserveTokens"</span><span class="p">:</span><span class="w"> </span><span class="mi">20000</span><span class="p">,</span><span class="w">
    </span><span class="nl">"keepRecentTokens"</span><span class="p">:</span><span class="w"> </span><span class="mi">20000</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">reserveTokens</code> is headroom for prompts and the next response. <code class="language-plaintext highlighter-rouge">keepRecentTokens</code> controls how much recent context survives compaction. OpenClaw enforces a floor of 20,000 tokens for <code class="language-plaintext highlighter-rouge">reserveTokens</code> — enough room for multi-turn housekeeping before compaction becomes unavoidable. Set it lower and OpenClaw bumps it up.</p>

<h2 id="pre-compaction-memory-flush">Pre-Compaction Memory Flush</h2>

<p>OpenClaw adds a safety net: before compaction triggers, the system runs a silent agentic turn — tackling <a href="/posts/2026/01/04/on-beads-bloat-and-breaking-points/">the same persistent memory problem</a> that Beads tried to solve with dependency graphs, but with a simpler approach.</p>

<p>The flush runs when context crosses a “soft threshold” — below Pi’s compaction threshold but close enough to warrant concern. The system injects a message asking the agent to persist critical state to disk. The agent writes to <code class="language-plaintext highlighter-rouge">memory/YYYY-MM-DD.md</code> in the workspace.</p>

<p>The turn is silent — <code class="language-plaintext highlighter-rouge">NO_REPLY</code> at the start suppresses delivery to the user. You don’t see the housekeeping. But when compaction runs, the durable state has already been saved.</p>

<p>Configuration:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"compaction"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"memoryFlush"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="nl">"enabled"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span><span class="w">
      </span><span class="nl">"softThresholdTokens"</span><span class="p">:</span><span class="w"> </span><span class="mi">4000</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>The flush runs once per compaction cycle. If the session stays below threshold, no flush. If it crosses threshold multiple times, one flush per cycle. The workspace file survives compaction — you don’t lose critical context just because the window filled up.</p>

<h2 id="hot-reload">Hot-Reload</h2>

<p>Trees make self-extension recoverable. But OpenClaw adds another layer: hot-reload for the infrastructure around Pi.</p>

<p>OpenClaw runs two watch systems in parallel:</p>

<p><strong>Skills watcher</strong> — monitors <code class="language-plaintext highlighter-rouge">skills/</code> directories using <a href="https://github.com/paulmillr/chokidar">chokidar</a> (a cross-platform file watcher), debounces changes (250ms default), bumps snapshot versions when skills change. When you edit a skill, it’s available immediately. No restart.</p>

<p><strong>Config watcher</strong> — rule-based reload for configuration. Some changes can hot-reload; others require gateway restart:</p>

<ul>
  <li>Hooks and cron jobs hot-reload instantly</li>
  <li>Browser control settings hot-reload</li>
  <li>Gateway and plugin changes require restart</li>
  <li>Skills have their own dedicated watcher</li>
</ul>

<p>This separation prevents unsafe reloads while enabling fast iteration on safe changes.</p>

<h2 id="recovery-workflow">Recovery Workflow</h2>

<p>Here’s a concrete scenario. You’re working with <code class="language-plaintext highlighter-rouge">mom</code> on a deployment task. <code class="language-plaintext highlighter-rouge">mom</code> writes a new <code class="language-plaintext highlighter-rouge">deploy</code> skill to automate your release process. You invoke it, and the skill has a bug — it writes corrupted config to your staging environment.</p>

<p>With a flat-log agent, your options are bad: start over and lose all context, or try to have the agent fix its own mess while carrying the corrupted state forward. Both paths hurt.</p>

<p>With Pi’s tree structure, you branch from before the damage:</p>

<ol>
  <li><code class="language-plaintext highlighter-rouge">/branch e47</code> — fork from the message before the deploy</li>
  <li>Debug in the branch — the main session stays untouched</li>
  <li>Fix the skill, test it, verify it works</li>
  <li><code class="language-plaintext highlighter-rouge">/summary</code> — extract the learnings into a dense summary</li>
  <li>Return to main — the summary persists, the debugging noise doesn’t</li>
</ol>

<p>The skill is now fixed and hot-reloaded (no restart needed). You can return to main and run the deploy again. The branch gave you a safe space to recover without contaminating your primary context.</p>

<p><strong>Trees give you actual recovery.</strong> Flat logs give you “start over” or “hope for the best.”</p>

<h2 id="the-jsonl-format">The JSONL Format</h2>

<p>Pi’s choice of JSONL is deliberate. It’s append-only, which means writes are atomic — no risk of corrupting the file mid-write. It’s human-readable, so you can inspect sessions with standard tools. It’s line-based, so you can grep for specific content.</p>

<p>The tree structure lives in the <code class="language-plaintext highlighter-rouge">parentId</code> links, not in nested JSON. This means you can still process the file line-by-line. Tools that don’t understand trees just see a flat log. Tools that do understand trees can reconstruct the full structure.</p>

<h2 id="what-this-enables">What This Enables</h2>

<p>Tree-structured sessions enable workflows that flat logs can’t support:</p>

<p><strong>Parallel exploration.</strong> Fork multiple branches to try different approaches simultaneously. Compare results, pick the best one.</p>

<p><strong>Safe experimentation.</strong> Test risky operations in branches. If they work, merge. If they fail, discard.</p>

<p><strong>Debugging isolation.</strong> When something breaks, branch to debug without polluting the main context with error messages and failed attempts.</p>

<p><strong>Recoverable self-extension.</strong> Agents can extend themselves knowing that mistakes are recoverable. This is what makes <code class="language-plaintext highlighter-rouge">mom</code>’s self-writing skills viable in practice.</p>

<h2 id="the-architecture">The Architecture</h2>

<p><a href="/posts/2026/02/04/four-tools-and-a-lobster/">Malleable code</a> — agents that write and modify their own capabilities — requires architecture that makes mistakes recoverable. Pi’s session trees provide that architecture.</p>

<p><strong>Flat logs assume perfect execution. Trees assume failure is normal and recovery is essential.</strong></p>

<p>For self-extending agents, the second assumption is obviously correct. And once you have trees, you stop fearing experimentation. Branch, try something risky, recover if it fails. The architecture makes courage cheap.</p>]]></content><author><name>Doug Campos</name></author><category term="ai" /><category term="agents" /><category term="sessions" /><category term="architecture" /><category term="pi" /><summary type="html"><![CDATA[In my last post, I described mom — Mario’s self-managing Slack bot that installs its own tools and writes its own skills. But there’s a question I glossed over: what happens when a self-written skill breaks? If you’re extending an agent at runtime, mistakes are inevitable. A skill with a bug. A command that hangs. A change that corrupts state. Traditional agents either crash or carry corrupted context forward. Pi does something different. Sessions aren’t logs. They’re trees.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://random.qmx.me/assets/images/sessions-hero.png" /><media:content medium="image" url="https://random.qmx.me/assets/images/sessions-hero.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">How Computers Understand Meaning</title><link href="https://random.qmx.me/posts/2026/02/16/how-computers-understand-meaning/" rel="alternate" type="text/html" title="How Computers Understand Meaning" /><published>2026-02-16T00:00:00+00:00</published><updated>2026-02-16T00:00:00+00:00</updated><id>https://random.qmx.me/posts/2026/02/16/how-computers-understand-meaning</id><content type="html" xml:base="https://random.qmx.me/posts/2026/02/16/how-computers-understand-meaning/"><![CDATA[<p><a href="/posts/2026/02/09/how-computers-find-words/">BM25</a> is fast, reliable, and completely fails when you can’t remember the exact words you’re looking for.</p>

<p>Search for “authentication” and BM25 won’t find documents that say “login.” To BM25, those are different strings. It has no concept that they mean related things.</p>

<p><strong>Vector search fixes this.</strong> It finds documents by meaning, not just by words.</p>

<!--more-->

<h2 id="meaning-as-coordinates">Meaning as Coordinates</h2>

<p>Here’s the core idea: what if we could place words in space, where similar meanings are close together?</p>

<p>Imagine a map where “dog” and “puppy” are neighbors, “cat” and “kitten” are nearby, and “refrigerator” is off in a distant corner. If you could build such a map, then searching for “puppy” would naturally find documents about “dogs” - they’re in the same neighborhood.</p>

<p>This is what embeddings do. <strong>They convert text into coordinates</strong> - lists of numbers that represent position in a high-dimensional space. Similar meanings end up with similar coordinates.</p>

<p>The dimensions aren’t things you can name, like “animal-ness” or “size.” They’re abstract features learned from patterns in massive amounts of text. But the result is intuitive: words that appear in similar contexts end up near each other.</p>

<p>“Coffee” and “tea” appear in similar sentences - people drink them in the morning, they’re served hot, they’re caffeinated. So their embeddings are close. “Coffee” and “democracy” rarely share context, so they’re far apart.</p>

<h2 id="how-embeddings-get-made">How Embeddings Get Made</h2>

<p>An embedding model reads text and outputs a vector - a list of, say, 768 numbers. Those numbers are the coordinates.</p>

<p>The model learns these representations during training. It sees billions of sentences and learns to predict words from context. In the process, it develops internal representations where similar things cluster together.</p>

<p>You don’t train these models yourself. You use pre-trained ones. Modern embedding models like Gemma or E5 are small enough to run locally - a few hundred megabytes. Feed them a sentence, get back coordinates.</p>

<p><strong>The key insight: if you embed both your documents and your search query using the same model, you can find documents by asking “which document coordinates are closest to my query coordinates?”</strong></p>

<h2 id="cosine-similarity-which-way-are-you-pointing">Cosine Similarity: Which Way Are You Pointing?</h2>

<p>Once you have coordinates, you need a way to measure closeness. The standard approach is cosine similarity.</p>

<p>Think of each vector as an arrow pointing from the origin to its coordinates. Cosine similarity measures the angle between two arrows. If they point in the same direction, similarity is 1. If they’re perpendicular, it’s 0. If they point opposite ways, it’s -1.</p>

<p>Why angles instead of distances? Because it handles magnitude differences gracefully. A long document and a short document about the same topic will have vectors pointing the same direction, even if one vector is “longer” (has larger coordinate values). <strong>The direction captures meaning; the length is mostly noise.</strong></p>

<picture><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/vectors-cosine-400-cbc66aa1d.webp 400w, /assets/images/generated/vectors-cosine-800-cbc66aa1d.webp 800w, /assets/images/generated/vectors-cosine-1200-cbc66aa1d.webp 1200w" type="image/webp" /><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/vectors-cosine-400-50ff32852.png 400w, /assets/images/generated/vectors-cosine-800-50ff32852.png 800w, /assets/images/generated/vectors-cosine-1200-50ff32852.png 1200w" type="image/png" /><img loading="lazy" decoding="async" src="/assets/images/generated/vectors-cosine-800-50ff32852.png" alt="Three pairs of arrows showing same direction, perpendicular, and opposite directions" /></picture>

<p>When you search “how users prove their identity,” vector search:</p>

<ol>
  <li>Embeds your query into a vector</li>
  <li>Compares that vector against all stored document vectors</li>
  <li>Returns documents with the highest cosine similarity</li>
</ol>

<p>Documents about authentication, login, credentials, and identity verification all end up near each other in the embedding space. Your query about “proving identity” lands in the same neighborhood, so they match - even though the words are different.</p>

<h2 id="the-problem-with-big-documents">The Problem With Big Documents</h2>

<p>There’s a catch. Embedding models have context windows - limits on how much text they can process at once. A typical embedding model might handle 512 or 2048 tokens (roughly words).</p>

<p>Your documents are often longer. A meeting transcript might be 10,000 words. A technical spec might be 5,000. You can’t just feed the whole thing into the embedding model.</p>

<p>But what if you could? Would you even want to? A single embedding for a 10,000-word document would be a blurry average of everything in it. Search for “authentication” and you’d match a document that mentions authentication once in a sea of unrelated content - because the embedding represents the whole document, not the relevant part.</p>

<p><strong>The solution is chunking: splitting documents into smaller pieces, each with its own embedding.</strong></p>

<h2 id="chunking-the-art-of-slicing-text">Chunking: The Art of Slicing Text</h2>

<p>A common approach chunks documents into pieces of about 800 tokens each. That’s roughly 600 words, or about a page of text.</p>

<p>Why 800? It’s a tradeoff:</p>

<ul>
  <li><strong>Too small</strong> (100 tokens): You lose context. A chunk about “it” doesn’t tell you what “it” refers to.</li>
  <li><strong>Too large</strong> (4000 tokens): You’re back to blurry averages. Specific topics get diluted.</li>
  <li><strong>800 tokens</strong>: Big enough to contain a coherent idea, small enough to be specific.</li>
</ul>

<p>But there’s a problem with slicing text: you might cut right through an important passage. If a paragraph about authentication gets split between two chunks, neither chunk has the complete picture.</p>

<p>The fix is overlap. Each chunk shares some content (typically 10-20%) with the next chunk. The end of chunk 1 overlaps with the beginning of chunk 2.</p>

<p>This means text near chunk boundaries appears in two chunks. If your search matches that text, you’ll find it - it won’t fall through the cracks.</p>

<p><strong>More overlap means better boundary coverage but more storage and slower indexing. Less overlap means faster indexing but more risk of missing boundary content.</strong> 15% is a reasonable middle ground.</p>

<picture><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/vectors-chunking-400-f419e746b.webp 400w, /assets/images/generated/vectors-chunking-800-f419e746b.webp 800w, /assets/images/generated/vectors-chunking-1200-f419e746b.webp 1200w" type="image/webp" /><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/vectors-chunking-400-ff31b000d.png 400w, /assets/images/generated/vectors-chunking-800-ff31b000d.png 800w, /assets/images/generated/vectors-chunking-1200-ff31b000d.png 1200w" type="image/png" /><img loading="lazy" decoding="async" src="/assets/images/generated/vectors-chunking-800-ff31b000d.png" alt="Long document strip sliced into four overlapping colored segments" /></picture>

<h2 id="what-gets-stored">What Gets Stored</h2>

<p>When you index documents for vector search:</p>

<ol>
  <li>Each document gets split into chunks with overlap</li>
  <li>Each chunk gets embedded into a vector (768 or so numbers)</li>
  <li>Vectors get stored in an index alongside the original text</li>
</ol>

<p>A 5,000-word document might become 8-10 chunks, each with its own embedding. When you search, the system checks all chunks and returns the ones whose vectors are closest to your query.</p>

<p>The results show which chunk matched, not just which document. This is useful - you see the specific passage that’s relevant, not just “somewhere in this giant file.”</p>

<h2 id="where-vector-search-shines">Where Vector Search Shines</h2>

<p>Vector search excels at conceptual queries:</p>

<ul>
  <li>“How users prove their identity” finds authentication docs even if they say “login”</li>
  <li>“Making containers talk to each other” finds networking docs regardless of terminology</li>
  <li>“Why is the build failing” matches troubleshooting guides phrased differently</li>
</ul>

<p>It’s also robust to phrasing. “User authentication flow,” “how login works,” and “verifying user identity” all land in similar regions of embedding space. <strong>Vector search finds the same documents regardless of how you phrase the question.</strong></p>

<h2 id="where-vector-search-struggles">Where Vector Search Struggles</h2>

<p>Vector search isn’t perfect. It has blind spots that BM25 handles better:</p>

<p><strong>Exact terms</strong>: Search for “ECONNREFUSED” and vector search might return documents about network errors in general. BM25 would find the exact error message.</p>

<p><strong>Rare technical terms</strong>: Embedding models are trained on common text. Obscure jargon, internal code names, or domain-specific terminology might not be well-represented in the embedding space.</p>

<p><strong>Precision vs recall</strong>: Vector search casts a wide net. Sometimes you want exactly what you typed, not semantically related content.</p>

<p>This is why the best search systems offer both approaches. BM25 when you know the exact terms. Vectors when you’re exploring concepts. And increasingly, hybrids that combine both signals.</p>

<h2 id="the-local-advantage">The Local Advantage</h2>

<p>I’ve <a href="/posts/2025/10/08/small-models-big-future/">written before</a> about the advantages of local models. For search, the benefits are clear:</p>

<ul>
  <li><strong>Privacy</strong>: Your documents never touch external servers</li>
  <li><strong>Speed</strong>: No network latency, no rate limits</li>
  <li><strong>Cost</strong>: After setup, every query is free</li>
  <li><strong>Reliability</strong>: Works offline, no service dependencies</li>
</ul>

<p>Modern embedding models are small enough to run on modest hardware. They download once and cache locally. <strong>The tradeoff is that they’re less powerful than massive cloud models - but for document search, they’re more than sufficient.</strong></p>

<h2 id="the-takeaway">The Takeaway</h2>

<p>Embeddings convert meaning into coordinates. Similar meanings cluster together. Cosine similarity measures how close two meanings are.</p>

<p>Chunking handles long documents by splitting them into pieces small enough to embed meaningfully, with overlap to avoid losing content at boundaries.</p>

<p>Together, these techniques let you search by concept rather than by keyword. “Authentication,” “login,” and “proving identity” all find the same documents - because they mean the same thing, even though the words are different.</p>

<p>BM25 asks: “Does this document contain these words?”</p>

<p>Vector search asks: “Is this document about the same thing as my query?”</p>

<p><strong>Both questions are useful. The best search systems answer both.</strong></p>]]></content><author><name>Doug Campos</name></author><category term="ai" /><category term="search" /><category term="embeddings" /><category term="vectors" /><category term="qmd" /><summary type="html"><![CDATA[BM25 is fast, reliable, and completely fails when you can’t remember the exact words you’re looking for. Search for “authentication” and BM25 won’t find documents that say “login.” To BM25, those are different strings. It has no concept that they mean related things. Vector search fixes this. It finds documents by meaning, not just by words.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://random.qmx.me/assets/images/vectors-hero.png" /><media:content medium="image" url="https://random.qmx.me/assets/images/vectors-hero.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Skills That Write Themselves</title><link href="https://random.qmx.me/posts/2026/02/12/skills-that-write-themselves/" rel="alternate" type="text/html" title="Skills That Write Themselves" /><published>2026-02-12T00:00:00+00:00</published><updated>2026-02-12T00:00:00+00:00</updated><id>https://random.qmx.me/posts/2026/02/12/skills-that-write-themselves</id><content type="html" xml:base="https://random.qmx.me/posts/2026/02/12/skills-that-write-themselves/"><![CDATA[<p><a href="/posts/2026/02/05/four-tools-and-a-lobster/">Last week</a> I wrote about Pi’s four-tool constraint and <a href="https://mariozechner.at/">Mario Zechner</a>’s philosophy of radical minimalism. But here’s the obvious question: how do you do anything useful with just <code class="language-plaintext highlighter-rouge">read</code>, <code class="language-plaintext highlighter-rouge">write</code>, <code class="language-plaintext highlighter-rouge">edit</code>, and <code class="language-plaintext highlighter-rouge">bash</code>?</p>

<p>The answer is <strong><a href="https://github.com/badlogic/pi-mono/tree/main/packages/mom"><code class="language-plaintext highlighter-rouge">mom</code></a></strong> — short for “Master Of Mischief.” It’s a Slack bot Mario built on Pi that does something I haven’t seen elsewhere: <strong>it manages itself.</strong></p>

<!--more-->

<h2 id="the-setup-problem">The Setup Problem</h2>

<p>Traditional AI agents require you to configure everything upfront. Install dependencies. Set up credentials. Define tools. Configure integrations. The agent is powerful, but only after you’ve done the work to make it powerful.</p>

<p><code class="language-plaintext highlighter-rouge">mom</code> flips this. You give it Slack access and some API keys. It handles the rest.</p>

<p>Need git? <code class="language-plaintext highlighter-rouge">mom</code> runs <code class="language-plaintext highlighter-rouge">apk add git</code> inside its Docker container. Need to call an API? <code class="language-plaintext highlighter-rouge">mom</code> asks for the credentials, stores them appropriately, and writes a CLI wrapper. Need a capability that doesn’t exist? <code class="language-plaintext highlighter-rouge">mom</code> writes a skill for it.</p>

<p>The agent becomes responsible for its own environment.</p>

<h2 id="the-workspace">The Workspace</h2>

<p><code class="language-plaintext highlighter-rouge">mom</code>’s self-management works because of a carefully designed workspace structure. Since <code class="language-plaintext highlighter-rouge">mom</code> is a Slack bot, the hierarchy maps to Slack channels — but the pattern works for any multi-context setup:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>./data/
├── MEMORY.md              # Global context across all Slack channels
├── settings.json          # Configuration
├── skills/                # Reusable tools (global)
├── C123ABC/               # Per-channel directory (Slack channel ID)
│   ├── MEMORY.md          # Channel-specific memory
│   ├── log.jsonl          # Message history (source of truth)
│   ├── context.jsonl      # What the LLM sees
│   ├── attachments/
│   ├── scratch/
│   └── skills/            # Channel-specific tools
</code></pre></div></div>

<p>Two things matter here.</p>

<p>First, the <strong>two-file memory system</strong>. There’s a global <code class="language-plaintext highlighter-rouge">MEMORY.md</code> that persists across all channels, and a channel-specific <code class="language-plaintext highlighter-rouge">MEMORY.md</code> for local context. <code class="language-plaintext highlighter-rouge">mom</code> reads both before every response and can update them on request. This isn’t RAG or vector search — it’s just markdown files the agent knows to check.</p>

<p>Second, the <strong>skill hierarchy</strong>. Skills in <code class="language-plaintext highlighter-rouge">/workspace/skills/</code> are global. Skills in <code class="language-plaintext highlighter-rouge">/workspace/{channelId}/skills/</code> are channel-specific and override global ones. <code class="language-plaintext highlighter-rouge">mom</code> discovers available skills by reading <code class="language-plaintext highlighter-rouge">SKILL.md</code> files from both locations.</p>

<picture><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/skills-workspace-hierarchy-400-c8888b6fe.webp 400w, /assets/images/generated/skills-workspace-hierarchy-800-c8888b6fe.webp 800w, /assets/images/generated/skills-workspace-hierarchy-1200-c8888b6fe.webp 1200w" type="image/webp" /><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/skills-workspace-hierarchy-400-264c0f103.png 400w, /assets/images/generated/skills-workspace-hierarchy-800-264c0f103.png 800w, /assets/images/generated/skills-workspace-hierarchy-1200-264c0f103.png 1200w" type="image/png" /><img loading="lazy" decoding="async" src="/assets/images/generated/skills-workspace-hierarchy-800-264c0f103.png" alt="A gentle tree structure with a warm gold root node branching down to three dusty blue child nodes, each with notebook and folder icons, one child glowing to suggest it overrides the parent" /></picture>

<h2 id="self-installing-tools">Self-Installing Tools</h2>

<p><code class="language-plaintext highlighter-rouge">mom</code> runs in Docker with only the data directory mounted. <strong>The container mounts only the data directory</strong> — the host filesystem is invisible. The container starts minimal — Alpine Linux with almost nothing installed.</p>

<p>When <code class="language-plaintext highlighter-rouge">mom</code> needs a tool that isn’t there, it installs it:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apk add git jq curl
</code></pre></div></div>

<p>The system prompt instructs <code class="language-plaintext highlighter-rouge">mom</code> to determine the appropriate package manager based on the container OS. Alpine uses <code class="language-plaintext highlighter-rouge">apk</code>. Debian uses <code class="language-plaintext highlighter-rouge">apt</code>. The agent figures it out.</p>

<p>Changes persist across sessions because the container stays running. <code class="language-plaintext highlighter-rouge">mom</code> installs <code class="language-plaintext highlighter-rouge">git</code> once, and it’s there for every future conversation. If the container gets recreated, <code class="language-plaintext highlighter-rouge">mom</code> has logged what it installed to <code class="language-plaintext highlighter-rouge">SYSTEM.md</code> and can restore the environment.</p>

<h2 id="self-creating-skills">Self-Creating Skills</h2>

<p>Here’s where it gets interesting. <code class="language-plaintext highlighter-rouge">mom</code> doesn’t just use skills — it creates them.</p>

<p>A skill is a directory containing a <code class="language-plaintext highlighter-rouge">SKILL.md</code> file and optional scripts:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/workspace/skills/note/
├── SKILL.md
└── note.sh
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">SKILL.md</code> has YAML frontmatter defining metadata:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">---</span>
<span class="na">name</span><span class="pi">:</span> <span class="s">note</span>
<span class="na">description</span><span class="pi">:</span> <span class="s">Create and manage notes in the workspace</span>
<span class="nn">---</span>

<span class="c1"># Note Skill</span>

<span class="na">Usage</span><span class="pi">:</span> <span class="err">`</span><span class="s">./note.sh &lt;action&gt; [args]`</span>

<span class="na">Actions</span><span class="pi">:</span>
<span class="pi">-</span> <span class="err">`</span><span class="s">add &lt;text&gt;` - Add a new note</span>
<span class="pi">-</span> <span class="err">`</span><span class="s">list` - List all notes</span>
<span class="pi">-</span> <span class="err">`</span><span class="s">search &lt;query&gt;` - Search notes</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">mom</code> discovers skills by scanning for <code class="language-plaintext highlighter-rouge">SKILL.md</code> files. The frontmatter goes into context (~100 tokens per skill). The full documentation only loads when <code class="language-plaintext highlighter-rouge">mom</code> decides to use the skill.</p>

<p>When <code class="language-plaintext highlighter-rouge">mom</code> encounters a task it can’t handle with existing skills, it can write a new one. Create the directory, write the <code class="language-plaintext highlighter-rouge">SKILL.md</code>, write the script, and the skill is immediately available. No restart, no configuration, no approval flow.</p>

<picture><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/skills-feedback-loop-400-78c5fa511.webp 400w, /assets/images/generated/skills-feedback-loop-800-78c5fa511.webp 800w, /assets/images/generated/skills-feedback-loop-1200-78c5fa511.webp 1200w" type="image/webp" /><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/skills-feedback-loop-400-5a0c5f5a7.png 400w, /assets/images/generated/skills-feedback-loop-800-5a0c5f5a7.png 800w, /assets/images/generated/skills-feedback-loop-1200-5a0c5f5a7.png 1200w" type="image/png" /><img loading="lazy" decoding="async" src="/assets/images/generated/skills-feedback-loop-800-5a0c5f5a7.png" alt="Three-stage circular flow: a robot discovers a gap on an empty shelf, then writes a new skill document, then confidently uses the new tool with the shelf now filled" /></picture>

<h2 id="the-context-sync">The Context Sync</h2>

<p><code class="language-plaintext highlighter-rouge">mom</code> maintains two JSONL files per channel. JSONL is “JSON Lines” — one JSON object per line, easily appendable and greppable. Each line is a complete message or event.</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">log.jsonl</code> — the source of truth, containing every message</li>
  <li><code class="language-plaintext highlighter-rouge">context.jsonl</code> — what the LLM actually sees, synced from <code class="language-plaintext highlighter-rouge">log.jsonl</code> before each response</li>
</ul>

<p>This separation enables something clever. When context fills up, <code class="language-plaintext highlighter-rouge">mom</code> compacts <code class="language-plaintext highlighter-rouge">context.jsonl</code> by summarizing older messages. But <code class="language-plaintext highlighter-rouge">log.jsonl</code> remains complete. If <code class="language-plaintext highlighter-rouge">mom</code> needs to recall something from before the compaction, it can grep <code class="language-plaintext highlighter-rouge">log.jsonl</code> directly.</p>

<p>This is <a href="/posts/2025/12/08/the-dog-ate-my-ai-generated-code/">what I was describing</a> as intentional compaction — but built into the architecture rather than requiring manual intervention.</p>

<h2 id="the-docker-model">The Docker Model</h2>

<p><code class="language-plaintext highlighter-rouge">mom</code>’s isolation model is simple: the container mounts only the data directory.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker run <span class="nt">-d</span> <span class="nt">--name</span> mom-sandbox <span class="se">\</span>
  <span class="nt">-v</span> <span class="si">$(</span><span class="nb">pwd</span><span class="si">)</span>/data:/workspace <span class="se">\</span>
  alpine:latest <span class="nb">tail</span> <span class="nt">-f</span> /dev/null

mom <span class="nt">--sandbox</span><span class="o">=</span>docker:mom-sandbox ./data
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">mom</code> executes commands inside the container via <code class="language-plaintext highlighter-rouge">docker exec</code>. It can install packages, write files, run scripts — but only within the mounted workspace.</p>

<p>This isn’t perfect security. Credentials stored in the workspace are accessible to the container. A malicious model could exfiltrate them. There’s no built-in secrets manager or credential vault — just files in the workspace. But it’s a reasonable trade-off: <strong>contain the blast radius without pretending you can prevent all harm.</strong></p>

<picture><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/skills-docker-containment-400-d6de9b9c2.webp 400w, /assets/images/generated/skills-docker-containment-800-d6de9b9c2.webp 800w, /assets/images/generated/skills-docker-containment-1200-d6de9b9c2.webp 1200w" type="image/webp" /><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/skills-docker-containment-400-f6d0ef2a2.png 400w, /assets/images/generated/skills-docker-containment-800-f6d0ef2a2.png 800w, /assets/images/generated/skills-docker-containment-1200-f6d0ef2a2.png 1200w" type="image/png" /><img loading="lazy" decoding="async" src="/assets/images/generated/skills-docker-containment-800-f6d0ef2a2.png" alt="A simple robot working productively at a small workbench inside a dashed boundary, with tools and files around it, while the space outside the boundary remains completely empty and calm" /></picture>

<h2 id="skill-hierarchy-in-practice">Skill Hierarchy in Practice</h2>

<p>The two-level skill system — global and channel-specific — enables useful patterns.</p>

<p>Say you have a global <code class="language-plaintext highlighter-rouge">github</code> skill that wraps <code class="language-plaintext highlighter-rouge">gh</code> CLI. Works fine for most channels. But your work channel needs to use a different GitHub org with different credentials. <strong>You create a channel-specific skill that overrides the global one:</strong></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/workspace/skills/github/SKILL.md         # Global: personal GitHub
/workspace/C123ABC/skills/github/SKILL.md # Override: work GitHub
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">mom</code> checks channel-specific skills first. If it finds a match, it uses that. Otherwise, it falls back to global. The override is implicit — no configuration, just file placement.</p>

<h2 id="event-driven-wake-ups">Event-Driven Wake-ups</h2>

<p><code class="language-plaintext highlighter-rouge">mom</code> isn’t limited to responding to messages. It supports three types of scheduled events:</p>

<ul>
  <li><strong>Immediate</strong> — one-shot execution right now</li>
  <li><strong>One-shot</strong> — scheduled for a specific future time</li>
  <li><strong>Periodic</strong> — cron-based recurring execution (cron is the Unix scheduler format: <code class="language-plaintext highlighter-rouge">0 9 * * *</code> means “9am every day”)</li>
</ul>

<p>A DevOps channel might schedule <code class="language-plaintext highlighter-rouge">mom</code> to check deployment status every morning at 9am. <code class="language-plaintext highlighter-rouge">mom</code> wakes up, runs the health check, writes findings to <code class="language-plaintext highlighter-rouge">MEMORY.md</code>, and goes back to sleep. When you check in later, the summary is already there.</p>

<p>Webhooks work similarly. External systems can POST to <code class="language-plaintext highlighter-rouge">mom</code>’s endpoint, triggering an agent run with the webhook payload as context. CI/CD pipelines, monitoring alerts, calendar events — anything that can send HTTP can wake <code class="language-plaintext highlighter-rouge">mom</code>.</p>

<p>The agent runs, handles the event, and persists state to the workspace. <strong>No human in the loop required.</strong></p>

<h2 id="what-this-enables">What This Enables</h2>

<p><code class="language-plaintext highlighter-rouge">mom</code>’s self-management enables workflows that would be tedious to configure manually:</p>

<p><strong>Credential accumulation.</strong> The first time you mention GitHub, <code class="language-plaintext highlighter-rouge">mom</code> asks for a token, stores it in the workspace, and configures <code class="language-plaintext highlighter-rouge">gh</code> CLI. Next time, it just works. Over time, <code class="language-plaintext highlighter-rouge">mom</code> accumulates the credentials and tools for your specific workflow. (Yes, this means credentials live as files in the workspace — the security model is “contain the blast radius,” not “prevent all access.”)</p>

<p><strong>Context-specific skills.</strong> A channel for DevOps work might have skills for Kubernetes and AWS. A channel for writing might have skills for grammar checking and research. Each channel’s skill directory reflects its purpose.</p>

<p><strong>Autonomous maintenance.</strong> Scheduled wake-ups mean <code class="language-plaintext highlighter-rouge">mom</code> can do background work — checking logs, updating dashboards, sending reminders — without waiting for you to ask.</p>

<h2 id="what-power-users-build">What Power Users Build</h2>

<p><a href="https://lucumr.pocoo.org/2026/1/31/pi/">Armin Ronacher</a> runs Pi with a collection of custom extensions that demonstrate the pattern at scale. His setup includes:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">/answer</code> — reformats agent questions into dialog boxes for cleaner interaction</li>
  <li><code class="language-plaintext highlighter-rouge">/todos</code> — manages to-do lists as markdown files in the workspace</li>
  <li><code class="language-plaintext highlighter-rouge">/review</code> — code review interface with branching for iterative feedback</li>
  <li><code class="language-plaintext highlighter-rouge">/control</code> — multi-agent experimentation, running parallel approaches</li>
  <li><code class="language-plaintext highlighter-rouge">/files</code> — lists changed/referenced files with quick-look support</li>
</ul>

<p>He also replaced MCP servers entirely with CLI-based skills. Browser automation via Chrome DevTools Protocol (CDP is how tools like Puppeteer talk to Chrome — Vercel’s <a href="https://github.com/vercel-labs/agent-browser">Agent Browser</a> takes the same approach, reducing token usage by 93% compared to Playwright MCP). Changelog management. Commit tooling. All implemented as skills the agent reads on demand rather than tools loaded at startup.</p>

<p>The pattern works because skills are just files. Write a <code class="language-plaintext highlighter-rouge">SKILL.md</code>, maybe a script, and you’ve extended the agent. No plugin API to learn, no registration system, no restart required.</p>

<h2 id="the-taste-problem">The Taste Problem</h2>

<p>I wrote about <a href="/posts/2026/01/20/the-closing-gates-of-open-source/">the taste problem</a> — how vibe-coded contributions often lack the judgment that makes software coherent. <code class="language-plaintext highlighter-rouge">mom</code>’s self-extension could easily become a mess of poorly-designed skills.</p>

<p>The mitigation is structure. Skills have a defined format. The <code class="language-plaintext highlighter-rouge">SKILL.md</code> requires you to think about when the skill triggers (frontmatter) and how it executes (documentation and scripts). <code class="language-plaintext highlighter-rouge">mom</code> can extend itself, but the extension points are constrained.</p>

<p>This doesn’t guarantee good skills. But it raises the floor. <strong>A bad skill is at least a bad skill with proper metadata and documentation, not a random script <code class="language-plaintext highlighter-rouge">mom</code> decided to run.</strong></p>

<h2 id="the-philosophy">The Philosophy</h2>

<p>Traditional software is rigid. You ship features, users consume them, changes require releases. <code class="language-plaintext highlighter-rouge">mom</code> treats code as clay — reshaping at runtime based on what’s needed.</p>

<p>This isn’t new. Emacs has been self-extending for decades. But <code class="language-plaintext highlighter-rouge">mom</code> applies the philosophy to AI agents: the agent itself decides what capabilities to add, implements them, and uses them.</p>

<p>The question is whether you trust the agent’s judgment. <code class="language-plaintext highlighter-rouge">mom</code> requires that trust. In return, it handles the tedium of environment setup and lets you focus on what you actually want to do.</p>

<h2 id="what-makes-this-work">What Makes This Work</h2>

<p>Self-managing agents aren’t magic. They’re just agents with three things: good defaults, a workspace they’re allowed to modify, and constraints that channel the chaos.</p>

<p>The skill format is the key constraint. Without structure, <code class="language-plaintext highlighter-rouge">mom</code> would create a mess of one-off scripts with no documentation and inconsistent interfaces. The <code class="language-plaintext highlighter-rouge">SKILL.md</code> requirement forces organization. The frontmatter forces you to think about discovery. The documentation forces you to think about usage.</p>

<p><strong>The pattern is malleable code within rigid boundaries.</strong> The agent can write whatever skills it needs, but those skills must follow a format. The agent can install whatever tools it wants, but those changes are logged and reproducible. The agent can accumulate credentials, but they live in a known location with a known security model.</p>

<picture><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/skills-malleable-boundaries-400-2e9b620b1.webp 400w, /assets/images/generated/skills-malleable-boundaries-800-2e9b620b1.webp 800w, /assets/images/generated/skills-malleable-boundaries-1200-2e9b620b1.webp 1200w" type="image/webp" /><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/skills-malleable-boundaries-400-0f78185b6.png 400w, /assets/images/generated/skills-malleable-boundaries-800-0f78185b6.png 800w, /assets/images/generated/skills-malleable-boundaries-1200-0f78185b6.png 1200w" type="image/png" /><img loading="lazy" decoding="async" src="/assets/images/generated/skills-malleable-boundaries-800-0f78185b6.png" alt="A precise golden rectangular frame containing organic fluid shapes in dusty blue, sage green and soft coral that gently press against the boundary without breaking it" /></picture>

<p>This is the opposite of the “AI will figure it out” approach. It’s more like: <strong>AI will figure it out, within guardrails that keep the figuring-out coherent.</strong></p>]]></content><author><name>Doug Campos</name></author><category term="ai" /><category term="agents" /><category term="skills" /><category term="malleable-code" /><category term="pi" /><summary type="html"><![CDATA[Last week I wrote about Pi’s four-tool constraint and Mario Zechner’s philosophy of radical minimalism. But here’s the obvious question: how do you do anything useful with just read, write, edit, and bash? The answer is mom — short for “Master Of Mischief.” It’s a Slack bot Mario built on Pi that does something I haven’t seen elsewhere: it manages itself.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://random.qmx.me/assets/images/skills-hero.png" /><media:content medium="image" url="https://random.qmx.me/assets/images/skills-hero.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">How Computers Find Words</title><link href="https://random.qmx.me/posts/2026/02/09/how-computers-find-words/" rel="alternate" type="text/html" title="How Computers Find Words" /><published>2026-02-09T00:00:00+00:00</published><updated>2026-02-09T00:00:00+00:00</updated><id>https://random.qmx.me/posts/2026/02/09/how-computers-find-words</id><content type="html" xml:base="https://random.qmx.me/posts/2026/02/09/how-computers-find-words/"><![CDATA[<p>Every search engine you’ve ever used runs on an algorithm from 1994. Google, DuckDuckGo, the search bar in your email client - underneath all the machine learning, there’s a formula called BM25 doing the heavy lifting.</p>

<p><strong>BM25 is where search starts.</strong> Understanding it explains why some queries work beautifully and others return garbage.</p>

<!--more-->

<h2 id="the-grep-problem">The Grep Problem</h2>

<p>Let’s start with the simplest possible search: grep. You have files, you want to find which ones contain a word. Grep scans through them and returns matches.</p>

<p>This works when you remember exactly what you’re looking for. If I know I wrote “ECONNREFUSED” in some error handling code, grep finds it instantly.</p>

<p>But most searches aren’t like that. I vaguely remember a conversation about authentication. Did I call it “auth”? “Authentication”? “Login flow”? “Identity verification”? Grep needs me to guess right. If I guess wrong, I get nothing.</p>

<p>Even when grep finds matches, they’re unsorted. A file that mentions “authentication” once in passing ranks the same as a file entirely about authentication. <strong>That’s not helpful when you have hundreds of matches.</strong></p>

<p>BM25 solves the ranking problem. It scores results by relevance using surprisingly simple intuitions about what makes a document match a query well.</p>

<h2 id="rare-words-matter-more">Rare Words Matter More</h2>

<p>Imagine searching for “authentication error handling” across your notes.</p>

<p>The word “the” probably appears in every document. Finding “the” tells you nothing - it doesn’t help distinguish relevant documents from irrelevant ones.</p>

<p>The word “authentication” appears in maybe 5% of your documents. Finding it is meaningful - this document is probably about authentication.</p>

<p>The word “ECONNREFUSED” appears in maybe 0.1% of your documents. Finding it is very meaningful - this document almost certainly discusses that specific error.</p>

<p>BM25 formalizes this intuition. <strong>It weights rare terms higher and common terms lower.</strong> The technical name is “inverse document frequency” - the less frequently a term appears across all documents, the more it matters when it does appear.</p>

<p>This is why search engines handle stopwords (the, a, an, is, are) gracefully. They’re not removed or ignored - they just contribute almost nothing to the ranking because they appear everywhere.</p>

<h2 id="repetition-has-limits">Repetition Has Limits</h2>

<p>If a document mentions “authentication” once, that’s a signal. If it mentions “authentication” fifty times, that’s a stronger signal. But is it fifty times stronger?</p>

<p>You might think so, but no. A document that says “authentication” fifty times isn’t necessarily more relevant than one that says it ten times. At some point, you’re just being repetitive.</p>

<p>BM25 handles this with “term frequency saturation.” The first few occurrences of a word boost relevance significantly. Additional occurrences help less and less. <strong>Eventually, more repetition barely moves the needle.</strong></p>

<p>This prevents keyword stuffing from gaming the rankings. A document that artificially repeats search terms won’t dominate just because it has high raw counts.</p>

<h2 id="document-length-matters">Document Length Matters</h2>

<p>Consider two documents:</p>

<ul>
  <li>Document A: 200 words, mentions “authentication” 5 times</li>
  <li>Document B: 10,000 words, mentions “authentication” 5 times</li>
</ul>

<p>Which is more relevant to a search for “authentication”?</p>

<p>Probably Document A. It’s short and 2.5% of it is about authentication. Document B is long and authentication is just 0.05% of the content - probably a passing mention in a larger piece.</p>

<p>BM25 normalizes for document length. <strong>A match in a short document counts for more than the same match in a long document.</strong> This prevents lengthy documents from dominating results just because they contain more words.</p>

<h2 id="the-formula-you-dont-need-to-memorize">The Formula You Don’t Need to Memorize</h2>

<p>BM25 combines these three intuitions into a single relevance score:</p>

<ol>
  <li><strong>Rare terms get more weight</strong> (inverse document frequency)</li>
  <li><strong>Repetition helps, but with diminishing returns</strong> (saturating term frequency)</li>
  <li><strong>Matches in shorter documents count more</strong> (length normalization)</li>
</ol>

<p>The actual formula has tunable constants, but the intuitions are what matter. BM25 asks: “How much does finding these specific words in this specific document tell me about relevance?”</p>

<p>A document mentioning rare search terms multiple times without being padded with filler content scores well. A document barely touching on common words in a sea of unrelated text scores poorly.</p>

<picture><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/bm25-intuitions-400-403a9df18.webp 400w, /assets/images/generated/bm25-intuitions-800-403a9df18.webp 800w, /assets/images/generated/bm25-intuitions-1200-403a9df18.webp 1200w" type="image/webp" /><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/bm25-intuitions-400-90f9cebfa.png 400w, /assets/images/generated/bm25-intuitions-800-90f9cebfa.png 800w, /assets/images/generated/bm25-intuitions-1200-90f9cebfa.png 1200w" type="image/png" /><img loading="lazy" decoding="async" src="/assets/images/generated/bm25-intuitions-800-90f9cebfa.png" alt="Three intuitions of BM25: rare word discovery, diminishing returns of repetition, and document length normalization" /></picture>

<h2 id="why-this-still-matters">Why This Still Matters</h2>

<p><a href="https://en.wikipedia.org/wiki/Okapi_BM25">BM25</a> was published in 1994. In tech years, that’s ancient. Why is it still foundational?</p>

<p><strong>Because it’s fast and it works.</strong></p>

<p>BM25 doesn’t require machine learning. It doesn’t need GPUs or training data. You can implement it with basic data structures and run it on any hardware. SQLite’s FTS5 (Full-Text Search 5) extension implements BM25, which means any application with SQLite gets competent search almost for free.</p>

<p>For precise searches - error messages, function names, specific technical terms - BM25 is hard to beat. It does exactly what you want: finding documents that contain the words you typed, ranked by how relevant those matches seem.</p>

<h2 id="where-bm25-falls-short">Where BM25 Falls Short</h2>

<p>BM25 has a fundamental limitation: <strong>it only understands words, not meaning.</strong></p>

<picture><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/bm25-words-vs-meaning-400-c26f250f7.webp 400w, /assets/images/generated/bm25-words-vs-meaning-800-c26f250f7.webp 800w, /assets/images/generated/bm25-words-vs-meaning-1200-c26f250f7.webp 1200w" type="image/webp" /><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/bm25-words-vs-meaning-400-09daedc25.png 400w, /assets/images/generated/bm25-words-vs-meaning-800-09daedc25.png 800w, /assets/images/generated/bm25-words-vs-meaning-1200-09daedc25.png 1200w" type="image/png" /><img loading="lazy" decoding="async" src="/assets/images/generated/bm25-words-vs-meaning-800-09daedc25.png" alt="Rigid word matching on the left versus a fluid network of meaning on the right" /></picture>

<p>If you search for “authentication,” BM25 will not find documents that only use the word “login.” To BM25, those are completely different strings. It has no concept of synonyms, related concepts, or semantic similarity.</p>

<p>BM25 fails when:</p>

<ul>
  <li>You don’t remember the exact terminology (“auth” vs “authentication”)</li>
  <li>The document uses different words for the same concept (“credentials” vs “password”)</li>
  <li>You’re searching for a concept rather than specific words (“how users prove their identity”)</li>
</ul>

<p>BM25 also can’t handle typos. Search for “authentcation” (missing an ‘i’) and you’ll get nothing, even if you have dozens of relevant documents.</p>

<p>These limitations are fundamental to how BM25 works. It compares character sequences, not meaning. For thirty years, this was an acceptable tradeoff - the speed and simplicity were worth the occasional missed result.</p>

<p>Now there are other options. Vector search understands meaning rather than just words. The old algorithm isn’t obsolete; it’s just no longer alone.</p>

<h2 id="the-takeaway">The Takeaway</h2>

<p>BM25 embodies a few simple truths about text search:</p>

<ul>
  <li>Rare words are more informative than common words</li>
  <li>Repetition matters, but not linearly</li>
  <li>Shorter documents with matches are probably more focused</li>
</ul>

<p>These intuitions are timeless. They applied in 1994 and they apply today. Understanding BM25 helps you understand why search behaves the way it does - why some queries return great results and others miss obvious matches.</p>

<p>When a query fails, it’s usually because you’re asking for meaning and BM25 only knows words. <strong>That’s not a flaw in the algorithm - it’s a limitation of the approach.</strong> One that newer techniques address, but never fully replace. Sometimes you really do want exact word matching, and a thirty-year-old formula will be there when you do.</p>]]></content><author><name>Doug Campos</name></author><category term="ai" /><category term="search" /><category term="information-retrieval" /><category term="bm25" /><category term="qmd" /><summary type="html"><![CDATA[Every search engine you’ve ever used runs on an algorithm from 1994. Google, DuckDuckGo, the search bar in your email client - underneath all the machine learning, there’s a formula called BM25 doing the heavy lifting. BM25 is where search starts. Understanding it explains why some queries work beautifully and others return garbage.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://random.qmx.me/assets/images/bm25-hero.png" /><media:content medium="image" url="https://random.qmx.me/assets/images/bm25-hero.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Four Tools and a Lobster</title><link href="https://random.qmx.me/posts/2026/02/04/four-tools-and-a-lobster/" rel="alternate" type="text/html" title="Four Tools and a Lobster" /><published>2026-02-04T00:00:00+00:00</published><updated>2026-02-04T00:00:00+00:00</updated><id>https://random.qmx.me/posts/2026/02/04/four-tools-and-a-lobster</id><content type="html" xml:base="https://random.qmx.me/posts/2026/02/04/four-tools-and-a-lobster/"><![CDATA[<p>You might have heard of <a href="https://github.com/openclaw/openclaw">OpenClaw</a> — the open-source AI assistant running on everything from WhatsApp to a Raspberry Pi, mass adoption, mass controversy. What powers it is a tiny agent called <strong><a href="https://github.com/badlogic/pi-mono/">Pi</a></strong>, built by <a href="https://mariozechner.at/posts/2025-11-30-pi-coding-agent/">Mario Zechner</a> with a philosophy I haven’t seen elsewhere: <em>“if I don’t need it, it won’t be built.”</em></p>

<p>The result? Four tools. A system prompt under 1,000 tokens. No MCP. No plugin ecosystem. It’s <a href="https://martinfowler.com/bliki/Yagni.html">YAGNI</a> applied to agent architecture.</p>

<!--more-->

<h2 id="the-spaceship-problem">The Spaceship Problem</h2>

<p>Mario built Pi because existing coding agents became, in his words, “a spaceship with 80% of functionality I have no use for.” Claude Code, Cursor, Windsurf — they all ship with dozens of specialized tools, elaborate planning modes, and pre-built integrations. The assumption is that more capabilities mean more power.</p>

<p>Pi bets the opposite. <strong>Constraints are liberating.</strong> They’re architecture.</p>

<h2 id="four-tools">Four Tools</h2>

<p>The entire foundation is four tools:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">read</code> — examine files with configurable line limits</li>
  <li><code class="language-plaintext highlighter-rouge">write</code> — create files or complete overwrites</li>
  <li><code class="language-plaintext highlighter-rouge">edit</code> — surgical text replacement via old/new string pairs</li>
  <li><code class="language-plaintext highlighter-rouge">bash</code> — command execution with no guardrails</li>
</ul>

<p>That’s it. No glob, no grep, no specialized search. If you want to find something, you <code class="language-plaintext highlighter-rouge">bash</code> a <code class="language-plaintext highlighter-rouge">find</code> or <code class="language-plaintext highlighter-rouge">rg</code> command. If you want to understand a codebase, you <code class="language-plaintext highlighter-rouge">read</code> files directly.</p>

<picture><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/four-tools-foundation-400-298ffc976.webp 400w, /assets/images/generated/four-tools-foundation-800-298ffc976.webp 800w, /assets/images/generated/four-tools-foundation-1200-298ffc976.webp 1200w" type="image/webp" /><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/four-tools-foundation-400-d24549d31.png 400w, /assets/images/generated/four-tools-foundation-800-d24549d31.png 800w, /assets/images/generated/four-tools-foundation-1200-d24549d31.png 1200w" type="image/png" /><img loading="lazy" decoding="async" src="/assets/images/generated/four-tools-foundation-800-d24549d31.png" alt="Four curated tools: a magnifying glass for reading, a pen for writing, a chisel for editing, and a terminal for commands" /></picture>

<p>The tool definitions are minimal:</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span>
  <span class="nl">name</span><span class="p">:</span> <span class="dl">"</span><span class="s2">read</span><span class="dl">"</span><span class="p">,</span>
  <span class="nx">description</span><span class="p">:</span> <span class="dl">"</span><span class="s2">Read file contents</span><span class="dl">"</span><span class="p">,</span>
  <span class="nx">parameters</span><span class="p">:</span> <span class="p">{</span>
    <span class="nl">path</span><span class="p">:</span> <span class="p">{</span> <span class="na">type</span><span class="p">:</span> <span class="dl">"</span><span class="s2">string</span><span class="dl">"</span><span class="p">,</span> <span class="na">description</span><span class="p">:</span> <span class="dl">"</span><span class="s2">File path</span><span class="dl">"</span> <span class="p">},</span>
    <span class="nx">limit</span><span class="p">:</span> <span class="p">{</span> <span class="nl">type</span><span class="p">:</span> <span class="dl">"</span><span class="s2">number</span><span class="dl">"</span><span class="p">,</span> <span class="nx">description</span><span class="p">:</span> <span class="dl">"</span><span class="s2">Max lines to read</span><span class="dl">"</span> <span class="p">}</span>
  <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>No elaborate schemas. No nested options. The model figures out how to use <code class="language-plaintext highlighter-rouge">read</code> because it’s read millions of examples of file reading in training. The tool just exposes the capability.</p>

<p>The reasoning: frontier models have been RL-trained extensively on coding tasks. They inherently understand what a coding agent is. You don’t need thousands of tokens of system prompt explaining how to be helpful. The model already knows.</p>

<h2 id="the-system-prompt">The System Prompt</h2>

<p>Most agent harnesses ship with massive system prompts. Claude Code’s runs thousands of tokens — tool descriptions, safety guidelines, behavioral rules, edge case handling, formatting instructions.</p>

<p><strong>Pi’s system prompt is under 1,000 tokens.</strong> Here’s roughly what’s in it:</p>

<ul>
  <li><strong>Tool descriptions</strong> (~400 tokens for all four tools)</li>
  <li><strong>Working directory and environment info</strong> (~100 tokens)</li>
  <li><strong>Project context file reference</strong> (AGENTS.md) (~50 tokens)</li>
  <li><strong>Basic behavioral guidelines</strong> (~200 tokens)</li>
</ul>

<p>No elaborate persona. No safety theater. No “you are a helpful assistant who…” preamble. <strong>The model knows it’s a coding agent because it’s being asked to code.</strong></p>

<p>This isn’t minimalism for its own sake. <strong>Every token in your system prompt is a token that isn’t available for actual work.</strong> If you burn 5,000 tokens on instructions, you’ve shrunk your effective context window by 5,000 tokens. Do that across a long session with compaction, and the overhead compounds.</p>

<p><a href="https://lucumr.pocoo.org/2026/1/31/pi/">Armin Ronacher</a>, who switched to Pi after trying most alternatives, puts it directly: the system prompt is “the shortest of any agent.” He runs Pi almost exclusively now, replacing Claude Code, Cursor, and custom setups. The minimal prompt isn’t a limitation — it’s why the agent stays focused.</p>

<h2 id="yolo-by-default">YOLO by Default</h2>

<p>Here’s where Pi gets controversial: no permission checks.</p>

<p>Most coding agents implement elaborate approval flows. “The agent wants to run <code class="language-plaintext highlighter-rouge">rm -rf</code>. Allow?” The theory is safety. The practice is theater.</p>

<p>Once you give an agent code execution, artificial restrictions are futile. A sufficiently capable model will find ways around guardrails — or you’ll click “approve” a hundred times and stop reading what you’re approving. Either way, the safety is illusory.</p>

<p>Pi acknowledges this. Unrestricted filesystem access. Any command without permission checks. The “sandbox” is your judgment about what you ask it to do, not a popup you’ll learn to ignore.</p>

<h2 id="against-mcp">Against MCP</h2>

<p>I’ve written about <a href="/posts/2026/01/04/on-beads-bloat-and-breaking-points/">MCP and context bloat</a> before. Pi takes a strong stance: no MCP support at all.</p>

<p>The argument is architectural. MCP servers load tools into your context at startup. Take Playwright MCP for browser automation:</p>

<ul>
  <li>21 tools loaded at startup</li>
  <li>13,700 tokens in system prompt</li>
  <li>Present whether you need browser automation or not</li>
</ul>

<p>That’s 13.7k tokens of your context window gone before you’ve typed anything. Add a few more MCP servers and you’re burning 30-40k tokens on tool descriptions alone.</p>

<p>Mario’s alternative is embarrassingly simple: CLI tools with README documentation.</p>

<ul>
  <li>Shell script: <code class="language-plaintext highlighter-rouge">playwright-browse.sh</code></li>
  <li>README: ~225 tokens explaining usage</li>
  <li>Loaded only when the agent reads the README</li>
</ul>

<p>When the agent needs browser automation, it reads the README, understands the interface, and calls the CLI via <code class="language-plaintext highlighter-rouge">bash</code>. When it doesn’t need browser automation, those 225 tokens don’t exist in context.</p>

<p>The math is stark. A session that might use browser automation once:</p>

<ul>
  <li><strong>MCP approach</strong>: 13,700 tokens for the entire session</li>
  <li><strong>CLI approach</strong>: 225 tokens when actually needed, 0 otherwise</li>
</ul>

<p>The difference isn’t just token count. It’s <strong>progressive disclosure</strong>. MCP front-loads everything. CLI tools load on demand. For agents that do many different things across a session, the savings compound.</p>

<picture><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/four-tools-progressive-disclosure-400-5f41bd4ad.webp 400w, /assets/images/generated/four-tools-progressive-disclosure-800-5f41bd4ad.webp 800w, /assets/images/generated/four-tools-progressive-disclosure-1200-5f41bd4ad.webp 1200w" type="image/webp" /><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/four-tools-progressive-disclosure-400-3e7b27ae3.png 400w, /assets/images/generated/four-tools-progressive-disclosure-800-3e7b27ae3.png 800w, /assets/images/generated/four-tools-progressive-disclosure-1200-3e7b27ae3.png 1200w" type="image/png" /><img loading="lazy" decoding="async" src="/assets/images/generated/four-tools-progressive-disclosure-800-3e7b27ae3.png" alt="Heavy overflowing suitcase of unused tools contrasted with a compact pouch with just one glowing tool pulled out" /></picture>

<h2 id="no-built-in-planning">No Built-in Planning</h2>

<p>Pi doesn’t have a planning mode. No to-do lists. No automatic task decomposition.</p>

<p>The reasoning: planning modes create model state-management overhead. The agent has to track what it planned, what it completed, what changed. That’s cognitive load that could go toward the actual work.</p>

<p><strong>If you need a plan, write it to a file.</strong> The file persists across sessions. You can edit it. The agent can read it. But the state lives in the filesystem, not in some hidden UI abstraction.</p>

<p>This applies to sub-agents too. Pi doesn’t have built-in orchestration. If you want to spawn a sub-agent, you <code class="language-plaintext highlighter-rouge">bash</code> it — spawn another Pi instance with a specific task. The parent sees the subprocess output. The delegation is visible, not hidden.</p>

<h2 id="benchmark-results">Benchmark Results</h2>

<p>You’d expect a minimal agent to underperform feature-rich alternatives. The benchmarks say otherwise.</p>

<p><a href="https://www.tbench.ai/leaderboard/terminal-bench/2.0">Terminal-Bench 2.0</a> tested Pi with Claude Opus 4.5 against specialized harnesses — Codex, Cursor, Windsurf. Pi held its own despite having a fraction of the tooling.</p>

<p>More interesting: the benchmark team’s own <strong>Terminus</strong> agent provides only tmux access — even more minimal than Pi. It also competed effectively. The conclusion: <strong>sophisticated tooling might matter less than we assumed.</strong></p>

<h2 id="the-philosophy">The Philosophy</h2>

<p>Mario summarizes it as five principles:</p>

<ol>
  <li><strong>Models self-supervise coding tasks</strong> — frontier models need minimal prompting about their role</li>
  <li><strong>Sub-agents risk coherence</strong> — parallel feature implementation via spawned agents produces fragmented codebases</li>
  <li><strong>File-based state beats UI state</strong> — markdown artifacts outlast sessions and enable cross-session collaboration</li>
  <li><strong>Observability over orchestration</strong> — hidden delegations destroy debuggability</li>
  <li><strong>Token efficiency matters</strong> — context discipline beats feature maximalism</li>
</ol>

<p>This isn’t advice for all agents. It’s a specific bet: <strong>the best coding agent for experienced developers is one that gets out of the way.</strong></p>

<h2 id="what-pi-powers">What Pi Powers</h2>

<p>Pi isn’t just a solo project. It’s what powers OpenClaw — the multi-channel assistant that’s been generating both excitement and concern lately. OpenClaw adds the infrastructure layer: WhatsApp, Telegram, Slack, Discord, Signal, iMessage, Matrix. But the brain is Pi.</p>

<p>It’s also what powers <strong>mom</strong> (Master Of Mischief), Mario’s self-managing Slack bot that installs its own tools and writes its own skills. The minimal foundation turns out to be surprisingly extensible.</p>

<h2 id="feeling-the-difference">Feeling the Difference</h2>

<p>I spent half an hour with Pi today and I’m still processing what happened. Understanding something intellectually is different from <em>feeling</em> it.</p>

<p><strong>The agent felt snappy running a local model.</strong> Not “acceptable for local.” Actually snappy.</p>

<p>GLM-4.7-Flash-Q5-200K running locally on my Framework Desktop (Strix Halo, Ryzen AI MAX+ 395). No cloud API. No internet required. Just a quantized model doing actual work.</p>

<p>I pointed it at my Nix dotfiles repo — flakes, home-manager configs, NixOS hosts, custom packages. Within seconds, it read <code class="language-plaintext highlighter-rouge">flake.nix</code>, ran <code class="language-plaintext highlighter-rouge">ls -la</code>, read the README, listed the hosts directory. Then delivered a comprehensive breakdown: multi-host management, configuration layers, custom packages, secrets management. It understood the architecture.</p>

<p><strong>No verbose preamble.</strong> No elaborate planning. Just read, understand, explain.</p>

<p>Then I pushed it: “I’d love to have a plan mode for yourself, can you help me add that?”</p>

<p>The agent explored the Nix store where Pi lives, found the <code class="language-plaintext highlighter-rouge">packages/coding-agent/docs/</code> directory, read <code class="language-plaintext highlighter-rouge">extensions.md</code> to understand the extension API. Then it <strong>one-shot created a complete plan mode extension</strong>:</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">import</span> <span class="kd">type</span> <span class="p">{</span> <span class="nx">ExtensionAPI</span> <span class="p">}</span> <span class="k">from</span> <span class="dl">"</span><span class="s2">@mariozechner/pi-coding-agent</span><span class="dl">"</span><span class="p">;</span>

<span class="k">export</span> <span class="k">default</span> <span class="nf">function </span><span class="p">(</span><span class="nx">pi</span><span class="p">:</span> <span class="nx">ExtensionAPI</span><span class="p">)</span> <span class="p">{</span>
  <span class="kd">let</span> <span class="nx">inPlanMode</span> <span class="o">=</span> <span class="kc">false</span><span class="p">;</span>
  <span class="kd">let</span> <span class="nx">planQueue</span><span class="p">:</span> <span class="nb">Array</span><span class="o">&lt;</span><span class="p">{</span> <span class="na">toolName</span><span class="p">:</span> <span class="kr">string</span><span class="p">;</span> <span class="nl">input</span><span class="p">:</span> <span class="kr">any</span> <span class="p">}</span><span class="o">&gt;</span> <span class="o">=</span> <span class="p">[];</span>

  <span class="nx">pi</span><span class="p">.</span><span class="nf">registerCommand</span><span class="p">(</span><span class="dl">"</span><span class="s2">plan</span><span class="dl">"</span><span class="p">,</span> <span class="p">{</span>
    <span class="na">description</span><span class="p">:</span> <span class="dl">"</span><span class="s2">Toggle plan mode - preview actions before execution</span><span class="dl">"</span><span class="p">,</span>
    <span class="na">handler</span><span class="p">:</span> <span class="k">async </span><span class="p">(</span><span class="nx">args</span><span class="p">,</span> <span class="nx">ctx</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span>
      <span class="c1">// ... complete implementation</span>
    <span class="p">},</span>
  <span class="p">});</span>

  <span class="nx">pi</span><span class="p">.</span><span class="nf">on</span><span class="p">(</span><span class="dl">"</span><span class="s2">tool_call</span><span class="dl">"</span><span class="p">,</span> <span class="k">async </span><span class="p">(</span><span class="nx">event</span><span class="p">,</span> <span class="nx">ctx</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span>
    <span class="k">if </span><span class="p">(</span><span class="nx">inPlanMode</span><span class="p">)</span> <span class="p">{</span>
      <span class="nx">ctx</span><span class="p">.</span><span class="nx">ui</span><span class="p">.</span><span class="nf">notify</span><span class="p">(</span><span class="s2">`[PLAN] Would execute: </span><span class="p">${</span><span class="nx">event</span><span class="p">.</span><span class="nx">toolName</span><span class="p">}</span><span class="s2">`</span><span class="p">,</span> <span class="dl">"</span><span class="s2">info</span><span class="dl">"</span><span class="p">);</span>
      <span class="k">return</span> <span class="p">{</span> <span class="na">block</span><span class="p">:</span> <span class="kc">true</span><span class="p">,</span> <span class="na">reason</span><span class="p">:</span> <span class="dl">"</span><span class="s2">Plan mode: blocked execution</span><span class="dl">"</span> <span class="p">};</span>
    <span class="p">}</span>
  <span class="p">});</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Command registration, event interception, state management, UI feedback, documentation, a demo script. It figured out from the docs that Pi uses <code class="language-plaintext highlighter-rouge">@mariozechner/pi-coding-agent</code> for types and <code class="language-plaintext highlighter-rouge">@sinclair/typebox</code> for schemas. It understood the event system, the context object, the UI API.</p>

<p><strong>A local quantized model one-shot created a working extension for an agent framework it had never seen before, by reading the source code and documentation.</strong></p>

<picture><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/four-tools-local-magic-400-9998b95da.webp 400w, /assets/images/generated/four-tools-local-magic-800-9998b95da.webp 800w, /assets/images/generated/four-tools-local-magic-1200-9998b95da.webp 1200w" type="image/webp" /><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/four-tools-local-magic-400-9b535ad0c.png 400w, /assets/images/generated/four-tools-local-magic-800-9b535ad0c.png 800w, /assets/images/generated/four-tools-local-magic-1200-9b535ad0c.png 1200w" type="image/png" /><img loading="lazy" decoding="async" src="/assets/images/generated/four-tools-local-magic-800-9b535ad0c.png" alt="Small desktop computer radiating warm golden glow with freshly created components floating gently around it" /></picture>

<p>And here’s the kicker: I didn’t even restart the agent. <code class="language-plaintext highlighter-rouge">/reload</code>, and the new plan mode was live. This is <a href="https://malleable.systems/">malleable software</a> in practice — the system reshaping itself in response to a conversation. The gap between “I wish this existed” and “now it does” collapsed to minutes.</p>

<p>The minimal system prompt matters more than I understood. With a local model, you feel every token. There’s no massive datacenter hiding the latency. Less system prompt means faster first response, more room for context, more of the model’s attention on your actual task.</p>

<p><strong>The capability floor with a minimal agent is higher than expected.</strong> The agent’s performance is bottlenecked by the model, not the harness. A minimal harness lets the model show its actual capability. A bloated harness adds overhead that masks it.</p>

<h2 id="the-trade-offs">The Trade-offs</h2>

<p>Pi isn’t for everyone. The YOLO approach requires trust — in the model, in yourself, in your backups. The minimal tooling means you’re often writing bash one-liners that a specialized tool would handle automatically.</p>

<p>But the trade-off is clarity. When something goes wrong, you know exactly what happened. There’s no hidden orchestration layer, no mysterious context injection, no tools you forgot were loaded.</p>

<p>I’m sold. Not just intellectually — I was already there after reading about it. I’m sold experientially. This is what the future feels like: not more features, not more tokens, not more complexity. <strong>Less.</strong> Less prompt. Less scaffolding. Less overhead. More actual capability.</p>

<p>Four tools. A tiny prompt. A model that knows what it’s doing. That’s enough.</p>]]></content><author><name>Doug Campos</name></author><category term="ai" /><category term="agents" /><category term="architecture" /><category term="open-source" /><category term="pi" /><category term="local-models" /><summary type="html"><![CDATA[You might have heard of OpenClaw — the open-source AI assistant running on everything from WhatsApp to a Raspberry Pi, mass adoption, mass controversy. What powers it is a tiny agent called Pi, built by Mario Zechner with a philosophy I haven’t seen elsewhere: “if I don’t need it, it won’t be built.” The result? Four tools. A system prompt under 1,000 tokens. No MCP. No plugin ecosystem. It’s YAGNI applied to agent architecture.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://random.qmx.me/assets/images/four-tools-hero.png" /><media:content medium="image" url="https://random.qmx.me/assets/images/four-tools-hero.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">The Closing Gates of Open Source</title><link href="https://random.qmx.me/posts/2026/01/20/the-closing-gates-of-open-source/" rel="alternate" type="text/html" title="The Closing Gates of Open Source" /><published>2026-01-20T00:00:00+00:00</published><updated>2026-01-20T00:00:00+00:00</updated><id>https://random.qmx.me/posts/2026/01/20/the-closing-gates-of-open-source</id><content type="html" xml:base="https://random.qmx.me/posts/2026/01/20/the-closing-gates-of-open-source/"><![CDATA[<p>I tried to contribute a fix to <a href="https://github.com/Dicklesworthstone/beads_viewer/pull/18#issuecomment-3658808639">beads_viewer</a> recently. Found a real bug where a hardcoded path should have used a stored variable, wrote up a clean fix, and submitted the PR. The maintainer acknowledged the bug was legitimate, thanked me for finding it, and then hit me with something I’d never heard before: “we don’t accept outside code contributions for this project.”</p>

<p>I was shocked. This wasn’t a rejection because my code was bad or my approach was wrong. The door was simply closed to everyone. The maintainer offered to re-implement the fix themselves, which felt strange - they’d essentially take my idea and write it again from scratch. I asked them to at least document the policy in the README so other people don’t waste their time discovering this the hard way.</p>

<p>That experience stuck with me, and then I started seeing the pattern everywhere.</p>

<!--more-->

<h2 id="the-pattern">The Pattern</h2>

<p>Days later I saw tweets about major GitHub repositories stopping external contributions. The reason: unprecedented levels of AI-generated slop flooding their pull request queues.</p>

<p><a href="https://github.com/tldraw/tldraw/issues/7695#issue-3819192025">tldraw announced</a> they would automatically close pull requests from external contributors going forward:</p>

<blockquote>
  <p>“Like many other open-source projects on GitHub, we’ve recently seen a significant increase in contributions generated entirely by AI tools.”</p>
</blockquote>

<p>The submissions suffer from incomplete context and fundamental misunderstandings of how the codebase works. Worse, the people submitting them rarely stick around for the back-and-forth that any real contribution requires. An open pull request represents a commitment from maintainers to review it seriously, to engage with the contributor, to shepherd the change through to completion. When the signal-to-noise ratio collapses, that commitment becomes impossible to sustain.</p>

<p>I get why maintainers are doing this. Open source maintenance is exhausting even under the best circumstances. Every PR needs careful review, testing, communication, and often multiple rounds of revision. When most of what lands in your queue is low-quality AI slop from people who won’t be there for the follow-up, the cost-benefit math completely breaks down. But something valuable is being lost in the process.</p>

<h2 id="the-ladder-we-climbed">The Ladder We Climbed</h2>

<p>Open source contributions gave me a lot of exposure early in my career. It’s how I got my name out there, learned how real projects work, and connected with people who were better than me.</p>

<p>The process was straightforward but powerful. You find a project you actually use. You hit a bug that annoys you. You dig into the codebase, figure out what’s going wrong, and fix it. You submit a PR. The maintainer reviews it, maybe asks for changes, and you go back and forth until the code is ready. Through that process you learn how real codebases are structured, how to communicate with other developers asynchronously, how to write code that works with existing patterns rather than against them.</p>

<p>That door is closing now, and I worry about what happens to the next generation of developers.</p>

<p>When I mentored junior engineers, my advice was always the same: contribute to open source. Pick projects you genuinely care about and fix problems that annoy you. The learning that happens through that process is something no tutorial or bootcamp can replicate. You develop an intuition for how software systems work at scale, how to navigate unfamiliar code, how to communicate technical ideas clearly to people you’ve never met.</p>

<p>If that path is no longer available, we’re making things significantly harder for people trying to enter this field. The ladder we climbed is being pulled up behind us.</p>

<h2 id="the-vibe-coding-problem">The Vibe Coding Problem</h2>

<picture><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/closing-gates-slop-flood-400-a2f0158c8.webp 400w, /assets/images/generated/closing-gates-slop-flood-800-a2f0158c8.webp 800w, /assets/images/generated/closing-gates-slop-flood-1200-a2f0158c8.webp 1200w" type="image/webp" /><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/closing-gates-slop-flood-400-83dd00f35.png 400w, /assets/images/generated/closing-gates-slop-flood-800-83dd00f35.png 800w, /assets/images/generated/closing-gates-slop-flood-1200-83dd00f35.png 1200w" type="image/png" /><img loading="lazy" decoding="async" src="/assets/images/generated/closing-gates-slop-flood-800-83dd00f35.png" alt="A lone maintainer overwhelmed at their desk as robotic arms on conveyor belts launch an endless avalanche of identical pull requests" /></picture>

<p>Even so, part of me understands why this is happening.</p>

<p>The tools make it dangerously easy. You can point Claude at a repository, describe a bug vaguely, and get a pull request in minutes. You don’t need to understand the codebase architecture. You don’t need to diagnose the root cause of the problem. You just paste the output and click submit. The friction that used to filter for genuine engagement has been almost entirely removed.</p>

<p><a href="/posts/2026/01/04/on-beads-bloat-and-breaking-points/">I’ve written before</a> about vibe coding and what it means when code becomes disposable. Steve Yegge built Beads - 225,000 lines of Go - without ever looking at the code himself. It shipped, it delivered real value to real people, but the technical debt ran deep. That approach works when you own the whole thing and can iterate freely. It falls apart when you’re contributing to someone else’s project.</p>

<p>The same dynamic is now playing out across open source. <strong>People submit PRs they don’t understand, to fix problems they haven’t actually diagnosed, using code they couldn’t maintain if their lives depended on it.</strong> Sometimes the PR even works on the surface. But when it doesn’t - when it breaks something subtle, when it needs iteration, when the maintainer asks a clarifying question - there’s nobody home. The person who submitted it has already moved on to their next AI-assisted drive-by.</p>

<p><strong>You can’t check your brain at the door when contributing to open source.</strong> If you don’t understand what you’re submitting, you’re not helping - you’re just shifting the burden onto maintainers who now have to figure out whether your contribution is sound, often with zero assistance from you. That’s not collaboration. That’s dumping work on volunteers.</p>

<p>There’s an irony here. Claude Code itself is reportedly <a href="https://x.com/thdxr/status/2011787201500139553">100% vibe coded</a> - built entirely through AI-assisted development. And it works, it’s actually a good product. The approach seems to be: iterate until it works, worry less about the intermediate steps.</p>

<p>I’ve been wrestling with this tension myself. In my beads post I admitted I was <a href="/posts/2026/01/04/on-beads-bloat-and-breaking-points/">trying to let go</a> of caring so much about code quality - treating code as disposable rather than precious. But I keep coming back to the same conclusion: I’m not ready to completely stop caring about what’s being generated. <strong>The leverage comes from collaboration with the AI, not from blind delegation.</strong> When you submit something to an open source project, you’re asking maintainers to trust your judgment. If you’ve outsourced that judgment entirely to an AI you didn’t supervise, <strong>you’re wasting everyone’s time.</strong></p>

<h2 id="the-fun-part">The Fun Part</h2>

<p>I’m having more fun coding now than I have in years.</p>

<p>AI has unlocked experimentation in ways that weren’t practical before. Complex algorithms and data structures that would have taken weeks to implement correctly can now be explored in an afternoon. I recently wanted to understand CRDTs and Lamport clocks better, so I started building <a href="https://github.com/qmx/sterna">Sterna</a>, my own experiment at a beads replacement. The idea might be overkill, might even be stupid, but it was genuinely fun to build and the thing actually works.</p>

<p>I think that’s what <a href="https://antirez.com/news/158">antirez</a> is getting at. The creator of Redis recently wrote about his own experience with AI-assisted development. In a single week of prompting and inspecting code, he modified his linenoise library to support UTF-8, fixed transient Redis test failures, created a pure C library for BERT inference in 700 lines, and reproduced weeks of Redis Streams work in about 20 minutes.</p>

<blockquote>
  <p>“It is now clear that for most projects, writing the code yourself is no longer sensible, if not to have fun.”</p>
</blockquote>

<p>And on the passion for building:</p>

<blockquote>
  <p>“What was the fire inside you, when you coded till night to see your project working? It was building. And now you can build more and better, if you find your way to use AI effectively. The fun is still there, untouched.”</p>
</blockquote>

<p>I’m more bullish than antirez on this. The productivity gains are real and I use these tools constantly. <a href="/posts/2025/12/08/the-dog-ate-my-ai-generated-code/">I’ve written about</a> staying in the “smart zone” and managing context effectively. When you know what you’re doing, the gains are insane.</p>

<p>But I diverge from antirez on one point: I still care about understanding the output. He says “writing code is no longer needed for the most part.” I’d frame it differently - writing code is changing, not disappearing. The leverage comes from collaboration, not from turning your brain off and letting the AI drive.</p>

<h2 id="the-taste-problem">The Taste Problem</h2>

<picture><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/closing-gates-taste-400-3cfc7349b.webp 400w, /assets/images/generated/closing-gates-taste-800-3cfc7349b.webp 800w, /assets/images/generated/closing-gates-taste-1200-3cfc7349b.webp 1200w" type="image/webp" /><source sizes="(max-width: 600px) 100vw, (max-width: 900px) 100vw, 800px" srcset="/assets/images/generated/closing-gates-taste-400-76c9d7e33.png 400w, /assets/images/generated/closing-gates-taste-800-76c9d7e33.png 800w, /assets/images/generated/closing-gates-taste-1200-76c9d7e33.png 1200w" type="image/png" /><img loading="lazy" decoding="async" src="/assets/images/generated/closing-gates-taste-800-76c9d7e33.png" alt="Weathered hands carefully examining code like a jeweler inspecting a gem, contrasted with a cold industrial machine stamping out identical copies" /></picture>

<p>I’m not claiming my taste in code is better than anyone else’s. But I do have opinions - about how code should look, how systems should be structured, when a solution is elegant versus when it’s a hack waiting to bite you. Those opinions came from 20+ years of writing code the hard way, making mistakes, and learning what actually works in production.</p>

<p>That experience is what lets me evaluate AI output meaningfully. How do you develop that taste if you never write code yourself? If you never struggle through a problem manually? If you never have a PR rejected and have to figure out why?</p>

<p>Taste requires exposure and iteration. You develop heuristics through experience - writing bad code, having it rejected, learning what actually works in production versus what just looks plausible. If junior developers can’t get their PRs accepted anywhere, if they’re just prompting AI and submitting the output without understanding it, where’s the feedback loop that builds judgment?</p>

<h2 id="no-answers">No Answers</h2>

<p>I don’t have solutions. I’m concerned and working through it myself.</p>

<p>Open source contribution used to be a door into this industry. It was how you proved yourself when you had no professional experience. It was how you learned from people better than you by actually engaging with their code and their feedback. It was how communities formed around shared problems and collective ownership.</p>

<p>That model assumed contributors understood their contributions. It assumed maintainers could trust that someone would be there to iterate when issues came up. It assumed the cost of review was worth the benefit of community input. AI broke those assumptions - not because AI is bad, but because it removed the friction that used to filter for genuine engagement. When contributing is as easy as “prompt and submit,” you get a flood of contributions from people who aren’t really there.</p>

<p>The maintainers closing their gates are responding rationally. They’re protecting their time and their projects and their sanity. I don’t blame them at all.</p>

<p>But we need to think about what we’re losing. Maybe the answer is new contribution models with more structured onboarding, proof-of-understanding requirements, or tiered access based on track record. Maybe the answer is accepting that open source contribution as we knew it was a historical anomaly, enabled by specific conditions that no longer exist. Or maybe we need entirely new ways for people to build taste and prove competence before they contribute - ways that work in a world where AI can generate plausible-looking code on demand.</p>

<p>I don’t know. I’m still working it out. But watching those gates close feels like watching something important slip away. As antirez puts it: “Skipping AI is not going to help you or your career… Test these new tools, with care, with weeks of work, not in a five minutes test where you can just reinforce your own beliefs.” The fire he talks about - the passion for building - that’s still there. The question is how we pass it to the next generation when the paths we took are being walled off.</p>]]></content><author><name>Doug Campos</name></author><category term="ai" /><category term="open-source" /><category term="software" /><category term="vibe-coding" /><category term="career" /><category term="philosophy" /><summary type="html"><![CDATA[I tried to contribute a fix to beads_viewer recently. Found a real bug where a hardcoded path should have used a stored variable, wrote up a clean fix, and submitted the PR. The maintainer acknowledged the bug was legitimate, thanked me for finding it, and then hit me with something I’d never heard before: “we don’t accept outside code contributions for this project.” I was shocked. This wasn’t a rejection because my code was bad or my approach was wrong. The door was simply closed to everyone. The maintainer offered to re-implement the fix themselves, which felt strange - they’d essentially take my idea and write it again from scratch. I asked them to at least document the policy in the README so other people don’t waste their time discovering this the hard way. That experience stuck with me, and then I started seeing the pattern everywhere.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://random.qmx.me/assets/images/closing-gates-hero.png" /><media:content medium="image" url="https://random.qmx.me/assets/images/closing-gates-hero.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>