I had no clue what a radix tree was until I started digging into SGLang, an inference server everyone claimed was “memory-efficient.” I ran it, read the docs, saw them throwing around “chunks” and “cache chunks” — nothing clicked.

Then I found mini-SGLang, the stripped-down version. And there, right in the code, was radix trees.

You know that moment when you learn something new and suddenly spot it everywhere? Like buying a specific car model and then seeing it on every street? That’s exactly what happened. Once I saw radix trees in SGLang, I started finding them in places that had absolutely nothing to do with LLMs. Turns out radix trees solve a specific class of problems so well that they keep showing up across decades and completely unrelated domains.

What Is a Radix Tree? (The Basics)

A radix tree — sometimes called a compressed trie or Patricia tree — is what you get when you take a trie and stop wasting space.

In a regular trie, each node represents one character. “CAST” and “CASH” would need eight nodes total: C, A, S, T, C, A, S, H. That’s wasteful.

A radix tree compresses single-child chains. Those two words share “CAS”, so you get one node for “CAS” and two children for “T” and “H”. Three nodes instead of eight.

Side-by-side comparison of a regular trie with eight nodes versus a compressed radix tree with three nodes

What makes them useful:

  • O(k) operations where k is key length (not O(n) where n is number of keys)
  • Space compression — single-child edges get merged
  • Natural hierarchy — prefix relationships are built in

This isn’t just academic. These are the data structures powering systems you use every day.

IP Routing: Where It All Started

The original use case: longest prefix match for packet forwarding.

Your routing table has entries like:

  • 10.0.0.0/8 → interface A
  • 10.1.0.0/16 → interface B
  • 10.1.2.0/24 → interface C

When a packet arrives for 10.1.2.5, you need the most specific match, not just any match. Hash tables can’t do this. You need a tree.

Network cards implement radix trees in hardware for line-rate performance. We’re talking about a 1960s data structure (Fredkin’s original trie paper) still winning today.

I never connected the dots before. I’ve configured routing tables, worked with network equipment, and had no idea what data structure was hiding underneath. That’s the thing about good abstractions — they work so well you forget they’re there.

Redis: The Surprise Rediscovery

But radix trees didn’t stay in networking. They migrated into places you’d never expect.

Redis uses radix trees to power Streams — its append-only log data structure introduced in Redis 5.0.

The problem Streams solve: you need an ordered, time-indexed log that supports range queries by ID and efficient consumer group tracking. Hash tables can’t give you ordered iteration. Skip lists (what sorted sets use) would work but waste memory for append-mostly workloads.

XRANGE mystream 1526985054069-0 1526985054079-0  # Give me entries in this ID range

antirez wrote Rax, a radix tree implementation, specifically for this. Stream entries are delta-compressed into listpack macro nodes, and those nodes are indexed by a radix tree keyed on entry IDs. Because IDs are time-based and stored in big-endian order, nearby entries share long prefixes — exactly what radix trees compress well.

The results are dramatic. Storing one million entries in a Stream uses ~17 MB. The same data in a sorted set + hash takes ~220 MB. That’s 13x less memory, thanks to the radix tree plus listpack combination.

I’ve used Redis in production for years. Streams were just “another data type” to me. I never looked under the hood to see that the same data structure routing my packets was also indexing my event logs.

Ethereum: The Cryptographic Twist

If Redis was the surprise, Ethereum was the plot twist.

Here’s where it gets wild. Ethereum uses a Merkle Patricia Trie — a radix tree where every node is a cryptographic hash of its children.

This isn’t just for storage. It’s for proofs.

When you want to verify that an account exists in Ethereum’s state, you don’t download the whole database. You download the path from root to leaf, and the hashes along the way. Each node’s hash is computed from its children, so if any data is tampered with, the root hash changes. The tree structure lets you prove membership without revealing everything — light clients can verify state without running a full node.

The Ethereum yellow paper (section 4.3) describes this in detail. Every account, every contract storage slot, organized as a trie.

Four consecutive blockchain state tries showing radix compression and hash chain inheritance

Same data structure that’s routing your packets and sorting your Redis keys, now securing billions in cryptocurrency. The pattern is so powerful it scales from network cards to global finance.

SGLang: LLM Inference

Now we’re back to where I started.

SGLang uses radix trees for KV cache prefix caching. Here’s the problem:

When you run LLM inference, the attention mechanism needs to keep track of all previous tokens (the KV cache). If you have 100 requests with the same system prompt, you’re computing the same KV cache 100 times. Wasteful.

SGLang stores token sequences as keys in a radix tree:

  • Key: [101, 203, 77, 55] (the system prompt tokens)
  • Value: pointer to the GPU memory holding the KV cache

When request #42 arrives with the same system prompt, the tree finds the matching path. Zero recomputation. But here’s the kicker: radix trees also handle partial overlaps. If request #43 has [101, 203, 77, 55, 88] and request #44 has [101, 203, 77, 55, 99], the tree splits after the fourth token. Both share the first four tokens, but diverge after.

For long prompts, recomputing KV cache can dominate inference time. SGLang’s prefix caching eliminates redundant work across requests.

Left-to-right flow showing documents transforming into tokens flowing into a shared radix tree trunk

Hash tables can’t do this. vLLM uses block-level hashing (16-token blocks), which means you need exact block alignment. SGLang’s token-level radix tree catches sharing opportunities that block hashing misses.

I saw this firsthand. Running SGLang with multiple concurrent requests, all sharing common prompts. The throughput jumped. GPU memory usage dropped. The radix tree was doing the heavy lifting. The RadixAttention paper has the benchmarks.

This exploration led me to organize my local model stack with Nix — models as derivations, declarative config. I documented it in a separate guide; SGLang was one of the servers I evaluated.

The Common Thread: Hierarchical Prefix Matching

So what do IP routing, Redis Streams, Ethereum state, and LLM inference have in common?

  • Hierarchical data (IP prefixes, time-based entry IDs, account states, token sequences)
  • Variable-length prefixes (not fixed blocks)
  • Need for prefix operations (longest match, range queries, membership proofs, cache sharing)

Radix trees solve all of these naturally. The structure is the solution.

This is why the pattern keeps winning. It’s not that engineers are lazy and keep copying code. Radix trees are just the right tool for this specific class of problems. When you have hierarchical data with shared prefixes, the radix tree is waiting for you.

Pattern Recognition: The Skill That Pays Dividends

The takeaway isn’t “learn radix trees.” It’s learn to recognize patterns.

I didn’t connect SGLang to IP routing. I didn’t see the connection between Redis Streams and Ethereum state. These were separate problems in my head until I learned the underlying data structure.

Pattern matching is one of the best skills you can develop in software engineering.

Bloom filters show up in distributed databases, web crawlers, and blockchain light clients. B-trees power databases, filesystems, and even Git. Skip lists are in Redis, LevelDB, and concurrency libraries.

When you see the same pattern in unrelated systems, you’re not seeing coincidence. You’re seeing the landscape of computational problems revealing its structure.

I wrote about a similar pattern with agent sessions — how sessions as trees lets you branch, backtrack, and explore without corrupting state. Same principle: the tree structure isn’t arbitrary. It’s the right tool for hierarchical, non-linear workflows. Whether it’s data structures or conversation history, the pattern repeats.

What This Means for You

I was reading about “cache chunks” in SGLang docs, completely lost. Then I found what was under the hood.

That’s the thing about patterns. Once you see them, you can’t unsee them. Next time you’re evaluating an inference server and someone mentions “prefix caching,” you’ll know to ask what data structure they’re using. Next time you’re designing a system with hierarchical data and shared prefixes, a radix tree might be waiting for you. And when you encounter a system that’s surprisingly memory-efficient or fast at range queries, you’ll know where to look.

This is what happened to me with routing tables and Redis. I used them for years without understanding what made them tick. Now I see the pattern everywhere.

The data structure you just learned, powering half the systems you use — that’s not a bug. It’s the landscape of computational problems revealing its structure.