LLMs work on tokens. Code lives in trees.
That’s the mismatch. I’ve now built three packages that all point back to it, and I keep ending up at the same solution: the abstract syntax tree. Today I’m releasing saber, submitted to CRAN this morning.
What saber does
saber grew out of my frustration with Claude Code losing context. It doesn’t remember what functions call what across projects, or which packages depend on which. So I built a map.
The AST (along with R’s DESCRIPTION files) gives you structure at three levels:
A single function – formatting
A single package – internal structure
Your entire codebase – cross-project relationships
Formatters handle the first. Tools like astgrepr handle the second. saber handles the last two. You intuitively understand these relationships. Your AI does not!
saber uses utils::getParseData() for the AST and R’s well-defined DESCRIPTION files for cross-project dependency linking. That combination is what lets it operate at the third level, which is where most agent context tools stop short.
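The DESCRIPTION side is plain base R. As a sketch of the raw ingredients (not saber’s API; `deps_of` is a hypothetical helper name for illustration), the dependency fields can be read with read.dcf():

```r
# Sketch: read the dependency fields a cross-project linker needs
# from a package's DESCRIPTION file. Not saber's API; `deps_of`
# is a hypothetical helper for illustration only.
deps_of <- function(pkg_dir) {
  d <- read.dcf(file.path(pkg_dir, "DESCRIPTION"),
                fields = c("Package", "Depends", "Imports"))
  as.list(d[1, ])
}

# e.g. for any installed package:
deps_of(find.package("stats"))
```

Invert those edges across a directory of projects and you have the downstream-dependents graph that briefing() and blast_radius() report on.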
The two key entry points:
briefing() generates a concise markdown briefing combining DESCRIPTION metadata, downstream dependents, Claude Code memory, and recent git commits. Written to the user cache directory so both the agent and the user see the same context. An agent spins up with:
r -e 'saber::briefing()'
blast_radius() walks the call graph across all downstream projects and tells you what breaks when you change something. Before editing a shared helper:
r -e 'saber::blast_radius("my_helper")'
Give it a try and let me know what you find. I’ve been using various iterations for a while and it keeps things focused, but I haven’t done formal tests.
The same gap
There are two other recent releases in the R community worth knowing about: Emil Hvitfeldt’s debrief, which converts profvis output into text AI agents can consume, and Jon Harmon’s experimental pkgskills, which scaffolds AI-focused skill files for building R packages. I’ll come back to how they compare.
First, some context on why I keep ending up at syntax trees.
rformat – an R code formatter, published to CRAN earlier this month. Claude sometimes writes ugly code. Not wrong, just poorly formatted: weird indentation, overly spacious style. The LLM generates tokens left to right with no inherent awareness of nesting depth or expression boundaries, so the token stream comes out correct but ugly. rformat uses getParseData() to extract the AST, walks the tree, and emits the code back with consistent formatting rules applied: the parse tree fixes what the token stream got wrong.
I wanted a formatter with minimal dependencies that follows base R conventions. I also ended up with an unexpected bonus: an offhand comment by Claude Code about Apple’s 2014 goto fail bug led me to a harmless 17-year-old bug in the recommended R package foreign. Not bad for a formatting project.
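Base R itself hints at the mechanism: parse() recovers the tree and deparse() walks it and re-emits canonical text – a crude analogue of what rformat does with full formatting rules.

```r
# A crude base-R analogue of tree-based reformatting:
# parse() recovers the tree, deparse() re-emits it consistently.
ugly <- "f<-function(x){x+
    1}"
expr <- parse(text = ugly)[[1]]
cat(deparse(expr), sep = "\n")
# The stray line break and missing spaces are gone in the output,
# because deparse() works from the tree, not the original tokens.
```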
saber (above) – context and call graph analysis for AI agents.
forthcoming – two more projects in the hopper where the AST is required. Stay tuned.
The fundamental insight
An LLM predicts the next token in a sequence. It doesn’t see the parse tree, the scope hierarchy, or the dependency graph. It sees a flat stream of tokens and predicts what comes next. This is remarkably effective. Effective enough that Claude can write correct code, pass tests, and reason about architecture. But it’s a sequential process operating on a sequential representation.
Code is not sequential. Code is a tree. x + 1 isn’t three tokens in a row: it’s a binary operation with two children. A function definition has a name, parameters, and a body, and lives in a package called by its reverse dependencies. That structure is represented by trees and graphs, not sequences.
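You can see this directly with the same tool the packages above rely on: utils::getParseData() returns one row per token and per expression node, with a parent column encoding the nesting.

```r
# "x + 1" as a tree: one row per token and per expression node,
# with `parent` ids encoding how the pieces nest.
pd <- utils::getParseData(parse(text = "x + 1", keep.source = TRUE))
pd[, c("id", "parent", "token", "terminal", "text")]
# The three terminal tokens (SYMBOL, '+', NUM_CONST) hang off
# expr nodes rather than sitting flat in a row.
```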
When you need a machine to understand code structurally rather than sequentially, you end up at the syntax tree. Every time.
Each project addresses a different manifestation of the same mismatch:
Formatting (rformat): Generated code is syntactically correct but structurally inconsistent. The formatter reconstructs the tree and re-emits with consistent rules.
Context (saber): An LLM agent has no structural understanding between sessions and loses it during long ones. It doesn’t know that package A depends on package B, that foo() is called from three different projects, or that changing a helper will break a downstream model. saber parses every R file into its AST, extracts the call graph, and traces dependencies across projects. Without it, the agent is editing a codebase that it has to keep rediscovering.
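The parsing step above can be sketched in a few lines (my illustration, not saber’s implementation; `calls_in_file` is a hypothetical helper name):

```r
# Sketch: list every function called in an R file -- the raw
# material for a call graph. Not saber's implementation.
calls_in_file <- function(path) {
  pd <- utils::getParseData(parse(path, keep.source = TRUE))
  # SYMBOL_FUNCTION_CALL marks the name at each `foo(...)` call site
  sort(unique(pd$text[pd$token == "SYMBOL_FUNCTION_CALL"]))
}
```

Attributing each call site to its enclosing top-level definition means walking the parent ids upward; the cross-project edges then come from DESCRIPTION metadata.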
Why this matters for AI-assisted development
ASTs are the world models of software. Yann LeCun’s bet is that AGI requires world models – structured internal representations of how things connect, not just pattern-matched token sequences. For code, that’s the AST: a practical, local world model that captures scoping, dependencies, namespaces, and call graphs. LLMs are getting better at generating correct code, but structural understanding lives outside the model. The AST is the bridge.
If you’re building packages with AI-assisted development tools, you’ll end up there too. In building llamaR, I realized agent tools could ship through the same channel as everything else in R: install.packages(). llamaR is a CLI agent harness like Claude Code, Codex, or OpenClaw. You can run it from the command line, use it as an MCP server, or – most experimentally – run the agent inside the R console itself.
The in-console loop is fun (and not very good yet), but the real point is that R packages are already tools, skills, and MCP servers in one artifact. CRAN has been doing tested, cross-platform tool distribution for 29 years. The agent ecosystem is just now discovering that’s hard.
How saber compares to debrief and pkgskills
These three packages touch different aspects of the same problem. They’re complementary layers, not competitors.
saber answers: What functions exist? Where are they defined? Who calls this? What breaks if I change it? What packages are downstream? What should an agent know about this repo before editing?
debrief answers: What was slow? Which lines were hot? Which callers/callees dominated runtime? What source context and memory behavior matter for optimization?
pkgskills answers: Given this task, how should I approach it? What patterns should I follow? What are the right steps, conventions, and tradeoffs for this kind of change?
An agent with only saber can reason about structure but will still guess at performance and may lack execution strategy. An agent with only debrief can reason about runtime pain points but may lack broader repo context. An agent with only pkgskills can follow good general practices but operates without awareness of the specific codebase or its runtime behavior.
Structure. Performance. Process. Three angles on the same gap between sequential text generation and the structured reality of a software project. saber is the structure layer.