R as CLI Agent Harness

Posted by Troy Hernandez on Thu, Apr 23, 2026

One line:

corteza::chat()

That starts an AI agent in your R console. It can read files, run shell commands, query git, and search the web (with a free Tavily API key). And because it’s running in your R session, it can see your data frames, your fitted models, your loaded packages. No subprocess, no serialization boundary. It evaluates R code directly in .GlobalEnv.

When you want to check its work, /r df drops you back to regular R. When you’re done, /quit. The objects the agent created are still sitting in your workspace.
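To make the “no serialization boundary” point concrete, here is a minimal sketch of a tool handler evaluating agent-supplied code in the global workspace. This is my own illustration, not corteza’s actual handler (which would add error capture and output limits):

```r
# Sketch: evaluate agent-supplied R code in .GlobalEnv and return
# the printed result as text the LLM can read.
run_r_tool <- function(code) {
  result <- eval(parse(text = code), envir = .GlobalEnv)
  paste(utils::capture.output(print(result)), collapse = "\n")
}

run_r_tool("df <- data.frame(x = 1:3)")
# `df` now lives in the workspace and is still there after /quit
exists("df", envir = .GlobalEnv)
```

Because the eval happens in the session’s own global environment, anything the agent creates is indistinguishable from an object you typed in yourself.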

Works with Claude, OpenAI, Moonshot, or Ollama. chat(provider = "openai") and you’re talking to GPT-4o with the same tools.

It’s also cross-platform. The file, search, and R tools are all base R (list.files, readLines, grep, Sys.glob). The one tool that shells out swaps behind the scenes: bash on Linux and macOS, Rtools bash or cmd on Windows.
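A sketch of how that platform swap could look. This is assumed logic for illustration, not corteza’s exact code:

```r
# Pick the shell backend per platform: bash on Linux/macOS,
# Rtools bash if available on Windows, cmd.exe otherwise.
shell_backend <- function() {
  if (.Platform$OS.type != "windows") return("bash")
  if (nzchar(Sys.which("bash"))) "bash" else "cmd"
}

run_shell <- function(cmd) {
  if (identical(shell_backend(), "cmd")) {
    shell(cmd, intern = TRUE)               # cmd.exe path (Windows only)
  } else {
    system2("bash", c("-c", shQuote(cmd)),  # POSIX path
            stdout = TRUE, stderr = TRUE)
  }
}
```

The caller never sees which backend ran; it just gets command output back as a character vector.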

It’s also cross-modal. You can call corteza from the R console or from the command line. Matrix is now supported via our mx.api package, on CRAN as of 2026/04/21.

Why R?

While having a chat loop running in your R console is novel, it’s the distribution model that’s got me excited.

Claude Code installs via npm. Codex needs Node. Most agent frameworks assume Python or TypeScript. Each comes with its own dependency management story, and that story usually involves praying that nothing in your 1,400-package dependency tree got compromised this week… which is exactly what happened 3 weeks ago when “Axios, the most popular JavaScript HTTP client library with over 100 million weekly downloads, was compromised. There was no malicious code, just a fake dependency that runs a postinstall script to deploy a remote access trojan.”

Compare that with CRAN. Packages submitted to CRAN get reviewed by humans. They’re tested on Windows, Mac, and Linux before publication. CRAN has been doing this for almost 30 years. It’s one of the most battle-tested package registries in open source.

install.packages() works the same on all three operating systems. No need for brew, nvm, Docker, or venvs. If you can install R, you can install an R package.

Curated package managers are one of computing’s great quiet achievements. CRAN solved dependency management and cross-platform compatibility long before today’s agent frameworks existed. It’s not glamorous. It just works. And “just works” is exactly what you want underneath an AI agent.

The MCP problem

MCP (Model Context Protocol) was supposed to be the universal tool interface. Connect any tool server, any model can use it. In practice, every MCP server dumps its full tool catalog into the LLM’s context window on every request.

That context costs money and burns capacity. A model with a 200k token window doesn’t benefit from spending 40k of it on tool descriptions it won’t use this turn. Worse, more tools means more ambiguity. The model gets confused about which tool to pick when 50 options look similar. Accuracy drops as the tool list grows.

The community has figured this out. Claude Code’s own skill system moved toward injecting knowledge only when relevant (system prompt snippets, not tool schemas). Cursor and Windsurf use similar patterns. The trend is away from “here are all my tools” and toward “here’s what you need right now.”

R packages fit this model naturally. You don’t load every installed package at startup. You library() the ones you need. The agent runtime can do the same: discover what’s installed, load what’s relevant, keep the context window lean.
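In code, the idea might look like this. The `.skill` suffix is a hypothetical naming convention for the sketch, not something corteza mandates:

```r
# Discover installed skill packages, but don't load any of them yet.
discover_skill_pkgs <- function(pattern = "\\.skill$") {
  grep(pattern, rownames(installed.packages()), value = TRUE)
}

# Load a skill only when the current task calls for it, keeping
# unused tool schemas out of the context window.
load_skill <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}
```

Discovery is cheap; schema derivation and context injection happen only for the packages the task actually needs.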

How corteza differs from Claude Code and Codex

It’s worth being explicit about the architectural choices, because they’re the part of this story that most surprised me as I built it.

Shape. Claude Code is a compiled Node application. Codex is a compiled Rust binary. Corteza is an R package with a thin shell launcher. The first two are stand-alone programs that shell out to language runtimes when they need them. Corteza runs inside R whether you call corteza::chat() from within R’s console REPL or type corteza at the shell; the launcher just boots R and hands off to chat(). The agent loop, the tool dispatcher, and the R code the agent evaluates all live in the same interpreter.

Tool source. Claude Code and Codex ship their built-in tools (Read, Edit, Bash, Grep, Write) hardcoded in the binary. Each one is written by hand. Extensions go through MCP, which means manually assembled tool definitions exported from a separate process. Corteza takes a different route: every R function that ships with a proper .Rd file is already a machine-readable tool definition. A derivation pass turns the function’s formals() and its .Rd tree into a JSON Schema identical in shape to what Anthropic’s or OpenAI’s API expects. That’s covered in the next section.

State. Claude Code’s built-in tools run through short-lived subprocesses. Each Bash call forks a shell, runs the command, dies. Perfect for stateless filesystem + shell work, awkward when the task involves accumulating session state. Corteza goes the other way by default: the agent shares a live R session with whatever else you’re doing. Your con from DBI::dbConnect() is available. Your fitted model is in .GlobalEnv. The agent can reference them directly without re-loading. When it creates new objects, they’re still there after /quit.

Language constraints. Claude Code and Codex are general-purpose agents. Whatever language they shell out to is a guest. Corteza is R-native by design. Great fit if R is the language your workflow already lives in (data work, statistics, Matrix API glue, package development). Less interesting, but still usable, if you’re debugging a TypeScript app.

Each of these choices carries tradeoffs. This is a thought experiment made real, and the differences turned out to be genuinely interesting.

Tools build themselves

This is another striking consequence of building an agent harness in R. Every function in a properly documented R package (i.e., one that meets CRAN’s documentation standards) has two machine-readable artifacts:

  • formals(fn) — the function’s argument names, defaults, and which ones are required.
  • The package’s .Rd file — title, description, @param text for every argument.

Both are enforced by R CMD check. CRAN rejects packages whose docs don’t match their signatures, whose parameter entries reference arguments that don’t exist, or whose declared arguments lack documentation. 29 years of that policy, enforced across 20,000+ packages, means every CRAN package is essentially a pre-validated tool catalog.
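The required-argument half of that derivation falls straight out of formals(): an argument with no default appears as the empty symbol. A minimal sketch of the idea (mine, not corteza’s implementation):

```r
# Which arguments of a function are required? An argument with no
# default shows up in formals() as the empty symbol, quote(expr = ).
required_args <- function(fn) {
  fm <- formals(fn)
  no_default <- vapply(
    seq_along(fm),
    function(i) identical(fm[[i]], quote(expr = )),  # empty-symbol test
    logical(1)
  )
  names(fm)[no_default]
}

# Same shape as pkg_help(): topic is required, package is optional.
required_args(function(topic, package = NULL) NULL)  # "topic"
```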

The derivation is mechanical. Here’s what corteza::schema_from_fn() emits for saber::pkg_help():

list(
  name = "pkg_help",
  description = "Read function documentation in LLM-friendly markdown.",
  input_schema = list(
    type = "object",
    properties = list(
      topic   = list(type = "string", description = "Function or symbol name."),
      package = list(type = "string", description = "Package name (optional).")
    ),
    required = list("topic")
  )
)

No LLM call. No hand-written metadata. No registration beyond register_skill_from_fn("pkg_help", saber::pkg_help). The parser walks tools::Rd_db("saber"), finds the \arguments section, maps R type hints from the @param prose ((character), (integer), (logical), (character vector)) onto JSON Schema primitives, and returns a list ready for the Anthropic or OpenAI API.
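The type-hint mapping step can be sketched as a small lookup. The hint table here is my guess at the shape of that mapping, not corteza’s exact table:

```r
# Map R type hints from @param prose onto JSON Schema primitives.
hint_to_json_type <- function(hint) {
  switch(hint,
    "(character)"        = "string",
    "(integer)"          = "integer",
    "(logical)"          = "boolean",
    "(numeric)"          = "number",
    "(character vector)" = "array",
    "string")  # fall back to string when no hint is recognized
}

hint_to_json_type("(logical)")  # "boolean"
```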

The trust chain is: author writes documentation, typically roxygen2 or tinyrox does the translation, R CMD check verifies the .Rd matches the signature, CRAN publishes the binary. By the time the derivation runs, it’s consuming data that’s been typechecked at least three times.

Every package is a skill, every skill is a package. Corteza’s derivation is deterministic runtime parsing. CRAN does the filtering. The functions I’ve wanted to expose as tools so far have all been from packages that derive cleanly on the first try.

The fit test

CRAN gives you 20,000 potential skills. That doesn’t mean all 20,000 will fit.

I tested this by trying to swap corteza’s three git tools for calls into gitr, a dependency-free CRAN package wrapping system2("git", ...). The swap was a net loss. Nine lines of custom code grew to sixteen, I added a dependency, and I had to wrap gitr calls in utils::capture.output() to silence its cat-on-error prints. The code got worse.

Root cause: gitr is designed for interactive console use. It prints colored output, cats to stderr on failure, and calls oops() if you’re not in a repo. All sensible for a human at a REPL. All wrong for plumbing that feeds structured text back to an LLM. Packages that fit naturally as skills are the ones already designed as libraries. DBI returns data frames. httr returns response objects. They don’t cat to stderr. They don’t assume a human is watching.

When you’re sizing up a CRAN package as a skill, ask: would you be happy calling it from another function? If the answer is “not really, it prints stuff,” skip it and write the system2() call directly.
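For comparison, the direct route is a few lines of system2() that return plain character vectors, which is exactly the shape an LLM tool result wants. This is a hypothetical helper for illustration, not corteza’s actual git tool:

```r
# Direct plumbing: no colored output, no cat-to-stderr, just lines.
git_log_oneline <- function(n = 5, repo = ".") {
  system2("git",
          c("-C", shQuote(repo), "log", "--oneline", "-n", as.character(n)),
          stdout = TRUE, stderr = TRUE)  # capture both streams as text
}
```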

The ecosystem

Corteza isn’t one package. It’s a runtime that sits on top of a small constellation of single-purpose CRAN packages, each solving one thing cleanly:

  • saber — LLM-facing introspection of R packages. pkg_help() returns markdown-rendered function documentation. pkg_exports() summarizes what a package ships. In progress: schema_from_fn() to produce the JSON Schema representation described above. (The derivation logic currently lives in corteza; it migrates to saber once saber’s 30-day CRAN window reopens.)
  • llm.api — single-interface client for Anthropic, OpenAI, Moonshot, and Ollama. agent(prompt, tools, tool_handler, ...) drives the tool-use loop. Corteza treats it as the transport layer so switching providers is one argument change.
  • mx.api — minimal Matrix client-server API implementation. 19 mx_* functions covering login, sync, send, rooms, reactions. End-to-end encryption is out of scope; the Matrix bot in corteza runs against unencrypted rooms on a Synapse homeserver.

Each of these has at most one non-base dependency. Together they add up to an agent toolkit you can audit in a single evening.

What this actually looks like

The agent runtime discovers installed skill packages at startup. It reads their exports, walks the Rd trees via the derivation pipeline, builds tool schemas from the documentation, and makes them available to the LLM. When the LLM wants to call a tool, it’s a direct R function call into the current session. No JSON-RPC, no socket, no subprocess.

# The agent calls this directly in your session.
# Your `con` is already open — agent reuses it.
DBI::dbGetQuery(con, "SELECT * FROM sales WHERE year = 2026")

You start with ad-hoc functions in your session. When a pattern stabilizes, you move it to a sourced script. When it’s worth sharing, you package it. Same graduation path every R user already knows.

Try it

corteza is on GitHub. All its runtime dependencies are on CRAN, so remotes::install_github pulls them in automatically:

remotes::install_github("cornball-ai/corteza")
library(corteza)
chat()

corteza’s own CRAN submission is in flight. Until then, the install is from GitHub. You’ll also need an API key for your LLM provider of choice (or Ollama).

Let me be clear about the situation: this is very experimental. Moreover, Claude Code and Codex are much more refined products, especially from a token-value perspective (unless you’re using Ollama or Kimi k2.6): with a subscription you pay a flat rate for a massive amount of compute, while running your own agent against the raw API means paying per token. For heavy coding sessions, the subscription models win by a wide margin. Corteza is currently best suited for lighter tasks and experimentation, local models via Ollama (gpt-oss:120b is genuinely usable), or situations where you want the agent inside your R session specifically.

Why are we publicizing this right now? Two reasons:

  1. Open source! ‘We’re R users’ should be enough, but the dramatic shifts in LLM quality from the major providers over the last couple of months have reinforced the notion that we should have a harness we can inspect and control. Plus, open LLM models continue to show dramatic improvement weekly.
  2. On my train ride into the city today, I wanted to read the draft of this blog post, but I had forgotten to push it to GitHub. So I messaged my agent running on my desktop and asked it to push my local changes. Ten seconds later, it was done.

After that, I figured it was time.