Updated 2026-05-30: corrected a few technical details (subagent isolation, run_r semantics, the return-by-handle path) and regrouped the release notes after review.

We’ve been hard at work over here at cornball.ai. Over the last couple of weeks we shipped a big new release of corteza, our R agent runtime: version 0.6.9. Lots to cover. (Its CRAN companions pensar, saber, and llm.api got new versions in the same cycle; that writeup is coming soon.)

corteza remains experimental and in active development. Few people are doing agentic work in a proper REPL language instead of bash, fewer still are doing it in R, and AFAIK no one else has an open-source agent harness like Claude Code or Codex that’s also published on CRAN.

Somewhere in this cycle I also stumbled into a lit review. Two research groups have published fairly popular academic papers that help frame where corteza fits in the work of building software around LLM agents and harnesses.

corteza 0.6.9

The agent runtime got most of the attention this cycle: lots of QOL upgrades and lots of bug fixes. There were even a few contributions that weren’t from me! Big thanks to Grant McDermott and Bob Rudis.

What it turned out to be

I started building corteza at the outbreak of the Clawdbot hype cycle, one month after I started building with Claude Code. Having started playing with the nnet package 20 years ago, I’ve long found it obvious that we don’t need Python or JavaScript to build AI software. And while bash is great, it leaves a lot on the table compared to a REPL language. For one, you’ve got all that RAM just sitting there doing almost nothing! And don’t get me started on the glory of CRAN again!

Since the first corteza release, I’ve stumbled onto two research papers describing architectures similar to what I’d implemented in corteza.

CodeAct (Wang et al., UIUC; ICML 2024) was published in mid-2024, a full 18 months before Claude Code went viral in December 2025. The paper argues that an agent should act by writing and running code in a live interpreter and reading the result, rather than choosing from a fixed menu of tools. That’s what run_r is: the model writes R, it runs in a persistent session, and the value comes back. Almost all of corteza’s tool surface works this way: file edits, shell calls, and git are R function calls, not bespoke endpoints.
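Here is a minimal base-R sketch of that loop. It assumes the model's reply arrives as a string of R code. run_turn() is a hypothetical helper, not corteza's run_r. It only shows the property CodeAct relies on: bindings made in one turn are still there in the next, and errors come back as text the model can read.

```r
# Hypothetical helper (not corteza's run_r): evaluate model-written R code
# in a persistent environment and return the text the model would read.
run_turn <- function(code, env) {
  tryCatch({
    value <- NULL
    for (expr in parse(text = code)) {
      value <- eval(expr, envir = env)   # bindings land in env and persist
    }
    paste(utils::capture.output(print(value)), collapse = "\n")
  }, error = function(err) paste("Error:", conditionMessage(err)))
}

session <- new.env()  # stands in for the live global environment

run_turn("x <- c(3, 1, 2)", session)  # turn 1 defines x
run_turn("sort(x)", session)          # turn 2 reuses x without re-sending it
run_turn("stop('boom')", session)     # errors come back as readable text
```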

Recursive Language Models (MIT, arXiv 2512.24601) treats the context itself as a variable in a REPL. Large objects live outside the token stream and get referenced by name; the model prints a slice when it wants to look. This feels like the natural result of giving a CodeAct agent access to subroutines and subagents. A subagent returns a program variable, not regenerated text. corteza’s handle system is exactly this. A big run_r result gets stashed as .h_001, the model sees a str() summary plus the handle, and it passes that handle to the next call without ever pulling the object into its context window.

Despite the name, RLM isn’t a model architecture. It’s a harness architecture. The LLM writes code into a sandboxed interpreter, reads truncated outputs, and can spawn fresh LLM instances as subroutines. The recursive part is the nesting: a child can spawn its own child, which can spawn its own. Turtles all the way down (corteza caps it at depth 3 by default). What keeps that clean is the return convention: each child hands its result back as a program variable (the paper’s FINAL_VAR(variable_name) form), not as text the parent has to read back into its context.
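This toy illustrates the nesting and the return convention without any LLM. spawn_child() is hypothetical and stands in for launching a fresh model instance. Each child either solves its slice directly or splits it and delegates the halves. Recursion stops at depth 3, the corteza default mentioned above, and each child hands back an R value rather than prose.

```r
# Toy stand-in for recursive subagents: no LLM, just the control flow.
depth_log <- integer(0)  # records the depth of every spawned child

spawn_child <- function(task, depth = 1, max_depth = 3) {
  depth_log <<- c(depth_log, depth)
  if (depth >= max_depth || length(task) <= 2) {
    return(sum(task))                       # leaf: solve directly
  }
  half  <- length(task) %/% 2
  left  <- spawn_child(task[seq_len(half)], depth + 1, max_depth)
  right <- spawn_child(task[-seq_len(half)], depth + 1, max_depth)
  left + right                              # combine returned values, not text
}

total <- spawn_child(1:100)
total           # 5050
max(depth_log)  # 3: the cap held
```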

RLMs ask for three things, and in R they’re the same thing wearing three hats: a name bound in an environment. It holds the big object out of context, it hands the model a slice on demand, and it carries a subagent’s result back to its parent.

Input. The big object lives outside the token stream as a binding. get(), exists(), ls() are the registry operations. No bespoke handle_registry needed; R just uses <-. corteza’s .handle_store is one new.env().
Lazy read. The model never sees the object, only a str() summary and the .h_NNN handle, then pulls a slice on demand via read_handle. (R also ships actual lazy materialization, the .rdx/.rdb lazy-load DB, delayedAssign(), makeActiveBinding(), if you ever want the object itself to resolve on touch. corteza keeps it eager for now.)
Return. The parent names a return slot. After its turn, the child resolves that binding and ships the value back via callr serialization, outside the transcript. The parent stores it under a handle.
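Here is a minimal sketch of the Input and Lazy read items above, using a data frame as the big object. stash() and peek() are hypothetical names; corteza's own reader is read_handle, and its signature isn't shown here. Note ls(all.names = TRUE): without it, ls() hides dot-prefixed names like .h_001.

```r
# Hypothetical sketch of a handle store (not corteza's implementation).
handle_store <- new.env()

# Input: bind the object under the next .h_NNN name and return only the
# handle plus a compact str() summary, which is all the model would see.
stash <- function(value, store = handle_store) {
  n <- length(ls(store, all.names = TRUE)) + 1L   # dot-names need all.names
  handle <- sprintf(".h_%03d", n)
  assign(handle, value, envir = store)
  summary <- utils::capture.output(utils::str(value, list.len = 5))
  list(handle = handle, summary = paste(summary, collapse = "\n"))
}

# Lazy read: return just the slice the model asks for.
peek <- function(handle, rows = 1:5, store = handle_store) {
  stopifnot(exists(handle, envir = store, inherits = FALSE))
  obj <- get(handle, envir = store, inherits = FALSE)
  if (is.data.frame(obj) || is.matrix(obj)) obj[rows, , drop = FALSE] else obj[rows]
}

big <- data.frame(id = 1:100000, value = (1:100000) / 2)
h <- stash(big)
cat(h$summary, "\n")
peek(h$handle, 1:3)  # three rows; the full frame never leaves the store
```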

One primitive, the environment binding, covers input, lazy read, and return. That’s the whole RLM registry, for free.
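The Lazy read item mentions R's built-in lazy materialization, which corteza doesn't use yet. This sketch shows the two base-R primitives it names. The environment, the counters, and the seq_len() stand-in for an expensive load are made up for illustration. delayedAssign() evaluates once, on first touch. makeActiveBinding() re-runs its function on every access.

```r
store <- new.env()

# delayedAssign(): a promise. The expression runs once, on first access.
loads <- new.env()
loads$n <- 0
delayedAssign(".h_lazy", {
  loads$n <- loads$n + 1   # count how often the "expensive load" runs
  seq_len(1e6)             # stand-in for reading a big object from disk
}, assign.env = store)

loads$n                                  # 0: nothing materialized yet
length(get(".h_lazy", envir = store))    # first touch forces the promise
invisible(get(".h_lazy", envir = store)) # second touch reuses the value
loads$n                                  # 1: evaluated exactly once

# makeActiveBinding(): a function that runs on every access.
hits <- new.env()
hits$n <- 0
makeActiveBinding(".h_active", function() {
  hits$n <- hits$n + 1
  hits$n
}, env = store)

get(".h_active", envir = store)  # 1
get(".h_active", envir = store)  # 2: recomputed on each touch
```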

In short, RLM in R = eval + environments + callr::r_session + llm.api. No bespoke registry needed.

corteza does these things by giving the LLM a live global environment that its run_r turns read and write directly, while each subagent runs in its own isolated callr::r_session and hands results back by handle. And it’s also (getting to be) a proper LLM agent harness. I didn’t set out to implement either paper. I wasn’t even aware of them when corteza first published on CRAN. So when it comes to simultaneous discovery, people used to think of Newton and Leibniz. From 2026 forward, history will remember Xingyao Wang at UIUC, Alex Zhang at MIT, and Troy Hernandez at cornball.ai, all giving LLMs real language runtimes. I’m sure of it ;)

Before stumbling into this history-making lit review, corteza already ran code two ways: run_r as the in-process, stateful REPL turn (assignments persist in globalenv()), and subagents in isolated callr::r_session children. Prior to this release, a subagent would return its answer as text (the objects it computed stayed bound in its own isolated session). The parent would then read that back into its own context.

Why it matters

There’s a lot in this release. Lots of QOL and bug fixes, but the more exciting thing for me is finding actual literature (as opposed to just tweets and YouTube videos) that helps frame, clarify, and talk about what an LLM agent in R (or Python), with access to its Global Environment, looks like. That lit also helped clean up the code: corteza was mostly RLM-shaped where it ran code before, and now it’s entirely that shape, both on its own turns and inside subagents. The rest of this release cycle was about making it feel like a real agent harness. Plan mode, subagents that return values by handle, a persistent task list, inline diffs, markdown rendering, and near-total parity between chat() and the CLI. That’s all listed below.

It’s on CRAN:

install.packages("corteza")

Still experimental, still moving fast. Source lives at cornball-ai/corteza, and bug reports or PRs are welcome (thanks again, Grant and Bob). The writeup on pensar, saber, and llm.api is coming soon.

Guys. I’m starting to think R might be the best language for AI.

Here’s the rest of the release rundown.

Plan mode and the task list

The plan_mode flag tells the LLM to research and propose rather than act. While it’s on, the policy engine denies write_file, replace_in_file, bash, run_r, and run_r_script. An exit_plan_mode tool gets injected so the model can flip the flag back off when it’s ready to do the work. /plan toggles it from chat() and the CLI. Subagents inherit the flag from their parent so a child can’t launder a write through plan mode.
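This sketch models the plan-mode gate as described, assuming tools are identified by name. check_tool(), available_tools(), and spawn_config() are hypothetical helpers, not corteza's policy engine, and read_file is a placeholder tool name. The denied set is the one listed above.

```r
# Hypothetical sketch of the plan-mode gate (not corteza's policy engine).
plan_mode_denied <- c("write_file", "replace_in_file", "bash", "run_r", "run_r_script")

check_tool <- function(tool, plan_mode = FALSE) {
  if (plan_mode && tool %in% plan_mode_denied) {
    return(list(allowed = FALSE,
                reason = sprintf("%s is blocked in plan mode; call exit_plan_mode first", tool)))
  }
  list(allowed = TRUE, reason = NA_character_)
}

# exit_plan_mode is only offered while the flag is on.
available_tools <- function(base_tools, plan_mode = FALSE) {
  if (plan_mode) union(base_tools, "exit_plan_mode") else base_tools
}

# A subagent copies its parent's flag instead of choosing its own.
spawn_config <- function(parent_config) {
  list(plan_mode = isTRUE(parent_config$plan_mode))
}

check_tool("bash", plan_mode = TRUE)$allowed       # FALSE
check_tool("read_file", plan_mode = TRUE)$allowed  # TRUE
```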

The LLM also gets task_create() / task_update() tools (Claude Code’s TaskCreate pattern). The list lives on the session, is injected into each turn’s system prompt as [ ] / [>] / [x] / [-] lines, and shows up as a compact summary whenever it changes. Promoting one task to in_progress auto-demotes any other in-progress task, so the one-active-at-a-time invariant holds. The list also persists across CLI restarts via the session record.
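This sketch covers the task list's rendering and its one-active-task rule. task_add(), task_set(), and the status names are hypothetical. Two details are assumptions: that [-] maps to cancelled, and that a demoted task goes back to pending. The post only names the markers.

```r
# Hypothetical task list (not corteza's task_create()/task_update()).
status_marks <- c(pending = "[ ]", in_progress = "[>]",
                  completed = "[x]", cancelled = "[-]")  # [-] = cancelled is assumed

new_tasks <- function() {
  data.frame(subject = character(), status = character(), stringsAsFactors = FALSE)
}

task_add <- function(tasks, subject) {
  rbind(tasks, data.frame(subject = subject, status = "pending",
                          stringsAsFactors = FALSE))
}

task_set <- function(tasks, i, status) {
  stopifnot(status %in% names(status_marks))
  if (status == "in_progress") {
    # One active task at a time: demote any other in-progress task (to pending, assumed).
    tasks$status[tasks$status == "in_progress"] <- "pending"
  }
  tasks$status[i] <- status
  tasks
}

# One line per task, in the format injected into the system prompt.
render_tasks <- function(tasks) paste(status_marks[tasks$status], tasks$subject)

tasks <- new_tasks()
tasks <- task_add(tasks, "Read the failing test")
tasks <- task_add(tasks, "Patch the parser")
tasks <- task_set(tasks, 1, "in_progress")
tasks <- task_set(tasks, 2, "in_progress")  # task 1 drops back to pending
cat(render_tasks(tasks), sep = "\n")
```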

Approval prompt and deny semantics

The tool-approval prompt is much tighter. The reason block is gone, and access collapses to one line. Choices 1 and 3 carry key hints: (Enter) and (Esc) in chat(), (Ctrl+C) in the CLI. After you answer, both surfaces print a single-line ● User replied: summary.

“3. Deny” now aborts the entire turn instead of declining a single tool call. The next turn starts with a history marker that tells the LLM to stop and check with the user. No more mashing “3” through cascades of dependent tool calls.
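One way to get abort-the-whole-turn semantics in R is the condition system. On deny, signal a classed condition and catch it at the turn boundary, so later tool calls never run. This is a sketch under that assumption, not corteza's code. run_turn(), the turn_denied class, and the marker text are hypothetical, and read_file is a placeholder tool name.

```r
# Hypothetical sketch: "Deny" aborts the turn via a classed condition.
turn_denied <- function(tool) {
  structure(
    list(message = sprintf("user denied %s", tool), call = NULL, tool = tool),
    class = c("turn_denied", "error", "condition")
  )
}

run_turn <- function(tool_calls, approve, history = character(0)) {
  executed <- character(0)
  status <- tryCatch({
    for (tool in tool_calls) {
      if (!approve(tool)) stop(turn_denied(tool))  # abort, don't just skip
      executed <- c(executed, tool)                # stand-in for running the tool
    }
    "completed"
  }, turn_denied = function(cnd) {
    # Leave a marker so the next turn starts by checking in with the user.
    history <<- c(history, sprintf(
      "[previous turn aborted: %s; stop and check with the user]", conditionMessage(cnd)))
    "denied"
  })
  list(status = status, executed = executed, history = history)
}

approve <- function(tool) tool != "bash"  # pretend the user picks "3. Deny" for bash
res <- run_turn(c("read_file", "bash", "write_file"), approve)
res$status    # "denied"
res$executed  # "read_file": write_file never ran
```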

Interrupt key

The interrupt key (Esc in RStudio, Ctrl+C in terminal) also cleanly aborts the in-flight turn and returns you to the prompt.

Inline diffs and markdown rendering

write_file and replace_in_file now attach a unified-diff payload that renders inline as ⎿ Added N, removed M plus one row per kept line with red/green color, instead of the prior N lines in Xms summary. Diff payload capped at 200 lines / 20000 chars.

Both chat() and the CLI route assistant responses through render_md_ansi(). H1/H2/H3, **bold**, *italic*, `inline code`, fenced blocks, bullets, blockquotes, [text](url) links. Opt out with options(corteza.markdown = FALSE).

Slash commands, copy, context, and multi-line input

/status, /doctor, /config, /diff, /review, /last, and /outputs now work the same in chat() and the CLI. Token-counting helpers, default_provider_model(), and friends moved out of inst/bin/corteza into the package, so both surfaces share the same budget math.

/copy saves the latest assistant reply and attempts to copy it to the system clipboard via clipr or a terminal fallback.

/context (and its /status alias) renders a colored meter:

Context  24.7K / 128.0K  19%  compact 90%
[██████████..................................│.....]
  system    22.0K  89%
  tools      2.7K  11%
  history      56

Filled cells grade through normal / warn / high / crit thresholds at 75 / 90 / 95%. The auto-compact threshold is drawn as a │ tick at its cell position in the unfilled part of the bar.
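This sketch reconstructs the meter arithmetic. It assumes a 50-cell bar and reads "grade through" as coloring each filled cell by the percentage at its own position. meter_bar() and meter_grade() are hypothetical. With the numbers shown above (24.7K of 128K, compact at 90%), 19.3% of 50 cells rounds to 10 filled cells, and the tick lands on cell 45.

```r
# Hypothetical reconstruction of the context meter; 50 cells is an assumption.
meter_grade <- function(pct) {
  if (pct >= 95) "crit" else if (pct >= 90) "high" else if (pct >= 75) "warn" else "normal"
}

meter_bar <- function(used, limit, compact_at = 0.90, width = 50) {
  frac   <- used / limit
  filled <- min(width, round(frac * width))
  tick   <- round(compact_at * width)         # cell of the auto-compact tick
  cells  <- c(rep("\u2588", filled), rep(".", width - filled))
  if (tick > filled) cells[tick] <- "\u2502"  # drawn only in the empty part
  # Each filled cell is graded by the percentage at its own position.
  grades <- vapply(seq_len(filled), function(i) meter_grade(100 * i / width), character(1))
  list(bar = paste0("[", paste(cells, collapse = ""), "]"),
       pct = round(100 * frac), grades = grades)
}

m <- meter_bar(24.7e3, 128e3)
cat(m$bar, m$pct, "%\n")
```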

/paste [optional text] collects every line verbatim until /end or Ctrl+D. Any non-slash line ending with an unescaped \ drops into bash-heredoc-with-continuation mode mid-line, seeded with what you already typed.
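This sketch covers the trailing-backslash check and continuation. It assumes "unescaped" means an odd-length run of trailing backslashes, so a doubled backslash at the end is literal. It also assumes continued lines are joined with newlines. needs_continuation() and collect_input() are hypothetical helpers.

```r
# Hypothetical: does a non-slash line end in an odd run of backslashes?
needs_continuation <- function(line) {
  if (startsWith(line, "/")) return(FALSE)       # slash commands are exempt
  run <- regmatches(line, regexpr("\\\\+$", line))
  length(run) == 1 && nchar(run) %% 2 == 1
}

# Keep reading while lines end in an unescaped backslash; join with newlines.
collect_input <- function(lines) {
  buf <- character(0)
  for (line in lines) {
    if (!needs_continuation(line)) return(paste(c(buf, line), collapse = "\n"))
    buf <- c(buf, sub("\\\\$", "", line))        # strip the escaping backslash
  }
  paste(buf, collapse = "\n")                    # input ran out mid-continuation
}

collect_input(c("summarize this diff \\", "and flag anything risky"))
```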

Shell line

! <cmd> runs a shell command locally and stages the output for the next LLM message (Claude Code / codex convention). Output capped at 4000 chars in the staged version, full on screen. /r <expr> is now available in the CLI too.
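This sketch shows the ! <cmd> path, assuming a POSIX-ish shell reached through system(intern = TRUE). shell_line() and its truncation marker are hypothetical. The 4000-character cap is the one stated above.

```r
# Hypothetical sketch of `! <cmd>`: show everything, stage a capped copy.
shell_line <- function(input, cap = 4000) {
  stopifnot(startsWith(input, "!"))
  cmd  <- trimws(substring(input, 2))
  out  <- system(cmd, intern = TRUE)         # runs via the platform shell
  full <- paste(out, collapse = "\n")
  cat(full, "\n", sep = "")                  # full output on screen
  staged <- if (nchar(full) > cap) {
    paste0(substr(full, 1, cap), "\n[... truncated to ", cap, " chars]")
  } else {
    full
  }
  invisible(staged)                          # text attached to the next LLM message
}

staged <- shell_line("! echo hello")
```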

RStudio addin

Bind Ctrl+Enter to corteza_execute_in_chat() (and optionally Alt+Enter to corteza_execute_in_chat_retain()), and the addin auto-prefixes /r for .R files and ! for .sh files when chat() is the active REPL. Outside chat, no prefix is added, so it’s a pure superset of the built-in execute-line behavior. Before sending, the addin expands the current line to the full top-level R statement, matching what Ctrl+Enter normally does on an unfinished opening line such as lm(y ~ x, whose call continues on the lines below.

chat() and the CLI /r handlers also read continuation lines with a + prompt now, mirroring R’s REPL.
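The addin's statement expansion and the + continuation prompt both reduce to one question: does this text parse yet? This sketch assumes R's parser reports incomplete input with "end of input" or INCOMPLETE_STRING in its error message. That holds for English messages; translated locales may differ. is_incomplete(), expand_statement(), and read_statement() are hypothetical.

```r
# Hypothetical: does the text need more lines before it parses?
is_incomplete <- function(text) {
  state <- tryCatch({
    parse(text = text)
    "complete"
  }, error = function(e) {
    if (grepl("end of input|INCOMPLETE_STRING", conditionMessage(e))) "incomplete" else "error"
  })
  state == "incomplete"
}

# Addin-style: grow from the cursor line until the statement is complete.
expand_statement <- function(lines, start) {
  end <- start
  while (end < length(lines) && is_incomplete(paste(lines[start:end], collapse = "\n"))) {
    end <- end + 1
  }
  lines[start:end]
}

# REPL-style: keep prompting with "+ " while the input is incomplete.
read_statement <- function(read_line = function(prompt) readline(prompt)) {
  text <- read_line("> ")
  while (is_incomplete(text)) text <- paste(text, read_line("+ "), sep = "\n")
  text
}

src <- c("fit <- lm(y ~ x,", "         data = df)", "summary(fit)")
expand_statement(src, 1)  # the first two lines
```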

The corteza chat() startup banner: a gold brain-corn silhouette in yellow-square emoji, with version, model, and provider in the kernels.

chat() and ~/bin/corteza open with a gold brain-corn silhouette laid out in yellow-square emoji (U+1F7E8) in a brick pattern, with version, model, provider, and /help / /quit hints embedded inside the kernels. Session names are docker-style adjective_surname (boring_wozniak) instead of UUIDs.
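This sketch shows docker-style naming with made-up word lists, since the post doesn't show corteza's. It draws only from names not already taken by existing sessions. session_name() and both word lists are hypothetical.

```r
# Hypothetical word lists; corteza's actual lists aren't shown in the post.
adjectives <- c("boring", "eager", "quirky", "serene", "zealous")
surnames   <- c("wozniak", "hopper", "lovelace", "knuth", "ritchie")

session_name <- function(taken = character(0)) {
  pool <- as.vector(outer(adjectives, surnames, paste, sep = "_"))
  free <- setdiff(pool, taken)              # never reuse an existing session name
  if (length(free) == 0) stop("name pool exhausted")
  free[sample.int(length(free), 1)]
}

session_name()
```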

You’ll now see ─ Worked for 3m 18s ──── at the end of each turn.
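This sketch formats the end-of-turn footer. It assumes hours appear only when nonzero and that the rule pads to a fixed width. format_worked() and the 40-column width are hypothetical.

```r
# Hypothetical formatter for the end-of-turn footer.
format_worked <- function(seconds, width = 40) {
  seconds <- round(seconds)
  h <- seconds %/% 3600
  m <- (seconds %% 3600) %/% 60
  s <- seconds %% 60
  parts <- c(if (h > 0) paste0(h, "h"),
             if (h > 0 || m > 0) paste0(m, "m"),
             paste0(s, "s"))
  label <- paste0("\u2500 Worked for ", paste(parts, collapse = " "), " ")
  paste0(label, strrep("\u2500", max(0, width - nchar(label))))
}

cat(format_worked(198), "\n")  # 198 s = 3 min 18 s
```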

Subagents

We started with subagents. We’ll end with subagents!

Return values by handle

subagent_query(id, prompt, return_name = "out") lets a subagent hand back an R value without inlining it into prose. Tell the child to leave its result bound to a name (it needs run_r, so use the work preset), pass that name as return_name, and the value rides back across the process boundary into the parent’s handle store. The reply gains a [stored as .h_NNN] block, and the parent references .h_NNN in its next run_r without the object ever entering its context. On the async path, the name is captured at query time, and the returned value is stored under a parent-side handle on /collect.

That’s analogous to RLM’s FINAL_VAR(variable_name): return a value built in the REPL without embedding it in generated answer text.

The design choice here matters. The obvious path was a FINAL_VAR(x) tool the child calls to return an object. But tool calls carry JSON, not R symbols, and an object literal would leak large data into the child’s transcript. Instead, the parent names the return slot (return_name = "out"), and the child resolves it after its turn. Only the slot name appears in the request. The value ships via callr behind the scenes; the parent sees a summary and a fresh handle, not the serialized object in the transcript. With it, corteza is RLM-shaped in both places it runs code: its own turns and its subagents.
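This sketch shows the return_name convention. A plain environment stands in for the child's callr::r_session, and serialize()/unserialize() stand in for callr's transport across the process boundary. child_turn(), collect_return(), and the fake aggregate() result are hypothetical. The point is the shape: only the slot name goes in, and only a handle plus a one-line summary comes out.

```r
# Hypothetical sketch of return_name (a stand-in, not corteza's callr plumbing).
parent_store <- new.env()

child_turn <- function(return_name) {
  child_env <- new.env()
  # The child's run_r turns would happen here; fake one bound result:
  assign("out", aggregate(mpg ~ cyl, data = mtcars, FUN = mean), envir = child_env)
  # After the turn, resolve the slot the parent named and ship bytes, not text.
  if (!exists(return_name, envir = child_env, inherits = FALSE)) return(NULL)
  serialize(get(return_name, envir = child_env, inherits = FALSE), connection = NULL)
}

collect_return <- function(payload, store = parent_store) {
  if (is.null(payload)) return("[no value bound to return slot]")
  value  <- unserialize(payload)
  handle <- sprintf(".h_%03d", length(ls(store, all.names = TRUE)) + 1L)
  assign(handle, value, envir = store)
  # The reply carries a handle and a one-line summary, never the object itself.
  first_line <- utils::capture.output(utils::str(value, give.attr = FALSE))[1]
  sprintf("[stored as %s] %s", handle, first_line)
}

reply <- collect_return(child_turn(return_name = "out"))
reply
```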

Async

subagent_query(id, prompt, wait = FALSE) fires a prompt and returns the id immediately; subagent_collect(id) drains the reply later. chat() and the CLI get /queue <id> <prompt> and /collect <id>. Each working subagent now writes an append-only JSONL transcript to disk, so compaction can rewrite the in-memory history without losing anything.
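This sketch shows an append-only JSONL transcript. It uses a hand-rolled string escaper so the example stays dependency-free; real code would more sensibly use jsonlite, and this escaper ignores tabs and other control characters. append_event(), json_str(), and the event fields are hypothetical. Compaction replaces the in-memory history, while the file keeps every event.

```r
# Minimal JSON string escaper (backslash, quote, newline only).
json_str <- function(x) {
  x <- gsub("\\\\", "\\\\\\\\", x)  # \        -> \\
  x <- gsub("\"", "\\\\\"", x)      # "        -> \"
  x <- gsub("\n", "\\\\n", x)       # newline  -> \n
  paste0("\"", x, "\"")
}

# Append one event per line; the file is never rewritten.
append_event <- function(path, role, content) {
  line <- sprintf('{"role":%s,"content":%s}', json_str(role), json_str(content))
  cat(line, "\n", file = path, append = TRUE, sep = "")
}

path <- tempfile(fileext = ".jsonl")
history <- list()
events <- list(c("user", "list the files"),
               c("tool", "ran ls: 42 files"),
               c("assistant", "There are \"42\" files."))
for (ev in events) {
  append_event(path, ev[1], ev[2])
  history[[length(history) + 1]] <- ev
}

# Compaction rewrites only the in-memory history...
history <- list("summary: listed 42 files")

# ...while the on-disk transcript still has every event.
length(readLines(path))
```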

Archival

Retroactive extraction is opt-in via a new archival config block. When enabled, qualifying finished turns collapse into a fresh holder subagent that keeps the full transcript, while the parent replaces the work trace with a summary and subagent id. [Max turns reached] can trigger archival instead of remaining a dead-end string. Archival recursion is capped at a configurable depth, default 3.

Wrapping up

That’s a lot of features shipped. I probably forgot some too. But it’s an exciting project that I hope you use, enjoy, and contribute to.

The top of this post was mostly Troy; the feature list, like most of corteza, was vibe coded (err, agentically engineered). Shipped it fast, then fine-tuned while the family slept.

corteza 0.6.9: The R agent harness hits its stride