rformat

Base R Code Formatter

Last updated: 2026-03-03

A code formatter for R, built on R’s parser. Formatting logic is implemented in both base R and C++ (via Rcpp). The C++ path runs automatically and is ~85x faster; the R implementation serves as a readable reference and fallback.

rformat uses parse() and getParseData() to make formatting decisions from the token stream and expression structure, not from regex or indentation heuristics. All transforms operate on an enriched token vector (C++) or data frame (R).
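To see the kind of token stream these decisions rest on, here is a minimal base R sketch (not rformat's internal code; the variable names are illustrative). It parses a one-line input and lists its terminal tokens in source order.

```r
# Parse a snippet; keep.source = TRUE is needed so getParseData() has data.
code <- "x = f(a = 1)"
pd <- utils::getParseData(parse(text = code, keep.source = TRUE))

# Keep only terminal tokens, ordered by position in the source.
tokens <- pd[pd$terminal, c("line1", "col1", "token", "text")]
tokens <- tokens[order(tokens$line1, tokens$col1), ]
print(tokens, row.names = FALSE)

# The two "=" signs carry different token types.
eq_types <- tokens$token[tokens$text == "="]
print(eq_types)
```

The first `=` is reported as EQ_ASSIGN and the second as EQ_SUB, a distinction that a regex over the raw text cannot make reliably.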

Installation

remotes::install_github("cornball-ai/rformat")

Usage

library(rformat)

# Format a string
rformat("x<-1+2")
#> x <- 1 + 2

# Format a file (overwrites in place)
rformat_file("script.R")

# Format all R files in a directory
rformat_dir("R/")

# Dry run
rformat_file("script.R", dry_run = TRUE)

Example

rformat("f=function(x,y){
if(x>0)
y=mean(x,na.rm=TRUE)
else y=NA
}")
#> f <- function(x, y) {
#>     if (x > 0)
#>         y <- mean(x, na.rm = TRUE)
#>     else y <- NA
#> }

What it does

  • Normalizes spacing around operators, commas, and keywords
  • Indents by syntactic nesting depth
  • Converts = to <- for assignment (where the parser confirms EQ_ASSIGN, not EQ_SUB)
  • Wraps long lines at logical operators and commas
  • Wraps long function signatures with continuation indent
  • Collapses short multi-line calls back to one line
  • Preserves comments and strings exactly
  • Removes trailing whitespace and excess blank lines
  • Optionally adds braces to bare control-flow bodies
  • Optionally expands inline if-else to multi-line

Options

Parameter       Default   Description
indent          4L        Spaces per level, or a string like "\t"
line_limit      80L       Line width before wrapping
wrap            "paren"   "paren" aligns to (, "fixed" uses 8-space continuation
brace_style     "kr"      "kr": ){ same line. "allman": { on its own line
control_braces  FALSE     Add braces to bare control-flow bodies
expand_if       FALSE     Expand all inline if-else to multi-line
else_same_line  TRUE      Repair top-level }\nelse parse error
function_space  FALSE     Space before ( in function(x)
join_else       TRUE      Move else to same line as }

Defaults are derived from analysis of the 30 packages that ship with R.

Correctness

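Parse preservation. If the input parses, the output parses. Token types and ordering are preserved, apart from the style tokens (assignment operators, inserted braces) covered under semantic preservation. Strings and comments are never modified.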

Semantic preservation. Only whitespace and style tokens change. Assignment conversion and brace insertion are guided by parser token types (EQ_ASSIGN vs EQ_SUB, structural body detection), so they never change meaning.
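As an illustration of why token types make assignment conversion safe, the sketch below rewrites only the `=` tokens that the parser labels EQ_ASSIGN. `convert_assign` is a hypothetical helper written for this README, not rformat's implementation, and it assumes ASCII input without tabs so that parser columns line up with character positions.

```r
# Hypothetical helper: replace "=" with "<-" only where the parser
# reports an EQ_ASSIGN token. Assumes ASCII source without tabs.
convert_assign <- function(code) {
  pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
  eq <- pd[pd$terminal & pd$token == "EQ_ASSIGN", ]
  lines <- strsplit(code, "\n", fixed = TRUE)[[1]]

  # Work right-to-left within each line so earlier columns stay valid.
  eq <- eq[order(eq$line1, -eq$col1), ]
  for (i in seq_len(nrow(eq))) {
    ln <- eq$line1[i]
    col <- eq$col1[i]
    lines[ln] <- paste0(
      substr(lines[ln], 1, col - 1),
      "<-",
      substr(lines[ln], col + 1, nchar(lines[ln]))
    )
  }
  paste(lines, collapse = "\n")
}

# Argument names (EQ_SUB) and default values (EQ_FORMALS) are left alone.
converted <- convert_assign("f = function(a = 1) g(b = a)")
print(converted)
```

Only the top-level assignment changes here; the `=` in the formals and in the call keep their form because the parser gives them different token types.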

Idempotency. rformat(rformat(x)) == rformat(x). Verified across 126 CRAN and base R packages with randomized parameter combinations (indent, wrap, brace_style, control_braces, line_limit, etc.): 0 failures, 0 idempotency exceptions.
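The three properties above can be checked mechanically for any formatter. The following is a minimal sketch under stated assumptions: `check_invariants`, `token_types`, and `toy_fmt` are hypothetical names, and `toy_fmt` is a stand-in that only strips trailing whitespace. The strict token comparison would need to allow for style tokens (such as `=` versus `<-`) before being applied to a formatter that rewrites them.

```r
# Terminal token types in source order; whitespace does not appear here.
token_types <- function(code) {
  pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
  pd <- pd[pd$terminal, ]
  pd$token[order(pd$line1, pd$col1)]
}

# Check parse preservation, token preservation, and idempotency for a
# formatter `fmt`, assumed to be a function from string to string.
check_invariants <- function(fmt, code) {
  once <- fmt(code)
  twice <- fmt(once)
  parses <- !inherits(try(parse(text = once), silent = TRUE), "try-error")
  list(
    parses = parses,
    same_tokens = parses && identical(token_types(code), token_types(once)),
    idempotent = identical(once, twice)
  )
}

# Toy stand-in formatter: strips trailing whitespace from every line.
toy_fmt <- function(code) {
  lines <- strsplit(code, "\n", fixed = TRUE)[[1]]
  paste(sub("[ \t]+$", "", lines), collapse = "\n")
}

res <- check_invariants(toy_fmt, "x <- 1   \ny <- x + 2\t")
str(res)
```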

Stress testing

The stress test suite formats every .R file from 126 packages (base, recommended, and popular CRAN), checking that formatted code parses and that formatting twice produces identical output. Tests run with randomized style parameters to exercise all option combinations.
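A harness in this spirit might look like the sketch below. It is not the project's actual test suite: `random_params` and `stress_dir` are hypothetical, the option values are an illustrative subset of the Options table (the `2L` indent is an assumption), and the formatter is passed in as an argument so the example runs with a toy stand-in.

```r
# Draw one random combination of style options (illustrative subset).
random_params <- function() {
  list(
    indent = sample(c(2L, 4L), 1),
    wrap = sample(c("paren", "fixed"), 1),
    brace_style = sample(c("kr", "allman"), 1),
    control_braces = sample(c(TRUE, FALSE), 1)
  )
}

# Format every .R file under `dir` with `fmt(code, ...)` and record
# whether the result parses and whether a second pass changes it.
stress_dir <- function(dir, fmt, seed = 1L) {
  set.seed(seed)
  files <- list.files(dir, pattern = "\\.[Rr]$", full.names = TRUE,
                      recursive = TRUE)
  results <- lapply(files, function(path) {
    code <- paste(readLines(path, warn = FALSE), collapse = "\n")
    params <- random_params()
    once <- do.call(fmt, c(list(code), params))
    twice <- do.call(fmt, c(list(once), params))
    data.frame(
      file = basename(path),
      parses = !inherits(try(parse(text = once), silent = TRUE), "try-error"),
      idempotent = identical(once, twice)
    )
  })
  do.call(rbind, results)
}

# Toy stand-in formatter that ignores the style options.
toy_fmt <- function(code, ...) gsub("[ \t]+(?=\n|$)", "", code, perl = TRUE)

tmp <- tempfile("stress")
dir.create(tmp)
writeLines(c("x <- 1  ", "y <- x + 2"), file.path(tmp, "a.R"))
report <- stress_dir(tmp, toy_fmt)
print(report)
```

Fixing the seed keeps a failing parameter combination reproducible, which matters when the options are drawn at random.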

Architecture

The formatting pipeline has two implementations that produce identical output:

  • R (R/ast_*.R): Pure base R reference implementation. No compilation needed; readable source for understanding the algorithms.
  • C++ (src/*.cpp): Rcpp fast path. Same algorithms, ~85x faster on typical files. Used automatically.

Both operate on the same token stream from parse() + getParseData(): enrich terminals with nesting depth, run transforms (collapse, wrap, braces, etc.), then serialize back to text.
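The enrich-then-serialize shape of the pipeline can be sketched in a few lines of base R. This is a toy illustration, not the code in R/ast_*.R: the function names are hypothetical, depth counts only parentheses and braces, there are no transforms in the middle, and the serializer simply joins the tokens of each source line with single spaces (blank lines are dropped).

```r
# Step 1: terminal tokens in source order.
terminals <- function(code) {
  pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
  pd <- pd[pd$terminal, c("line1", "col1", "token", "text")]
  pd[order(pd$line1, pd$col1), ]
}

# Step 2: enrich each token with its nesting depth. An opener sits at the
# depth outside it, and a closer gets the depth of its matching opener.
enrich <- function(tok) {
  opens <- tok$token %in% c("'('", "'{'")
  closes <- tok$token %in% c("')'", "'}'")
  tok$depth <- cumsum(opens - closes) - opens
  tok
}

# Step 3: serialize back to text, indenting each line by the depth of
# its first token.
serialize_tokens <- function(tok, indent = 4L) {
  by_line <- split(tok, tok$line1)
  out <- vapply(by_line, function(l) {
    paste0(strrep(" ", indent * l$depth[1]), paste(l$text, collapse = " "))
  }, character(1))
  paste(unname(out), collapse = "\n")
}

code <- "f <- function(x) {\nif (x > 0) {\nx + 1\n}\n}"
out <- serialize_tokens(enrich(terminals(code)))
cat(out, "\n", sep = "")
```

Because indentation comes from the depth column rather than from the input's whitespace, the unindented input still comes out nested correctly.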

License

GPL-3

Reference

See Function Reference for complete API documentation.
