stt.api

OpenAI-Compatible Speech-to-Text API Client

Last updated: 2026-01-27

stt.api is a minimal, backend-agnostic R client for OpenAI-compatible speech-to-text (STT) APIs, with optional local fallbacks.

It lets you transcribe audio in R without caring which backend actually performs the transcription.


What stt.api is (and is not)

✅ What it is

  • A thin R wrapper around OpenAI-style STT endpoints

  • A way to switch easily between:

    • OpenAI /v1/audio/transcriptions
    • Local OpenAI-compatible servers (LM Studio, OpenWebUI, AnythingLLM, Whisper containers)
    • Local {audio.whisper} if available
  • Designed for scripting, Shiny apps, containers, and reproducible pipelines

❌ What it is not

  • Not a Whisper reimplementation
  • Not a model manager
  • Not a GPU / CUDA helper
  • Not an audio preprocessing toolkit
  • Not a replacement for {audio.whisper}

Installation

# From CRAN (once available)
install.packages("stt.api")

# Development version
remotes::install_github("cornball-ai/stt.api")

Required dependencies are minimal:

  • curl
  • jsonlite

Optional backends:

  • {audio.whisper} (local transcription)
  • {processx} (Docker helpers)
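Both optional backends can be added later. A minimal sketch (the install sources are assumptions; check the upstream projects for current instructions):

# Docker helpers
install.packages("processx")

# Local Whisper backend (assumed to be installed from GitHub)
remotes::install_github("bnosac/audio.whisper")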

Quick start

1. Use an OpenAI-compatible API (local or cloud)

library(stt.api)

set_stt_base("http://localhost:4123")
# Optional, for hosted services like OpenAI
set_stt_key(Sys.getenv("OPENAI_API_KEY"))

res <- stt("speech.wav")
res$text

This works with:

  • OpenAI
  • Chatterbox / Whisper containers
  • LM Studio
  • OpenWebUI
  • AnythingLLM
  • Any server implementing /v1/audio/transcriptions

2. Use local {audio.whisper} (if installed)

1res <- stt("speech.wav", backend = "audio.whisper")
2res$text

If {audio.whisper} is not installed and you request it explicitly, stt.api will error with clear instructions.


3. Automatic backend selection (default)

res <- stt("speech.wav")

Backend priority:

  1. OpenAI-compatible API (if stt.api.api_base is set)
  2. {audio.whisper} (if installed)
  3. Error with guidance
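For illustration, the priority above roughly corresponds to logic like the following sketch (not the actual internals of stt.api):

pick_backend <- function() {
  # 1. Prefer an OpenAI-compatible API if a base URL is configured
  if (!is.null(getOption("stt.api.api_base"))) return("api")
  # 2. Fall back to local {audio.whisper} if it is installed
  if (requireNamespace("audio.whisper", quietly = TRUE)) return("audio.whisper")
  # 3. Otherwise, error with guidance
  stop("No transcription backend available.\n",
       "Set stt.api.api_base or install audio.whisper.")
}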

Normalized output

Regardless of backend, stt() always returns the same structure:

list(
  text     = "Transcribed text",
  segments = NULL,                  # or a data.frame of segments
  language = "en",
  backend  = "api",                 # or "audio.whisper"
  raw      = <raw backend response>
)

This makes it easy to switch backends without changing downstream code.
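For example, downstream code can rely on just the documented fields, whichever backend ran (a minimal sketch):

res <- stt("speech.wav")
cat(res$text, "\n")
if (!is.null(res$segments)) print(head(res$segments))
message("Transcribed via: ", res$backend)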


Health checks

stt_health()

Returns:

list(
  ok      = TRUE,
  backend = "api",
  message = "OK"
)

Useful for Shiny apps and deployment checks.
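For example, a startup guard might look like this (a sketch assuming the return fields shown above):

h <- stt_health()
if (!isTRUE(h$ok)) {
  stop("STT backend unavailable: ", h$message)
}
message("Using STT backend: ", h$backend)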


Backend selection

Explicit backend choice:

1stt("speech.wav", backend = "api")
2stt("speech.wav", backend = "audio.whisper")

Automatic selection (default):

1stt("speech.wav")

Supported endpoints

stt.api targets the OpenAI-compatible STT spec:

POST /v1/audio/transcriptions

This endpoint was chosen deliberately because it is:

  • Widely adopted
  • Simple
  • Supported by many local and hosted services
  • Easy to proxy and containerize
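For reference, a request against this endpoint is roughly the following multipart upload, sketched here with the {curl} and {jsonlite} dependencies. The field names (file, model) come from the OpenAI transcription spec, not from stt.api internals:

library(curl)

h <- new_handle()
# Authorization is only needed for hosted services
handle_setheaders(h, Authorization = paste("Bearer", Sys.getenv("OPENAI_API_KEY")))
handle_setform(h,
  file  = form_file("speech.wav", type = "audio/wav"),
  model = "whisper-1"
)
res <- curl_fetch_memory("http://localhost:4123/v1/audio/transcriptions", handle = h)
jsonlite::fromJSON(rawToChar(res$content))$text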

Docker (optional)

If you run Whisper or another OpenAI-compatible STT server in Docker, stt.api can optionally integrate with it via {processx}.

Example use cases:

  • Starting a local Whisper container
  • Checking container health
  • Inspecting logs

Docker helpers are explicit and opt-in. stt.api never starts containers automatically.
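If you prefer to manage the container yourself, a rough sketch with {processx} looks like this (the image name and port mapping are placeholders, not defaults shipped by stt.api):

library(processx)

# Start a generic Whisper container in the background
p <- process$new(
  "docker",
  c("run", "--rm", "-p", "4123:8000", "my-whisper-image"),
  stdout = "|", stderr = "|"
)
p$is_alive()

# Point stt.api at it once the server is up
set_stt_base("http://localhost:4123")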


Configuration options

options(
  stt.api.api_base = NULL,
  stt.api.api_key  = NULL,
  stt.api.timeout  = 60,
  stt.api.backend  = "auto"
)

Setters:

set_stt_base()
set_stt_key()
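The setters are presumably thin wrappers around the corresponding options; either form below should be equivalent (an assumption, shown for clarity):

# Setter style
set_stt_base("http://localhost:4123")

# Option style (same effect, assumed)
options(stt.api.api_base = "http://localhost:4123")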

Error handling philosophy

  • No silent failures
  • Clear messages when a backend is unavailable
  • Actionable instructions when configuration is missing

Example:

Error in stt():
No transcription backend available.
Set stt.api.api_base or install audio.whisper.
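In scripts or Shiny apps you may want to degrade gracefully instead of aborting; a minimal sketch:

res <- tryCatch(
  stt("speech.wav"),
  error = function(e) {
    message("Transcription unavailable: ", conditionMessage(e))
    NULL
  }
)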

Relationship to tts.api

stt.api is designed to pair cleanly with tts.api:

Task            Package
Speech → Text   stt.api
Text → Speech   tts.api

Both share:

  • Minimal dependencies
  • OpenAI-compatible API focus
  • Backend-agnostic design
  • Optional Docker support

Why this package exists

Installing and maintaining local Whisper backends can be difficult:

  • CUDA / cuBLAS issues
  • Compiler toolchains
  • Platform differences

stt.api lets you decouple your R code from those concerns.

Your transcription code stays the same whether the backend is:

  • Local
  • Containerized
  • Cloud-hosted
  • GPU-accelerated
  • CPU-only

License

MIT
