Last updated: 2026-01-27
stt.api is a minimal, backend-agnostic R client for OpenAI-compatible speech-to-text (STT) APIs, with optional local fallbacks.
It lets you transcribe audio in R without caring which backend actually performs the transcription.
## What stt.api is (and is not)

### ✅ What it is

- A thin R wrapper around OpenAI-style STT endpoints
- A way to switch easily between:
  - OpenAI `/v1/audio/transcriptions`
  - Local OpenAI-compatible servers (LM Studio, OpenWebUI, AnythingLLM, Whisper containers)
  - Local `{audio.whisper}`, if available
- Designed for scripting, Shiny apps, containers, and reproducible pipelines
### ❌ What it is not

- Not a Whisper reimplementation
- Not a model manager
- Not a GPU / CUDA helper
- Not an audio preprocessing toolkit
- Not a replacement for `{audio.whisper}`
## Installation

```r
# From CRAN (once available)
install.packages("stt.api")

# Development version
remotes::install_github("cornball-ai/stt.api")
```
Required dependencies are minimal:

- `curl`
- `jsonlite`

Optional backends:

- `{audio.whisper}` (local transcription)
- `{processx}` (Docker helpers)
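The optional backends are installed separately. The sketch below shows one way to do that; `{processx}` is on CRAN, while the GitHub location given for `{audio.whisper}` is an assumption, so check that package's own documentation:

```r
# Optional: Docker helpers (CRAN)
install.packages("processx")

# Optional: local transcription via {audio.whisper}
# (repository path is an assumption; see that package's docs)
remotes::install_github("bnosac/audio.whisper")
```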
## Quick start

### 1. Use an OpenAI-compatible API (local or cloud)

```r
library(stt.api)

set_stt_base("http://localhost:4123")
# Optional, for hosted services like OpenAI
set_stt_key(Sys.getenv("OPENAI_API_KEY"))

res <- stt("speech.wav")
res$text
```
This works with:

- OpenAI
- Chatterbox / Whisper containers
- LM Studio
- OpenWebUI
- AnythingLLM
- Any server implementing `/v1/audio/transcriptions`
### 2. Use local {audio.whisper} (if installed)

```r
res <- stt("speech.wav", backend = "audio.whisper")
res$text
```
If {audio.whisper} is not installed and you request it explicitly, stt.api will error with clear instructions.
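To avoid that error in scripts, you can probe for `{audio.whisper}` with base R before requesting it explicitly; this is a minimal sketch, not an stt.api helper:

```r
# Fall back to the API backend when {audio.whisper} is not installed
backend <- if (requireNamespace("audio.whisper", quietly = TRUE)) "audio.whisper" else "api"
res <- stt("speech.wav", backend = backend)
res$text
```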
### 3. Automatic backend selection (default)

```r
res <- stt("speech.wav")
```

Backend priority:

1. OpenAI-compatible API (if `stt.api.api_base` is set)
2. `{audio.whisper}` (if installed)
3. Error with guidance
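A rough sketch of that priority order in plain R, for illustration only (the hypothetical `pick_backend()` below is not part of stt.api):

```r
pick_backend <- function() {
  if (!is.null(getOption("stt.api.api_base"))) {
    "api"                              # an API base URL is configured
  } else if (requireNamespace("audio.whisper", quietly = TRUE)) {
    "audio.whisper"                    # local backend is available
  } else {
    stop("No transcription backend available.\n",
         "Set stt.api.api_base or install audio.whisper.")
  }
}
```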
## Normalized output

Regardless of backend, `stt()` always returns the same structure:

```r
list(
  text = "Transcribed text",
  segments = NULL | data.frame(...),
  language = "en",
  backend = "api" | "audio.whisper",
  raw = <raw backend response>
)
```
This makes it easy to switch backends without changing downstream code.
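For example, downstream code can rely only on the normalized fields; this minimal sketch works unchanged whichever backend produced the result:

```r
res <- stt("speech.wav")

cat(res$text, "\n")

# Segment-level detail is only present when the backend provides it
if (!is.null(res$segments)) {
  head(res$segments)
}

res$backend   # "api" or "audio.whisper"
```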
## Health checks

```r
stt_health()
```

Returns:

```r
list(
  ok = TRUE,
  backend = "api",
  message = "OK"
)
```
Useful for Shiny apps and deployment checks.
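For instance, a deployment check or Shiny startup hook can fail fast when no backend is reachable (a minimal sketch):

```r
health <- stt_health()
if (!isTRUE(health$ok)) {
  stop("STT backend unavailable: ", health$message)
}
```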
## Backend selection

Explicit backend choice:

```r
stt("speech.wav", backend = "api")
stt("speech.wav", backend = "audio.whisper")
```

Automatic selection (default):

```r
stt("speech.wav")
```
## Supported endpoints

stt.api targets the OpenAI-compatible STT spec:

```
POST /v1/audio/transcriptions
```
This is intentionally chosen because it is:
- Widely adopted
- Simple
- Supported by many local and hosted services
- Easy to proxy and containerize
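To illustrate what this endpoint expects, here is a hedged sketch of an equivalent raw multipart request using the `curl` and `jsonlite` packages directly; the URL and model name are placeholders, and the exact fields sent by `stt()` may differ:

```r
library(curl)

h <- new_handle()
handle_setheaders(h,
  Authorization = paste("Bearer", Sys.getenv("OPENAI_API_KEY"))
)
handle_setform(h,
  file  = form_file("speech.wav", type = "audio/wav"),
  model = "whisper-1"   # placeholder; local servers may ignore or remap this
)

resp <- curl_fetch_memory("http://localhost:4123/v1/audio/transcriptions", handle = h)
jsonlite::fromJSON(rawToChar(resp$content))$text
```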
## Docker (optional)
If you run Whisper or OpenAI-compatible STT in Docker, stt.api can optionally integrate via {processx}.
Example use cases:
- Starting a local Whisper container
- Checking container health
- Inspecting logs
Docker helpers are explicit and opt-in.
stt.api never starts containers automatically.
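As an illustration of the kind of check you might script yourself, the sketch below shells out to docker via `{processx}` to see whether a container named "whisper" is running; the container name is an assumption and this is not an stt.api helper:

```r
library(processx)

# Ask docker for running containers whose name matches "whisper"
out <- run("docker",
           c("ps", "--filter", "name=whisper", "--format", "{{.Names}}"),
           error_on_status = FALSE)

# TRUE when docker responded and at least one matching container is up
out$status == 0 && nzchar(trimws(out$stdout))
```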
## Configuration options

```r
options(
  stt.api.api_base = NULL,
  stt.api.api_key = NULL,
  stt.api.timeout = 60,
  stt.api.backend = "auto"
)
```
Setters:

```r
set_stt_base()
set_stt_key()
```
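As a usage sketch (assuming the setters simply populate the `stt.api.*` options above), you can configure a session and read the values back:

```r
set_stt_base("http://localhost:4123")
set_stt_key(Sys.getenv("OPENAI_API_KEY"))

# Assumption: the setters populate the stt.api.* options shown above
getOption("stt.api.api_base")
```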
## Error handling philosophy
- No silent failures
- Clear messages when a backend is unavailable
- Actionable instructions when configuration is missing
Example:
```
Error in stt():
No transcription backend available.
Set stt.api.api_base or install audio.whisper.
```
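If you prefer to handle that failure gracefully in a pipeline rather than stop, a plain `tryCatch()` works; this is a generic R pattern, not an stt.api feature:

```r
res <- tryCatch(stt("speech.wav"), error = function(e) NULL)

if (is.null(res)) {
  message("Transcription unavailable; check stt.api.api_base or install audio.whisper.")
}
```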
## Relationship to tts.api
stt.api is designed to pair cleanly with tts.api:
| Task | Package |
|---|---|
| Speech → Text | stt.api |
| Text → Speech | tts.api |
Both share:
- Minimal dependencies
- OpenAI-compatible API focus
- Backend-agnostic design
- Optional Docker support
## Why this package exists
Installing and maintaining local Whisper backends can be difficult:
- CUDA / cuBLAS issues
- Compiler toolchains
- Platform differences
stt.api lets you decouple your R code from those concerns.
Your transcription code stays the same whether the backend is:
- Local
- Containerized
- Cloud-hosted
- GPU-accelerated
- CPU-only
## License
MIT
## Functions

- `stt.api::clear_native_whisper_cache`
- `stt.api::clear_whisper_cache`
- `stt.api::set_stt_base`
- `stt.api::set_stt_key`
- `stt.api::stt`
- `stt.api::stt_health`