stt

Speech to Text

Description

Convert an audio file to text using a native whisper backend, the local audio.whisper package, an OpenAI-compatible API, or the fal backend.

Usage

stt(
  file,
  model = NULL,
  language = NULL,
  response_format = c("json", "text", "verbose_json"),
  backend = c("auto", "whisper", "audio.whisper", "openai", "fal"),
  prompt = NULL
)
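In the usage above, response_format and backend are written as character vectors of choices. By the usual R convention, the first element is the effective default. The sketch below assumes that stt() resolves these vectors with base R's match.arg(). That is an assumption, because this page does not show the function body. The hypothetical stand-in resolve_stt_args() copies only the choice vectors and shows how the defaults and explicit values resolve.

```r
# Hypothetical stand-in: same choice vectors as stt(), but it only resolves
# the arguments and performs no transcription.
resolve_stt_args <- function(response_format = c("json", "text", "verbose_json"),
                             backend = c("auto", "whisper", "audio.whisper", "openai", "fal")) {
  # match.arg() returns the first choice when the argument is left missing,
  # and signals an error if the value supplied is not one of the choices.
  list(
    response_format = match.arg(response_format),
    backend = match.arg(backend)
  )
}

str(resolve_stt_args())  # defaults: "json" and "auto"
str(resolve_stt_args(response_format = "verbose_json", backend = "audio.whisper"))
```

Under this assumption, a value outside the listed choices (for example response_format = "srt") fails immediately instead of reaching a backend.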

Arguments

  • file: Path to the audio file to convert.
  • model: Model name to use for transcription. For API backends, this is passed directly (e.g., “whisper-1”). For audio.whisper, this is the model size (e.g., “tiny”, “base”, “small”, “medium”, “large”). If NULL, uses the backend’s default.
  • language: Language code (e.g., “en”, “es”, “fr”). Optional hint to improve transcription accuracy.
  • response_format: Response format for API backends. One of “json” (the default), “text”, or “verbose_json”. Ignored by the audio.whisper backend.
  • backend: Which backend to use: “auto” (default), “whisper”, “audio.whisper”, “openai”, or “fal”. In auto mode, the native whisper backend is tried first, then audio.whisper, then the OpenAI-compatible API (if configured), and finally fal.api.
  • prompt: Optional text to guide the transcription, for example to help with the spelling of names, acronyms, or domain-specific terms. For API backends, this is passed as initial_prompt. Ignored by the audio.whisper backend, because the underlying library does not support it.
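The “auto” fallback order described for backend amounts to a first-available search over a fixed list. The sketch below models it with a hypothetical helper, pick_backend(). Its availability flags are illustrative inputs only. This page does not document how stt() actually detects a native whisper install, the audio.whisper package, or a configured API.

```r
# Documented "auto" order: native whisper, audio.whisper, openai (if configured), fal.
auto_order <- c("whisper", "audio.whisper", "openai", "fal")

# Hypothetical helper: `available` is a named logical vector saying which
# backends are usable (for "openai", read TRUE as "configured").
pick_backend <- function(available, order = auto_order) {
  # Keep the documented order, filtered to the backends flagged as usable.
  usable <- order[order %in% names(available)[available]]
  if (length(usable) == 0) {
    stop("no speech-to-text backend is available")
  }
  usable[[1]]
}

# Native whisper missing, so the next backend in line is chosen.
pick_backend(c(whisper = FALSE, audio.whisper = TRUE, openai = TRUE, fal = TRUE))
# returns "audio.whisper"
```

Keeping the order in a single vector makes the priority easy to read and to compare against the documented sequence.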

Value

A list with components:

  • text: The transcribed text as a single string.
  • segments: A data.frame of segments with timing info, or NULL.
  • language: The detected or specified language code.
  • backend: Which backend was used (e.g., “api” or “audio.whisper”).
  • raw: The raw response from the backend.
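Because segments may be NULL, code that consumes the result should handle both cases. The sketch below builds a hand-made mock shaped like the documented list, so it runs without an audio file or API key. A real call would be result <- stt("speech.wav"). The helper describe_stt_result() is hypothetical and not part of the package.

```r
# Hypothetical mock shaped like the documented return value.
result <- list(
  text = "Hello world.",
  segments = NULL,
  language = "en",
  backend = "api",
  raw = list()
)

# Summarise a result, treating a NULL `segments` as zero segments.
describe_stt_result <- function(result) {
  n_seg <- if (is.null(result$segments)) 0L else nrow(result$segments)
  sprintf("[%s, %s] %d segment(s): %s",
          result$backend, result$language, n_seg, result$text)
}

describe_stt_result(result)
# returns "[api, en] 0 segment(s): Hello world."
```

The helper relies only on nrow() of segments, so it does not assume any particular segment column names.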

Examples

# Using OpenAI API
set_stt_base("https://api.openai.com")
set_stt_key(Sys.getenv("OPENAI_API_KEY"))
result <- stt("speech.wav", model = "whisper-1")
result$text

# Using local server
set_stt_base("http://localhost:4123")
result <- stt("speech.wav")

# Using audio.whisper directly
result <- stt("speech.wav", backend = "audio.whisper")