tts

Generate Speech from Text

Description

Generate audio from text using one of several TTS backends: the native R chatterbox package, a local Chatterbox container, a Qwen3-TTS container, the OpenAI TTS API, or the ElevenLabs API.

Usage

    tts(
      input,
      voice,
      file = NULL,
      backend = c("auto", "native", "chatterbox", "qwen3", "openai", "elevenlabs"),
      model = NULL,
      temperature = NULL,
      speed = NULL,
      exaggeration = NULL,
      cfg_weight = NULL,
      stability = NULL,
      similarity_boost = NULL,
      seed = NULL,
      response_format = NULL,
      instructions = NULL,
      language = NULL,
      device = "cuda"
    )

Arguments

  • input: Character. The text to convert to speech.
  • voice: Character. The voice to use for synthesis. For native: path to voice reference audio file. For OpenAI: “alloy”, “echo”, “fable”, “onyx”, “nova”, “shimmer”. For ElevenLabs: voice ID (e.g., “XpDLYThV0yUAFjVTok7m”). For Chatterbox container: custom voice names uploaded via voice_upload().
  • file: Character or NULL. Output file path. If NULL, returns raw bytes.
  • backend: Character. Backend to use: “native” for R chatterbox package, “chatterbox” for local Chatterbox container, “qwen3” for Qwen3-TTS container, “openai” for OpenAI TTS API, “elevenlabs” for ElevenLabs API, “fal” for fal.ai TTS models, or “auto” to use native if available, else configured API base.
  • model: Character or NULL. The model to use. For OpenAI: “tts-1” or “tts-1-hd”. For ElevenLabs: “eleven_multilingual_v2” (default), “eleven_turbo_v2_5”, etc. For Chatterbox: optional, often ignored.
  • temperature: Numeric or NULL. Sampling temperature for generation.
  • speed: Numeric or NULL. Speed multiplier for the audio.
  • exaggeration: Numeric or NULL. Exaggeration parameter (Chatterbox-specific).
  • cfg_weight: Numeric or NULL. CFG weight parameter (Chatterbox-specific).
  • stability: Numeric or NULL. Voice stability 0-1 (ElevenLabs-specific). Default 0.5.
  • similarity_boost: Numeric or NULL. Similarity boost 0-1 (ElevenLabs-specific). Default 0.75.
  • seed: Integer or NULL. Random seed for reproducible output.
  • response_format: Character or NULL. Audio format (e.g., “wav”, “mp3”). If NULL and file is provided, inferred from file extension.
  • instructions: Character or NULL. Instructions for how the voice should speak (OpenAI/Qwen3, e.g., “Speak in a cheerful and positive tone.”).
  • language: Character or NULL. Language for synthesis (Qwen3-specific). Options: “English”, “Chinese”, “Japanese”, “Korean”, “French”, “German”, “Spanish”, “Italian”, “Portuguese”, “Russian”.
  • device: Character. Device for native backend: “cuda”, “cpu”, or “mps”. Default “cuda”.
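The response_format entry above says the format is inferred from the file extension when it is NULL. A minimal sketch of that kind of inference follows. It assumes nothing about the package's internals: infer_format() is a hypothetical helper, not a function of this package, and the "wav" fallback is an assumption made for illustration.

```r
# Hypothetical helper (not part of this package): choose an audio format
# from an explicit value or, failing that, from the output file extension.
infer_format <- function(file = NULL, response_format = NULL, fallback = "wav") {
  # An explicit format always wins.
  if (!is.null(response_format)) {
    return(tolower(response_format))
  }
  # Otherwise use the file extension, if there is one.
  if (!is.null(file)) {
    ext <- tolower(tools::file_ext(file))
    if (nzchar(ext)) {
      return(ext)
    }
  }
  # Assumed default; the real default may differ per backend.
  fallback
}

infer_format("hello.mp3")
infer_format("out/hello.WAV")
infer_format(NULL, response_format = "mp3")
```

Passing response_format explicitly avoids relying on the extension at all, which may be preferable when the output path has no extension or an unusual one.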

Value

If file is provided, invisibly returns the file path. If file is NULL, returns the raw audio bytes (or a list for the native backend).
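When file is NULL, the returned bytes can be written to disk later with base R. The sketch below uses a stand-in raw vector in place of a real tts() result so that it runs without any backend; the audio variable and the temporary path are illustrative only.

```r
# Stand-in for the result of tts(..., file = NULL). A real call would
# return the encoded audio bytes for the non-native backends.
audio <- as.raw(c(0x52, 0x49, 0x46, 0x46))  # the ASCII bytes "RIFF"

out <- tempfile(fileext = ".wav")
if (is.raw(audio)) {
  # Raw vector: write the bytes to disk unchanged.
  writeBin(audio, out)
} else {
  # The native backend returns a list whose structure is not documented
  # here, so inspect it before deciding how to save it.
  str(audio)
}
```

Checking is.raw() first keeps the same code path usable across backends, since the native backend is documented to return a list rather than raw bytes.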

Examples

    # Using native R chatterbox package (no container needed)
    tts("Hello, world!", voice = "path/to/reference.wav",
        file = "hello.wav", backend = "native")

    # Using local Chatterbox container
    set_tts_base("http://localhost:7810")
    tts("Hello, world!", voice = "FatherChristmas", file = "hello.wav")

    # Using OpenAI TTS
    tts("Hello, world!", voice = "nova", file = "hello.mp3", backend = "openai")

    # Using ElevenLabs
    tts("Hello, world!", voice = "XpDLYThV0yUAFjVTok7m",
        file = "hello.mp3", backend = "elevenlabs")