Generate Speech from Text
Description
Generate audio from text using one of several TTS backends: the native R chatterbox package, a local Chatterbox or Qwen3-TTS container, the OpenAI TTS API, the ElevenLabs API, or fal.ai TTS models.
Usage
tts(
  input,
  voice,
  file = NULL,
  backend = c("auto", "native", "chatterbox", "qwen3", "openai", "elevenlabs"),
  model = NULL,
  temperature = NULL,
  speed = NULL,
  exaggeration = NULL,
  cfg_weight = NULL,
  stability = NULL,
  similarity_boost = NULL,
  seed = NULL,
  response_format = NULL,
  instructions = NULL,
  language = NULL,
  device = "cuda"
)
Arguments
input: Character. The text to convert to speech.

voice: Character. The voice to use for synthesis. For native: path to a voice reference audio file. For OpenAI: “alloy”, “echo”, “fable”, “onyx”, “nova”, “shimmer”. For ElevenLabs: a voice ID (e.g., “XpDLYThV0yUAFjVTok7m”). For the Chatterbox container: custom voice names uploaded via voice_upload().

file: Character or NULL. Output file path. If NULL, returns raw bytes.

backend: Character. Backend to use: “native” for the R chatterbox package, “chatterbox” for a local Chatterbox container, “qwen3” for a Qwen3-TTS container, “openai” for the OpenAI TTS API, “elevenlabs” for the ElevenLabs API, “fal” for fal.ai TTS models, or “auto” to use native if available, otherwise the configured API base.

model: Character or NULL. The model to use. For OpenAI: “tts-1” or “tts-1-hd”. For ElevenLabs: “eleven_multilingual_v2” (default), “eleven_turbo_v2_5”, etc. For Chatterbox: optional and often ignored.

temperature: Numeric or NULL. Sampling temperature for generation.

speed: Numeric or NULL. Speed multiplier for the audio.

exaggeration: Numeric or NULL. Exaggeration parameter (Chatterbox-specific).

cfg_weight: Numeric or NULL. CFG weight parameter (Chatterbox-specific; see the sketch after this list).

stability: Numeric or NULL. Voice stability, 0-1 (ElevenLabs-specific). Default 0.5.

similarity_boost: Numeric or NULL. Similarity boost, 0-1 (ElevenLabs-specific). Default 0.75.

seed: Integer or NULL. Random seed for reproducible output.

response_format: Character or NULL. Audio format (e.g., “wav”, “mp3”). If NULL and file is provided, inferred from the file extension.

instructions: Character or NULL. Instructions for how the voice should speak (OpenAI/Qwen3), e.g., “Speak in a cheerful and positive tone.”

language: Character or NULL. Language for synthesis (Qwen3-specific). Options: “English”, “Chinese”, “Japanese”, “Korean”, “French”, “German”, “Spanish”, “Italian”, “Portuguese”, “Russian”.

device: Character. Device for the native backend: “cuda”, “cpu”, or “mps”. Default “cuda”.
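The sketch below combines the Chatterbox-specific tuning parameters with a fixed seed in one call; the parameter values are illustrative only, and the voice name is taken from the examples further down, not a built-in default.

# Chatterbox container with backend-specific tuning (illustrative values)
tts("Hello, world!", voice = "FatherChristmas", file = "hello.wav",
    backend = "chatterbox", exaggeration = 0.7, cfg_weight = 0.4, seed = 42)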
Value
If file is provided, invisibly returns the file path.
If file is NULL, returns raw audio bytes (or a list for the native backend).
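As a minimal sketch of the raw-bytes return path (assuming the OpenAI backend and “mp3” output, both of which are choices here rather than requirements), the bytes can be written to disk with base R's writeBin():

# Request raw bytes (file = NULL) and write them out manually
audio <- tts("Hello, world!", voice = "nova", backend = "openai",
             response_format = "mp3")
writeBin(audio, "hello.mp3")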
Examples
# Using the native R chatterbox package (no container needed)
tts("Hello, world!", voice = "path/to/reference.wav",
    file = "hello.wav", backend = "native")

# Using a local Chatterbox container
set_tts_base("http://localhost:7810")
tts("Hello, world!", voice = "FatherChristmas", file = "hello.wav")

# Using OpenAI TTS
tts("Hello, world!", voice = "nova", file = "hello.mp3", backend = "openai")

# Using ElevenLabs
tts("Hello, world!", voice = "XpDLYThV0yUAFjVTok7m",
    file = "hello.mp3", backend = "elevenlabs")
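Two further sketches: the first assumes a Qwen3-TTS container is reachable at the configured base URL (the URL and voice name are placeholders for your own setup); the second passes the ElevenLabs tuning parameters explicitly, using the documented default values.

# Using a Qwen3-TTS container (base URL and voice name are placeholders)
set_tts_base("http://localhost:7810")
tts("Hello, world!", voice = "my_voice", file = "hello.wav",
    backend = "qwen3", language = "English",
    instructions = "Speak in a cheerful and positive tone.")

# Using ElevenLabs with explicit voice tuning (documented defaults shown)
tts("Hello, world!", voice = "XpDLYThV0yUAFjVTok7m", file = "hello.mp3",
    backend = "elevenlabs", stability = 0.5, similarity_boost = 0.75)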