Last updated: 2026-01-30
An R client for Text-to-Speech APIs.
Supports multiple backends:
- OpenAI-compatible: OpenAI, Chatterbox, Qwen3-TTS, LM Studio, OpenWebUI, AnythingLLM
- ElevenLabs: Separate API with voice cloning and multilingual models
Installation
# From CRAN (once available)
install.packages("tts.api")

# Development version
remotes::install_github("cornball-ai/tts.api")
Backend Setup
Chatterbox (Local, OpenAI-compatible)
Use the cornball-ai/chatterbox-tts-api fork:
git clone https://github.com/cornball-ai/chatterbox-tts-api.git
cd chatterbox-tts-api

# For newer Nvidia GPUs (Blackwell/50xx series)
docker build -f docker/Dockerfile.blackwell -t chatterbox-tts:blackwell .

docker run -d \
  --name chatterbox-blackwell \
  --gpus all \
  -p 7810:4123 \
  -v "$(pwd)/cache:/cache" \
  -v "$(pwd)/voices:/voices" \
  chatterbox-tts:blackwell
See the upstream repository for CPU and other GPU options.
Qwen3-TTS (Local, OpenAI-compatible)
Qwen3-TTS supports voice cloning, voice design, and multilingual synthesis.
docker run -d --gpus all --network=host --name qwen3-tts-api \
  -v ~/.cache/huggingface:/cache \
  -e PORT=7811 \
  -e USE_FLASH_ATTENTION=false \
  qwen3-tts-api:blackwell
Built-in voices: Vivian, Serena, Uncle_Fu, Dylan, Eric, Ryan, Aiden, Ono_Anna, Sohee
OpenAI
- Create an account at https://platform.openai.com
- Generate an API key at https://platform.openai.com/api-keys
- Set the environment variable OPENAI_API_KEY
ElevenLabs
- Create an account at https://elevenlabs.io
- Get your API key from https://elevenlabs.io/app/settings/api-keys
- Set the environment variable ELEVENLABS_API_KEY
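Before calling a hosted backend, it can help to confirm that the key is actually visible to the R session. The following is a minimal sketch using only base R; `key_is_set()` is a hypothetical helper for illustration and is not part of tts.api.

```r
# Hypothetical helper (not part of tts.api): TRUE if an environment
# variable is set to a non-empty value in the current R session.
key_is_set <- function(name) {
  nzchar(Sys.getenv(name, unset = ""))
}

# Report which of the two hosted-backend keys are available.
keys <- c("OPENAI_API_KEY", "ELEVENLABS_API_KEY")
status <- vapply(keys, key_is_set, logical(1))
print(status)
```

R reads `~/.Renviron` at startup, so keys placed there are available without calling `Sys.setenv()` in scripts.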
Usage
Setup
library(tts.api)

# For local Chatterbox server (OpenAI-compatible)
set_tts_base("http://localhost:7810")

# For local Qwen3-TTS server
set_tts_base("http://localhost:7811")

# For OpenAI
set_tts_base("https://api.openai.com")
set_tts_key(Sys.getenv("OPENAI_API_KEY"))

# For ElevenLabs (separate API key)
set_elevenlabs_key(Sys.getenv("ELEVENLABS_API_KEY"))
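If a script switches between backends, one option is to keep the base URLs in a single lookup table. This is only a sketch: `tts_base_urls` and `base_url_for()` are hypothetical names, and the local ports are the ones used in the Docker examples, so they may differ on your machine.

```r
# Hypothetical lookup table (not part of tts.api). The URLs are the ones
# used in the setup example; local ports depend on how the containers run.
tts_base_urls <- c(
  chatterbox = "http://localhost:7810",
  qwen3      = "http://localhost:7811",
  openai     = "https://api.openai.com"
)

# Return the base URL for a named backend, failing loudly on unknown names.
base_url_for <- function(backend) {
  backend <- match.arg(backend, names(tts_base_urls))
  unname(tts_base_urls[backend])
}

base_url_for("qwen3")
# With tts.api loaded, the result could be passed on:
# set_tts_base(base_url_for("qwen3"))
```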
Check server health
tts_health()
#> $ok
#> [1] TRUE
#>
#> $status
#> [1] "OK (/health)"
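In a script, the health result can be used to stop early rather than fail on the first synthesis call. The sketch below assumes the result is a list shaped like the output shown above (`$ok` and `$status`); `require_healthy()` is a hypothetical helper, and a stand-in list is used so the example runs without a server.

```r
# Hypothetical guard (not part of tts.api). `health` is assumed to be a list
# shaped like the tts_health() output: $ok (logical) and $status (string).
require_healthy <- function(health) {
  if (!isTRUE(health$ok)) {
    stop("TTS server not healthy: ", health$status, call. = FALSE)
  }
  invisible(TRUE)
}

# Stand-in value mirroring the printed output; in practice use tts_health().
health <- list(ok = TRUE, status = "OK (/health)")
require_healthy(health)
```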
List available voices
voices()
Generate speech
# Basic usage (uses configured base URL)
tts(
  input = "Hello, world!",
  voice = "alloy",
  file = "hello.mp3"
)

# OpenAI with voice instructions
tts(
  input = "Today is a wonderful day to build something people love!",
  voice = "coral",
  file = "speech.mp3",
  backend = "openai",
  model = "gpt-4o-mini-tts",
  instructions = "Speak in a cheerful and positive tone."
)

# Chatterbox with custom parameters
tts(
  input = "Hello with my custom voice!",
  voice = "MyCustomVoice",
  file = "speech.wav",
  temperature = 0.9,
  exaggeration = 1.2,
  cfg_weight = 0.3
)

# ElevenLabs (different API, not OpenAI-compatible)
tts(
  input = "Hello from ElevenLabs!",
  voice = "21m00Tcm4TlvDq8ikWAM", # Rachel voice ID
  file = "hello_eleven.mp3",
  backend = "elevenlabs",
  stability = 0.5,
  similarity_boost = 0.75
)

# Qwen3-TTS with built-in voice
tts(
  input = "Hello from Qwen3!",
  voice = "Vivian",
  file = "hello_qwen3.wav",
  backend = "qwen3"
)

# Return raw bytes (useful for Shiny)
audio_bytes <- tts(
  input = "Hello!",
  voice = "alloy"
)
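When `file` is omitted, the bytes can be written out later with base R. The sketch below assumes the returned value is a raw vector; `write_audio_temp()` is a hypothetical helper, and stand-in bytes are used so it runs without a server.

```r
# Hypothetical helper (not part of tts.api): write a raw vector to a
# temporary audio file and return its path.
write_audio_temp <- function(audio_bytes, ext = ".mp3") {
  stopifnot(is.raw(audio_bytes))
  path <- tempfile(fileext = ext)
  writeBin(audio_bytes, path)
  path
}

# Stand-in bytes so the sketch runs offline; in practice audio_bytes
# would come from tts(input = ..., voice = ...).
audio_bytes <- as.raw(c(0x49, 0x44, 0x33))
path <- write_audio_temp(audio_bytes)
file.size(path)
```

In a Shiny app, a temporary file like this is one way to hand audio to the browser, though the right approach depends on how the app serves media.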
Voice cloning (Qwen3-TTS)
# Fast mode (x-vector only, no transcript needed)
speech_clone(
  input = "Hello in my cloned voice!",
  voice_file = "reference.wav",
  x_vector_only = TRUE,
  file = "cloned.wav",
  backend = "qwen3"
)

# High quality mode (with transcript)
speech_clone(
  input = "Hello in my cloned voice!",
  voice_file = "reference.wav",
  ref_text = "This is what I said in the recording.",
  file = "cloned.wav",
  backend = "qwen3"
)
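The two modes differ only in whether a transcript is supplied, so the choice can be made programmatically. This is a sketch under that assumption; `clone_mode_args()` is a hypothetical helper that builds the mode-specific arguments and does not call the API itself.

```r
# Hypothetical helper (not part of tts.api): build the mode-specific
# arguments for a Qwen3-TTS cloning call.
clone_mode_args <- function(ref_text = NULL) {
  if (is.null(ref_text) || !nzchar(ref_text)) {
    list(x_vector_only = TRUE)   # fast mode: speaker embedding only
  } else {
    list(ref_text = ref_text)    # high quality mode: transcript supplied
  }
}

args <- c(
  list(input = "Hello in my cloned voice!", voice_file = "reference.wav",
       file = "cloned.wav", backend = "qwen3"),
  clone_mode_args("This is what I said in the recording.")
)
str(args)
# With tts.api loaded: do.call(speech_clone, args)
```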
Voice design (Qwen3-TTS only)
Create a custom voice from a natural language description:
speech_design(
  input = "Hello, I am your AI assistant.",
  voice_description = "A warm, professional female voice with a slight British accent",
  file = "designed_voice.wav"
)
Voice management (Chatterbox)
Upload a voice to the library for reuse:
# Upload once
voice_upload(
  voice_file = "my_voice.wav",
  voice_name = "my-custom-voice"
)

# With language
voice_upload(
  voice_file = "french_voice.wav",
  voice_name = "french-speaker",
  language = "fr"
)

# Use the saved voice by name
tts(
  input = "Hello with my custom voice!",
  voice = "my-custom-voice",
  file = "output.wav"
)

# Or for one-off cloning (uploads and generates in one call)
speech_clone(
  input = "Hello with my custom voice!",
  voice_file = "my_voice.mp3",
  file = "output.wav",
  exaggeration = 0.8
)
Parameters
tts()
| Parameter | Backend | Description |
|---|---|---|
| input | All | Text to convert to speech |
| voice | All | Voice name or ID |
| file | All | Output file path (NULL returns raw bytes) |
| backend | - | “auto”, “native”, “chatterbox”, “qwen3”, “openai”, “elevenlabs”, or “fal” |
| model | OpenAI, ElevenLabs | Model name |
| instructions | OpenAI | Voice style instructions |
| temperature | Chatterbox | Sampling temperature |
| speed | OpenAI, Chatterbox | Playback speed multiplier |
| exaggeration | Chatterbox | Voice exaggeration |
| cfg_weight | Chatterbox | CFG weight |
| stability | ElevenLabs | Voice stability (0-1) |
| similarity_boost | ElevenLabs | Similarity boost (0-1) |
| seed | Chatterbox | Random seed for reproducibility |
| response_format | OpenAI, Chatterbox | Audio format |
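Because most optional parameters apply to only some backends, a small check can catch mismatches before a request is sent. The sketch below transcribes the Backend column of the table into a lookup; `param_backends` and `unsupported_params()` are hypothetical names, and whether `tts()` itself ignores or rejects a mismatched parameter is not specified here.

```r
# Backend support per optional parameter, transcribed from the table above.
# The lookup and helper are hypothetical and not part of tts.api.
param_backends <- list(
  model            = c("openai", "elevenlabs"),
  instructions     = "openai",
  temperature      = "chatterbox",
  speed            = c("openai", "chatterbox"),
  exaggeration     = "chatterbox",
  cfg_weight       = "chatterbox",
  stability        = "elevenlabs",
  similarity_boost = "elevenlabs",
  seed             = "chatterbox",
  response_format  = c("openai", "chatterbox")
)

# Return the names in `params` that the table does not list for `backend`.
# Parameters missing from the lookup (input, voice, file) are skipped.
unsupported_params <- function(backend, params) {
  known <- intersect(params, names(param_backends))
  ok <- vapply(known, function(p) backend %in% param_backends[[p]], logical(1))
  known[!ok]
}

unsupported_params("elevenlabs", c("stability", "cfg_weight", "speed"))
```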
voice_upload() (Chatterbox)
| Parameter | Description |
|---|---|
| voice_file | Path to voice sample file |
| voice_name | Name to save the voice as |
| language | Language code (e.g., “en”, “fr”) |
speech_clone() (Chatterbox/Qwen3-TTS)
| Parameter | Backend | Description |
|---|---|---|
| input | All | Text to convert to speech |
| voice_file | All | Path to voice sample file |
| file | All | Output file path (NULL returns raw bytes) |
| backend | - | “auto”, “chatterbox”, or “qwen3” |
| ref_text | Qwen3 | Transcript of reference audio (high quality) |
| x_vector_only | Qwen3 | Use only speaker embedding (faster) |
| language | Qwen3 | Language for synthesis |
| exaggeration | Chatterbox | Voice exaggeration |
| temperature | All | Sampling temperature |
| cfg_weight | Chatterbox | CFG weight |
| speed | All | Playback speed multiplier |
| seed | All | Random seed for reproducibility |
speech_design() (Qwen3-TTS only)
| Parameter | Description |
|---|---|
| input | Text to convert to speech |
| voice_description | Natural language description of desired voice |
| file | Output file path (NULL returns raw bytes) |
| language | Language for synthesis (default “English”) |
Configuration functions
| Function | Purpose |
|---|---|
| set_tts_base() | Set OpenAI-compatible API base URL |
| set_tts_key() | Set OpenAI-compatible API key |
| set_elevenlabs_key() | Set ElevenLabs API key |
Health checks
| Function | Description |
|---|---|
| tts_health() | Check server health (uses configured base URL) |
| chatterbox_available() | Check if Chatterbox is running on port 7810 |
| qwen3_available() | Check if Qwen3-TTS is running on port 7811 |
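The two availability checks can drive a simple fallback between local servers. This sketch assumes both functions return a single `TRUE` or `FALSE`; `pick_local_backend()` is a hypothetical helper, and the preference for Chatterbox is an arbitrary choice for illustration.

```r
# Hypothetical helper (not part of tts.api): choose a local backend name
# from two availability flags, preferring Chatterbox when both are up.
pick_local_backend <- function(chatterbox_up, qwen3_up) {
  if (isTRUE(chatterbox_up)) return("chatterbox")
  if (isTRUE(qwen3_up)) return("qwen3")
  stop("No local TTS server found on ports 7810 or 7811", call. = FALSE)
}

# Stand-in flags; with tts.api loaded these would be
# chatterbox_available() and qwen3_available().
pick_local_backend(chatterbox_up = FALSE, qwen3_up = TRUE)
```

The returned name could then be passed as the `backend` argument of `tts()`.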
Other functions
- voices() - List available voices (OpenAI-compatible backends)
- languages() - List supported languages
Dependencies
- curl
- jsonlite
Reference
See Function Reference for complete API documentation.