tts.api

Last updated: 2026-01-30

An R client for Text-to-Speech APIs.

Supports multiple backends:

OpenAI-compatible: OpenAI, Chatterbox, Qwen3-TTS, LM Studio, OpenWebUI, AnythingLLM
ElevenLabs: Separate API with voice cloning and multilingual models

Installation

# From CRAN (once available)
install.packages("tts.api")

# Development version
remotes::install_github("cornball-ai/tts.api")

Backend Setup

Chatterbox (Local, OpenAI-compatible)

Use the cornball-ai/chatterbox-tts-api fork:

git clone https://github.com/cornball-ai/chatterbox-tts-api.git
cd chatterbox-tts-api

# For newer Nvidia GPUs (Blackwell/50xx series)
docker build -f docker/Dockerfile.blackwell -t chatterbox-tts:blackwell .

docker run -d \
  --name chatterbox-blackwell \
  --gpus all \
  -p 7810:4123 \
  -v $(pwd)/cache:/cache \
  -v $(pwd)/voices:/voices \
  chatterbox-tts:blackwell

See the upstream repo for CPU and other GPU options.

Qwen3-TTS (Local, OpenAI-compatible)

Qwen3-TTS supports voice cloning, voice design, and multilingual synthesis.

docker run -d --gpus all --network=host --name qwen3-tts-api \
  -v ~/.cache/huggingface:/cache \
  -e PORT=7811 \
  -e USE_FLASH_ATTENTION=false \
  qwen3-tts-api:blackwell

Built-in voices: Vivian, Serena, Uncle_Fu, Dylan, Eric, Ryan, Aiden, Ono_Anna, Sohee

OpenAI

Create an account at https://platform.openai.com
Generate an API key at https://platform.openai.com/api-keys
Set the environment variable OPENAI_API_KEY

ElevenLabs

Create an account at https://elevenlabs.io
Get your API key from https://elevenlabs.io/app/settings/api-keys
Set the environment variable ELEVENLABS_API_KEY
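
Both hosted backends read their key from an environment variable, so it can help to confirm the variable is visible to the R session before configuring the client. The following is a minimal sketch using only base R. `has_env_key()` is a hypothetical helper, not part of tts.api, and the `.Renviron` note describes one common way to persist a value rather than a requirement of the package.

```r
# Hypothetical helper (not part of tts.api): TRUE if an environment
# variable is set to a non-empty value in the current R session.
has_env_key <- function(name) {
  nzchar(Sys.getenv(name, unset = ""))
}

# To persist a key across sessions, one common option is a line such as
#   OPENAI_API_KEY=your-key-here
# in ~/.Renviron, followed by restarting R.

# Report which of the two keys are visible to this session
for (name in c("OPENAI_API_KEY", "ELEVENLABS_API_KEY")) {
  status <- if (has_env_key(name)) "set" else "missing"
  message(name, ": ", status)
}
```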

Usage

Setup

library(tts.api)

# For local Chatterbox server (OpenAI-compatible)
set_tts_base("http://localhost:7810")

# For local Qwen3-TTS server
set_tts_base("http://localhost:7811")

# For OpenAI
set_tts_base("https://api.openai.com")
set_tts_key(Sys.getenv("OPENAI_API_KEY"))

# For ElevenLabs (separate API key)
set_elevenlabs_key(Sys.getenv("ELEVENLABS_API_KEY"))

Check server health

tts_health()
#> $ok
#> [1] TRUE
#>
#> $status
#> [1] "OK (/health)"
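
The health result is a list with `ok` and `status` fields, so a script can stop early when the server is unreachable. The following is a minimal sketch. `stop_if_unhealthy()` is a hypothetical helper, not part of tts.api, and it assumes only the two fields visible in the output above.

```r
# Hypothetical helper (not part of tts.api): stop with a readable message
# when a health result such as the one returned by tts_health() is not OK.
stop_if_unhealthy <- function(health) {
  if (!isTRUE(health$ok)) {
    status <- if (is.null(health$status)) "unknown" else health$status
    stop("TTS server is not healthy: ", status, call. = FALSE)
  }
  invisible(health)
}

# With the package loaded and a base URL configured:
# stop_if_unhealthy(tts_health())

# Stand-in result with the same shape as the output above
stop_if_unhealthy(list(ok = TRUE, status = "OK (/health)"))
```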

List available voices

voices()

Generate speech

# Basic usage (uses configured base URL)
tts(
  input = "Hello, world!",
  voice = "alloy",
  file = "hello.mp3"
)

# OpenAI with voice instructions
tts(
  input = "Today is a wonderful day to build something people love!",
  voice = "coral",
  file = "speech.mp3",
  backend = "openai",
  model = "gpt-4o-mini-tts",
  instructions = "Speak in a cheerful and positive tone."
)

# Chatterbox with custom parameters
tts(
  input = "Hello with my custom voice!",
  voice = "MyCustomVoice",
  file = "speech.wav",
  temperature = 0.9,
  exaggeration = 1.2,
  cfg_weight = 0.3
)

# ElevenLabs (different API, not OpenAI-compatible)
tts(
  input = "Hello from ElevenLabs!",
  voice = "21m00Tcm4TlvDq8ikWAM",  # Rachel voice ID
  file = "hello_eleven.mp3",
  backend = "elevenlabs",
  stability = 0.5,
  similarity_boost = 0.75
)

# Qwen3-TTS with built-in voice
tts(
  input = "Hello from Qwen3!",
  voice = "Vivian",
  file = "hello_qwen3.wav",
  backend = "qwen3"
)

# Return raw bytes (useful for Shiny)
audio_bytes <- tts(
  input = "Hello!",
  voice = "alloy"
)
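
When `file` is omitted, the call returns the audio as raw bytes instead of writing a file. The following is a minimal sketch of saving such bytes later with base R. `write_audio()` is a hypothetical helper, not part of tts.api, and it assumes the returned value is a raw vector.

```r
# Hypothetical helper (not part of tts.api): write a raw vector of audio
# bytes to disk and return the path invisibly.
write_audio <- function(bytes, path) {
  stopifnot(is.raw(bytes), length(bytes) > 0)
  writeBin(bytes, path)
  invisible(path)
}

# With a configured backend:
# audio_bytes <- tts(input = "Hello!", voice = "alloy")
# write_audio(audio_bytes, "hello.mp3")

# Stand-in bytes so the sketch runs without a server
fake_bytes <- as.raw(c(0x49, 0x44, 0x33))
out <- write_audio(fake_bytes, tempfile(fileext = ".mp3"))
file.size(out)
```

Keeping the bytes in memory until they are needed avoids leaving temporary audio files behind when the output is only served once.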

Voice cloning (Qwen3-TTS)

# Fast mode (x-vector only, no transcript needed)
speech_clone(
  input = "Hello in my cloned voice!",
  voice_file = "reference.wav",
  x_vector_only = TRUE,
  file = "cloned.wav",
  backend = "qwen3"
)

# High quality mode (with transcript)
speech_clone(
  input = "Hello in my cloned voice!",
  voice_file = "reference.wav",
  ref_text = "This is what I said in the recording.",
  file = "cloned.wav",
  backend = "qwen3"
)

Voice design (Qwen3-TTS only)

Create a custom voice from a natural language description:

speech_design(
  input = "Hello, I am your AI assistant.",
  voice_description = "A warm, professional female voice with a slight British accent",
  file = "designed_voice.wav"
)

Voice management (Chatterbox)

Upload a voice to the library for reuse:

# Upload once
voice_upload(
  voice_file = "my_voice.wav",
  voice_name = "my-custom-voice"
)

# With language
voice_upload(
  voice_file = "french_voice.wav",
  voice_name = "french-speaker",
  language = "fr"
)

# Use the saved voice by name
tts(
  input = "Hello with my custom voice!",
  voice = "my-custom-voice",
  file = "output.wav"
)

# Or for one-off cloning (uploads and generates in one call)
speech_clone(
  input = "Hello with my custom voice!",
  voice_file = "my_voice.mp3",
  file = "output.wav",
  exaggeration = 0.8
)

Parameters

`tts()`

Parameter	Backend	Description
`input`	All	Text to convert to speech
`voice`	All	Voice name or ID
`file`	All	Output file path (NULL returns raw bytes)
`backend`	-	"auto", "native", "chatterbox", "qwen3", "openai", "elevenlabs", or "fal"
`model`	OpenAI, ElevenLabs	Model name
`instructions`	OpenAI	Voice style instructions
`temperature`	Chatterbox	Sampling temperature
`speed`	OpenAI, Chatterbox	Playback speed multiplier
`exaggeration`	Chatterbox	Voice exaggeration
`cfg_weight`	Chatterbox	CFG weight
`stability`	ElevenLabs	Voice stability (0-1)
`similarity_boost`	ElevenLabs	Similarity boost (0-1)
`seed`	Chatterbox	Random seed for reproducibility
`response_format`	OpenAI, Chatterbox	Audio format

`voice_upload()` (Chatterbox)

Parameter	Description
`voice_file`	Path to voice sample file
`voice_name`	Name to save the voice as
`language`	Language code (e.g., "en", "fr")

`speech_clone()` (Chatterbox/Qwen3-TTS)

Parameter	Backend	Description
`input`	All	Text to convert to speech
`voice_file`	All	Path to voice sample file
`file`	All	Output file path (NULL returns raw bytes)
`backend`	-	"auto", "chatterbox", or "qwen3"
`ref_text`	Qwen3	Transcript of reference audio (high quality)
`x_vector_only`	Qwen3	Use only speaker embedding (faster)
`language`	Qwen3	Language for synthesis
`exaggeration`	Chatterbox	Voice exaggeration
`temperature`	All	Sampling temperature
`cfg_weight`	Chatterbox	CFG weight
`speed`	All	Playback speed multiplier
`seed`	All	Random seed for reproducibility

`speech_design()` (Qwen3-TTS only)

Parameter	Description
`input`	Text to convert to speech
`voice_description`	Natural language description of desired voice
`file`	Output file path (NULL returns raw bytes)
`language`	Language for synthesis (default "English")

Configuration functions

Function	Purpose
`set_tts_base()`	Set OpenAI-compatible API base URL
`set_tts_key()`	Set OpenAI-compatible API key
`set_elevenlabs_key()`	Set ElevenLabs API key

Health checks

Function	Description
`tts_health()`	Check server health (uses configured base URL)
`chatterbox_available()`	Check if Chatterbox is running on port 7810
`qwen3_available()`	Check if Qwen3-TTS is running on port 7811
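
The two availability checks can drive a simple fallback between local servers. The following is a minimal sketch. `pick_local_backend()` is a hypothetical helper, not part of tts.api, and it assumes each check returns a single `TRUE` or `FALSE`. The checks are passed in as functions so the logic can be tried without a running server.

```r
# Hypothetical helper (not part of tts.api): return the name of the first
# backend whose availability check succeeds, or NA if none does.
pick_local_backend <- function(checks) {
  for (name in names(checks)) {
    if (isTRUE(checks[[name]]())) {
      return(name)
    }
  }
  NA_character_
}

# With the package loaded:
# backend <- pick_local_backend(list(
#   chatterbox = chatterbox_available,
#   qwen3      = qwen3_available
# ))

# Stand-in checks so the sketch runs without a server
backend <- pick_local_backend(list(
  chatterbox = function() FALSE,
  qwen3      = function() TRUE
))
backend
```

The returned name matches the values accepted by the `backend` argument, so it could be passed straight to `tts()` once it has been checked for `NA`.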

Other functions

voices() - List available voices (OpenAI-compatible backends)
languages() - List supported languages

Dependencies

curl
jsonlite

Reference

See the Function Reference for complete API documentation.
