tts.api

OpenAI-Compatible Text-to-Speech API Client

Last updated: 2026-01-30

An R client for Text-to-Speech APIs.

Supports multiple backends:

  • OpenAI-compatible: OpenAI, Chatterbox, Qwen3-TTS, LM Studio, OpenWebUI, AnythingLLM
  • ElevenLabs: Separate API with voice cloning and multilingual models

Installation

    # From CRAN (once available)
    install.packages("tts.api")

    # Development version
    remotes::install_github("cornball-ai/tts.api")

Backend Setup

Chatterbox (Local, OpenAI-compatible)

Use the cornball-ai/chatterbox-tts-api fork:

    git clone https://github.com/cornball-ai/chatterbox-tts-api.git
    cd chatterbox-tts-api

    # For newer Nvidia GPUs (Blackwell/50xx series)
    docker build -f docker/Dockerfile.blackwell -t chatterbox-tts:blackwell .

    docker run -d \
      --name chatterbox-blackwell \
      --gpus all \
      -p 7810:4123 \
      -v $(pwd)/cache:/cache \
      -v $(pwd)/voices:/voices \
      chatterbox-tts:blackwell

See the upstream repository for CPU and other GPU options.

Qwen3-TTS (Local, OpenAI-compatible)

Qwen3-TTS supports voice cloning, voice design, and multilingual synthesis.

    docker run -d --gpus all --network=host --name qwen3-tts-api \
      -v ~/.cache/huggingface:/cache \
      -e PORT=7811 \
      -e USE_FLASH_ATTENTION=false \
      qwen3-tts-api:blackwell

Built-in voices: Vivian, Serena, Uncle_Fu, Dylan, Eric, Ryan, Aiden, Ono_Anna, Sohee

OpenAI

  1. Create an account at https://platform.openai.com
  2. Generate an API key at https://platform.openai.com/api-keys
  3. Set the environment variable OPENAI_API_KEY

ElevenLabs

  1. Create an account at https://elevenlabs.io
  2. Get your API key from https://elevenlabs.io/app/settings/api-keys
  3. Set the environment variable ELEVENLABS_API_KEY
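
Step 3 in both lists can be done from R itself. The following is a minimal sketch using base R only; the key value is a placeholder and `has_key()` is a hypothetical helper, not part of tts.api. For a persistent setting, a line such as `OPENAI_API_KEY=<your key>` in `~/.Renviron` is the usual alternative to calling `Sys.setenv()` in a script.

```r
# Set a key for the current R session only (placeholder value, not a real key).
Sys.setenv(OPENAI_API_KEY = "sk-placeholder")

# Hypothetical helper: TRUE when the named environment variable is non-empty.
has_key <- function(name) {
  nzchar(Sys.getenv(name, unset = ""))
}

has_key("OPENAI_API_KEY")
has_key("ELEVENLABS_API_KEY")
```

Checking for the key before the first request tends to give a clearer failure than an authentication error returned by the server.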

Usage

Setup

    library(tts.api)

    # For local Chatterbox server (OpenAI-compatible)
    set_tts_base("http://localhost:7810")

    # For local Qwen3-TTS server
    set_tts_base("http://localhost:7811")

    # For OpenAI
    set_tts_base("https://api.openai.com")
    set_tts_key(Sys.getenv("OPENAI_API_KEY"))

    # For ElevenLabs (separate API key)
    set_elevenlabs_key(Sys.getenv("ELEVENLABS_API_KEY"))

Check server health

    tts_health()
    #> $ok
    #> [1] TRUE
    #>
    #> $status
    #> [1] "OK (/health)"
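
A script can use the health result to stop early when the server is unreachable. The sketch below assumes the result is a list with a logical `ok` element and a character `status` element, as the printed output suggests; `assert_healthy()` is a hypothetical helper, not a tts.api function.

```r
# Hypothetical helper: stop early when a health result is not OK.
# `health` is assumed to be a list with `ok` (logical) and `status` (character).
assert_healthy <- function(health) {
  if (!isTRUE(health$ok)) {
    stop("TTS server is not healthy: ", health$status, call. = FALSE)
  }
  invisible(health)
}

# Stand-in result so the sketch runs without a server.
assert_healthy(list(ok = TRUE, status = "OK (/health)"))

# With tts.api loaded and a base URL configured:
# assert_healthy(tts_health())
```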

List available voices

    voices()

Generate speech

    # Basic usage (uses configured base URL)
    tts(
      input = "Hello, world!",
      voice = "alloy",
      file = "hello.mp3"
    )

    # OpenAI with voice instructions
    tts(
      input = "Today is a wonderful day to build something people love!",
      voice = "coral",
      file = "speech.mp3",
      backend = "openai",
      model = "gpt-4o-mini-tts",
      instructions = "Speak in a cheerful and positive tone."
    )

    # Chatterbox with custom parameters
    tts(
      input = "Hello with my custom voice!",
      voice = "MyCustomVoice",
      file = "speech.wav",
      temperature = 0.9,
      exaggeration = 1.2,
      cfg_weight = 0.3
    )

    # ElevenLabs (different API, not OpenAI-compatible)
    tts(
      input = "Hello from ElevenLabs!",
      voice = "21m00Tcm4TlvDq8ikWAM",  # Rachel voice ID
      file = "hello_eleven.mp3",
      backend = "elevenlabs",
      stability = 0.5,
      similarity_boost = 0.75
    )

    # Qwen3-TTS with built-in voice
    tts(
      input = "Hello from Qwen3!",
      voice = "Vivian",
      file = "hello_qwen3.wav",
      backend = "qwen3"
    )

    # Return raw bytes (useful for Shiny)
    audio_bytes <- tts(
      input = "Hello!",
      voice = "alloy"
    )
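
When `file` is omitted, the returned bytes can be written to disk later, for example to a temporary file that a Shiny app then serves. The sketch below assumes the return value is an R raw vector; `save_audio_bytes()` is a hypothetical helper, and the demo bytes are a stand-in so the example runs without a TTS server.

```r
# Hypothetical helper: write raw audio bytes to a temporary file and return its path.
# Assumes `bytes` is a raw vector, as tts() is described to return when `file` is omitted.
save_audio_bytes <- function(bytes, ext = ".mp3") {
  stopifnot(is.raw(bytes))
  path <- tempfile(fileext = ext)
  writeBin(bytes, path)
  path
}

# Stand-in bytes in place of a real tts() result.
demo_bytes <- as.raw(c(0x49, 0x44, 0x33))
demo_path <- save_audio_bytes(demo_bytes)
file.exists(demo_path)
```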

Voice cloning (Qwen3-TTS)

    # Fast mode (x-vector only, no transcript needed)
    speech_clone(
      input = "Hello in my cloned voice!",
      voice_file = "reference.wav",
      x_vector_only = TRUE,
      file = "cloned.wav",
      backend = "qwen3"
    )

    # High quality mode (with transcript)
    speech_clone(
      input = "Hello in my cloned voice!",
      voice_file = "reference.wav",
      ref_text = "This is what I said in the recording.",
      file = "cloned.wav",
      backend = "qwen3"
    )
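
The two modes differ only in whether a transcript is supplied, so the choice can be made programmatically. The sketch below builds the argument list for `speech_clone()` from the parameters shown above; `clone_args()` is a hypothetical helper, and falling back to fast mode when no transcript is available is one possible policy, not something the package prescribes.

```r
# Hypothetical helper: build speech_clone() arguments for Qwen3-TTS.
# With a transcript, pass ref_text (high quality mode);
# without one, fall back to x_vector_only = TRUE (fast mode).
clone_args <- function(input, voice_file, file, ref_text = NULL) {
  out <- list(input = input, voice_file = voice_file, file = file, backend = "qwen3")
  if (is.null(ref_text) || !nzchar(ref_text)) {
    out$x_vector_only <- TRUE
  } else {
    out$ref_text <- ref_text
  }
  out
}

fast <- clone_args("Hello in my cloned voice!", "reference.wav", "cloned.wav")
names(fast)

# With tts.api loaded and a Qwen3-TTS server running:
# do.call(speech_clone, fast)
```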

Voice design (Qwen3-TTS only)

Create a custom voice from a natural language description:

    speech_design(
      input = "Hello, I am your AI assistant.",
      voice_description = "A warm, professional female voice with a slight British accent",
      file = "designed_voice.wav"
    )

Voice management (Chatterbox)

Upload a voice to the library for reuse:

    # Upload once
    voice_upload(
      voice_file = "my_voice.wav",
      voice_name = "my-custom-voice"
    )

    # With language
    voice_upload(
      voice_file = "french_voice.wav",
      voice_name = "french-speaker",
      language = "fr"
    )

    # Use the saved voice by name
    tts(
      input = "Hello with my custom voice!",
      voice = "my-custom-voice",
      file = "output.wav"
    )

    # Or for one-off cloning (uploads and generates in one call)
    speech_clone(
      input = "Hello with my custom voice!",
      voice_file = "my_voice.mp3",
      file = "output.wav",
      exaggeration = 0.8
    )

Parameters

tts()

| Parameter        | Backend            | Description |
|------------------|--------------------|-------------|
| input            | All                | Text to convert to speech |
| voice            | All                | Voice name or ID |
| file             | All                | Output file path (NULL returns raw bytes) |
| backend          | -                  | "auto", "native", "chatterbox", "qwen3", "openai", "elevenlabs", or "fal" |
| model            | OpenAI, ElevenLabs | Model name |
| instructions     | OpenAI             | Voice style instructions |
| temperature      | Chatterbox         | Sampling temperature |
| speed            | OpenAI, Chatterbox | Playback speed multiplier |
| exaggeration     | Chatterbox         | Voice exaggeration |
| cfg_weight       | Chatterbox         | CFG weight |
| stability        | ElevenLabs         | Voice stability (0-1) |
| similarity_boost | ElevenLabs         | Similarity boost (0-1) |
| seed             | Chatterbox         | Random seed for reproducibility |
| response_format  | OpenAI, Chatterbox | Audio format |
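
The 0-1 ranges listed for `stability` and `similarity_boost` can be checked before a request is sent. This is a minimal sketch; `check_unit_range()` is a hypothetical helper, and whether tts.api validates these values itself is not stated here.

```r
# Hypothetical helper: check that a setting is a single number in the documented 0-1 range.
check_unit_range <- function(value, name) {
  if (!is.numeric(value) || length(value) != 1 || is.na(value) ||
      value < 0 || value > 1) {
    stop(name, " must be a single number between 0 and 1", call. = FALSE)
  }
  invisible(value)
}

check_unit_range(0.5, "stability")
check_unit_range(0.75, "similarity_boost")
```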

voice_upload() (Chatterbox)

| Parameter  | Description |
|------------|-------------|
| voice_file | Path to voice sample file |
| voice_name | Name to save the voice as |
| language   | Language code (e.g., "en", "fr") |

speech_clone() (Chatterbox/Qwen3-TTS)

| Parameter     | Backend    | Description |
|---------------|------------|-------------|
| input         | All        | Text to convert to speech |
| voice_file    | All        | Path to voice sample file |
| file          | All        | Output file path (NULL returns raw bytes) |
| backend       | -          | "auto", "chatterbox", or "qwen3" |
| ref_text      | Qwen3      | Transcript of reference audio (high quality) |
| x_vector_only | Qwen3      | Use only speaker embedding (faster) |
| language      | Qwen3      | Language for synthesis |
| exaggeration  | Chatterbox | Voice exaggeration |
| temperature   | All        | Sampling temperature |
| cfg_weight    | Chatterbox | CFG weight |
| speed         | All        | Playback speed multiplier |
| seed          | All        | Random seed for reproducibility |

speech_design() (Qwen3-TTS only)

| Parameter         | Description |
|-------------------|-------------|
| input             | Text to convert to speech |
| voice_description | Natural language description of desired voice |
| file              | Output file path (NULL returns raw bytes) |
| language          | Language for synthesis (default "English") |

Configuration functions

| Function             | Purpose |
|----------------------|---------|
| set_tts_base()       | Set OpenAI-compatible API base URL |
| set_tts_key()        | Set OpenAI-compatible API key |
| set_elevenlabs_key() | Set ElevenLabs API key |

Health checks

| Function               | Description |
|------------------------|-------------|
| tts_health()           | Check server health (uses configured base URL) |
| chatterbox_available() | Check if Chatterbox is running on port 7810 |
| qwen3_available()      | Check if Qwen3-TTS is running on port 7811 |
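
The availability checks can be combined to choose a local server at startup. The sketch below assumes that `chatterbox_available()` and `qwen3_available()` return a single TRUE or FALSE; `pick_local_base()` is a hypothetical helper, and stub functions stand in for the real checks so the example runs without tts.api installed.

```r
# Hypothetical helper: return the base URL of the first local backend that responds.
# `checks` is a named list of zero-argument functions returning TRUE or FALSE;
# the URLs use the ports from the Backend Setup section.
pick_local_base <- function(checks) {
  urls <- c(chatterbox = "http://localhost:7810", qwen3 = "http://localhost:7811")
  for (name in names(checks)) {
    if (isTRUE(checks[[name]]())) {
      return(urls[[name]])
    }
  }
  NULL
}

# Stub checks; in practice pass
# list(chatterbox = chatterbox_available, qwen3 = qwen3_available).
base_url <- pick_local_base(list(
  chatterbox = function() FALSE,
  qwen3 = function() TRUE
))
base_url
```

A non-NULL result could then be passed to `set_tts_base()`.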

Other functions

  • voices() - List available voices (OpenAI-compatible backends)
  • languages() - List supported languages

Dependencies

  • curl
  • jsonlite

Reference

See the Function Reference for complete API documentation.
