Generate Speech with Voice Cloning
Description
Uploads a voice sample and generates speech in that voice using the /v1/audio/speech/upload endpoint.
Usage
speech_clone(
  input,
  voice_file,
  file = NULL,
  backend = c("auto", "chatterbox", "qwen3"),
  ref_text = NULL,
  x_vector_only = FALSE,
  language = NULL,
  exaggeration = NULL,
  temperature = NULL,
  cfg_weight = NULL,
  speed = NULL,
  seed = NULL
)
Arguments
input: Character. The text to convert to speech.
voice_file: Character. Path to the voice sample file (mp3, wav, etc.).
file: Character or NULL. Output file path. If NULL, returns raw bytes.
backend: Character. Backend to use: "auto" to detect qwen3 vs chatterbox, "qwen3" for Qwen3-TTS, or "chatterbox" for Chatterbox.
ref_text: Character or NULL. Transcript of the reference audio. Required by qwen3-tts for high-quality cloning (ICL mode). If NULL and the backend requires it, use x_vector_only = TRUE for faster but lower-quality cloning.
x_vector_only: Logical. If TRUE, use only the speaker embedding for cloning (faster but lower quality). Useful when ref_text is not available.
language: Character or NULL. Language for synthesis (qwen3-tts specific).
exaggeration: Numeric or NULL. Exaggeration parameter (Chatterbox-specific).
temperature: Numeric or NULL. Sampling temperature for generation.
cfg_weight: Numeric or NULL. CFG weight parameter (Chatterbox-specific).
speed: Numeric or NULL. Speed multiplier for the audio.
seed: Integer or NULL. Random seed for reproducible output.
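The interplay between ref_text and x_vector_only can be sketched as a small wrapper. This is a minimal illustration, not part of the package: build_clone_args() is a hypothetical helper name, and it assumes only the argument names listed above.

```r
# Hypothetical helper (not part of the package): builds an argument list for
# speech_clone(), falling back to x_vector_only = TRUE when no transcript
# of the reference audio is available.
build_clone_args <- function(input, voice_file, ref_text = NULL, file = NULL) {
  args <- list(input = input, voice_file = voice_file, file = file)
  if (is.null(ref_text)) {
    # No transcript: speaker-embedding-only cloning (faster, lower quality).
    args$x_vector_only <- TRUE
  } else {
    # Transcript available: pass it so qwen3-tts can use ICL mode.
    args$ref_text <- ref_text
  }
  args
}

args <- build_clone_args("Hello!", "my_voice.wav", file = "output.wav")

# With the package loaded and a TTS server running, the call would be:
# do.call(speech_clone, args)
```

Building the list first keeps the decision about cloning mode in one place, so the same code path works whether or not a transcript exists.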
Value
If file is provided, invisibly returns the file path.
If file is NULL, returns raw audio bytes.
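When file is NULL, the returned raw vector can be written to disk later with base R. The sketch below assumes the return value is an ordinary raw vector; save_audio() is a hypothetical helper, and the placeholder bytes stand in for a real speech_clone() result.

```r
# Hypothetical helper: write raw audio bytes (as returned when file = NULL)
# to disk and return the path invisibly.
save_audio <- function(bytes, path) {
  stopifnot(is.raw(bytes))
  writeBin(bytes, path)
  invisible(path)
}

# Stand-in for:
#   audio <- speech_clone(input = "Hello!", voice_file = "my_voice.wav")
audio <- as.raw(c(0x52, 0x49, 0x46, 0x46))  # placeholder bytes, not real audio

out <- save_audio(audio, tempfile(fileext = ".wav"))
```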
Examples
set_tts_base("http://localhost:7811")

# Clone voice with transcript (high quality, qwen3-tts)
speech_clone(
  input = "Hello with my custom voice!",
  voice_file = "my_voice.wav",
  ref_text = "This is what I said in the recording.",
  file = "output.wav",
  backend = "qwen3"
)

# Clone voice without transcript (faster, lower quality)
speech_clone(
  input = "Hello with my custom voice!",
  voice_file = "my_voice.wav",
  x_vector_only = TRUE,
  file = "output.wav"
)

# Chatterbox-style cloning
speech_clone(
  input = "Hello with my custom voice!",
  voice_file = "my_voice.mp3",
  file = "output.wav",
  exaggeration = 0.8,
  backend = "chatterbox"
)