speech_clone

Generate Speech with Voice Cloning

Description

Uploads a voice sample and generates speech in that voice using the /v1/audio/speech/upload endpoint.

Usage

speech_clone(
  input,
  voice_file,
  file = NULL,
  backend = c("auto", "chatterbox", "qwen3"),
  ref_text = NULL,
  x_vector_only = FALSE,
  language = NULL,
  exaggeration = NULL,
  temperature = NULL,
  cfg_weight = NULL,
  speed = NULL,
  seed = NULL
)

Arguments

  • input: Character. The text to convert to speech.
  • voice_file: Character. Path to the voice sample file (mp3, wav, etc.).
  • file: Character or NULL. Output file path. If NULL, returns raw bytes.
  • backend: Character. Backend to use: “auto” to detect qwen3 vs chatterbox, “qwen3” for Qwen3-TTS, or “chatterbox” for Chatterbox.
  • ref_text: Character or NULL. Transcript of the reference audio. Required by Qwen3-TTS for high-quality cloning (ICL mode). If no transcript is available, leave this NULL and set x_vector_only = TRUE for faster but lower-quality cloning.
  • x_vector_only: Logical. If TRUE, use only speaker embedding for cloning (faster but lower quality). Useful when ref_text is not available.
  • language: Character or NULL. Language for synthesis (Qwen3-TTS-specific).
  • exaggeration: Numeric or NULL. Exaggeration parameter (Chatterbox-specific).
  • temperature: Numeric or NULL. Sampling temperature for generation.
  • cfg_weight: Numeric or NULL. CFG weight parameter (Chatterbox-specific).
  • speed: Numeric or NULL. Speed multiplier for the audio.
  • seed: Integer or NULL. Random seed for reproducible output.
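
The interaction between ref_text and x_vector_only can be sketched as a small wrapper. This is only an illustration for the Qwen3-TTS backend: clone_args() is a hypothetical helper, not part of the package, and it simply builds an argument list that uses the transcript when one is supplied and otherwise falls back to x_vector_only = TRUE.

```r
# Hypothetical helper (not part of the package): choose the cloning mode
# from whether a transcript of the reference audio is available.
clone_args <- function(input, voice_file, ref_text = NULL, ...) {
  args <- list(input = input, voice_file = voice_file, ...)
  if (is.null(ref_text)) {
    # No transcript: speaker-embedding-only cloning (faster, lower quality).
    args$x_vector_only <- TRUE
  } else {
    # Transcript available: pass it through for higher-quality cloning.
    args$ref_text <- ref_text
  }
  args
}

# Usage, assuming a TTS server is reachable and speech_clone() is available
# (not executed here):
# do.call(speech_clone, clone_args("Hello!", "my_voice.wav", file = "output.wav"))
```

Building the argument list separately keeps the mode decision in one place and lets it be inspected before any request is sent.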

Value

If file is provided, invisibly returns the file path. If file is NULL, returns raw audio bytes.
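
When file is NULL, the returned raw vector can be written to disk later with base R. A minimal sketch, assuming the bytes are a complete audio file as returned by the server; save_audio_bytes() is a hypothetical helper introduced here for illustration.

```r
# Hypothetical helper: write raw audio bytes to a file and return the path
# invisibly, mirroring what speech_clone() does when `file` is supplied.
save_audio_bytes <- function(bytes, path) {
  stopifnot(is.raw(bytes), is.character(path), length(path) == 1L)
  writeBin(bytes, path)
  invisible(path)
}

# Usage, assuming a TTS server is reachable (not executed here):
# audio <- speech_clone(
#   input = "Hello with my custom voice!",
#   voice_file = "my_voice.wav",
#   x_vector_only = TRUE
# )
# save_audio_bytes(audio, "output.wav")
```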

Examples

set_tts_base("http://localhost:7811")

# Clone voice with transcript (high quality, qwen3-tts)
speech_clone(
  input = "Hello with my custom voice!",
  voice_file = "my_voice.wav",
  ref_text = "This is what I said in the recording.",
  file = "output.wav",
  backend = "qwen3"
)

# Clone voice without transcript (faster, lower quality)
speech_clone(
  input = "Hello with my custom voice!",
  voice_file = "my_voice.wav",
  x_vector_only = TRUE,
  file = "output.wav"
)

# Chatterbox-style cloning
speech_clone(
  input = "Hello with my custom voice!",
  voice_file = "my_voice.mp3",
  file = "output.wav",
  exaggeration = 0.8,
  backend = "chatterbox"
)