Overview
NanoGPT supports voice cloning, so you can create reusable custom voices from short reference audio clips and then use them in text-to-speech (TTS). Two voice-clone providers are exposed via NanoGPT:
- MiniMax voice clone: creates a reusable `customVoiceId` that you can pass as `voice` when using compatible MiniMax Speech TTS models.
- Qwen voice clone (1.7B): generates a speaker embedding file URL that you can pass to Qwen 3 TTS as `speaker_voice_embedding_file_url`.
Both providers are asynchronous and follow the same flow:
- Submit a clone job and receive a `runId` (HTTP 202).
- Poll the status endpoint until `status: "completed"`.
Authentication
All voice clone endpoints support:
- API key auth: `x-api-key: <your NanoGPT API key>` (or `Authorization: Bearer <key>`)
- Session auth (web app): browser cookies
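
The two API key forms listed above can be expressed as a small helper. This is a minimal sketch; the function name `auth_headers` and its `use_bearer` flag are illustrative and not part of the NanoGPT API.

```python
def auth_headers(api_key: str, use_bearer: bool = False) -> dict[str, str]:
    """Return the header that carries the API key, in either documented form."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    if use_bearer:
        # Alternative form: Authorization: Bearer <key>
        return {"Authorization": f"Bearer {api_key}"}
    # Default form: x-api-key: <your NanoGPT API key>
    return {"x-api-key": api_key}
```

Session (cookie) auth is handled by the browser in the web app, so it needs no equivalent helper.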
Endpoints
| Provider | Submit | Status |
|---|---|---|
| MiniMax | POST /api/voice-clone/minimax | POST /api/voice-clone/minimax/status |
| Qwen | POST /api/voice-clone/qwen | POST /api/voice-clone/qwen/status |
MiniMax Voice Clone
Submit a Clone Job
Supported content types:
- `multipart/form-data` (upload an audio file)
- `application/json` (provide `audioUrl`)
| Field | Type | Required | Notes |
|---|---|---|---|
| `audio` | file | Yes (if no `audioUrl`) | MP3, M4A, WAV |
| `audioUrl` | string | Yes (if no `audio`) | Hosted audio URL |
| `customVoiceId` / `custom_voice_id` | string | Yes | Must match `^[A-Za-z][A-Za-z0-9]{7,}$` |
| `voiceCloneModel` / `model` | string | No | Example values: `speech-02-hd`, `speech-02-turbo` |
| `needNoiseReduction` / `need_noise_reduction` | boolean | No | Default `false` |
| `needVolumeNormalization` / `need_volume_normalization` | boolean | No | Default `false` |
| `accuracy` | number | No | 0 to 1, default 0.7 |
| `text` / `previewText` | string | No | Preview text |
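
As an illustration of the JSON variant, the sketch below validates `customVoiceId` against the documented pattern and builds (but does not send) the submit request. The base URL `https://nano-gpt.com` and the helper name are assumptions; adjust them to your setup.

```python
import json
import re
import urllib.request

BASE_URL = "https://nano-gpt.com"  # assumption: replace with your NanoGPT host
VOICE_ID_RE = re.compile(r"^[A-Za-z][A-Za-z0-9]{7,}$")  # pattern from the field table


def build_minimax_clone_request(api_key: str, audio_url: str,
                                custom_voice_id: str, **options) -> urllib.request.Request:
    """Build a JSON submit request; optional fields (accuracy, etc.) go in **options."""
    # fullmatch rejects IDs with a trailing newline, which a bare "$" would allow.
    if not VOICE_ID_RE.fullmatch(custom_voice_id):
        raise ValueError("customVoiceId must be a letter followed by 7+ letters or digits")
    payload = {"audioUrl": audio_url, "customVoiceId": custom_voice_id, **options}
    return urllib.request.Request(
        f"{BASE_URL}/api/voice-clone/minimax",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would submit the job; a successful submit is expected to return HTTP 202 with a `runId`.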
Poll Job Status
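
A minimal polling sketch follows. It assumes the status endpoint accepts a JSON body of the form `{"runId": ...}` and returns JSON with a `status` field; the request body shape, the base URL, and both function names are assumptions rather than documented behavior.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "https://nano-gpt.com"  # assumption: replace with your NanoGPT host


def fetch_status(api_key: str, run_id: str, provider: str = "minimax") -> dict:
    """POST the runId to the provider's status endpoint (body shape is assumed)."""
    req = urllib.request.Request(
        f"{BASE_URL}/api/voice-clone/{provider}/status",
        data=json.dumps({"runId": run_id}).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_clone(get_status: Callable[[], dict], interval: float = 5.0,
                   max_attempts: int = 60,
                   sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call get_status until it reports status == "completed" or attempts run out."""
    for _ in range(max_attempts):
        result = get_status()
        if result.get("status") == "completed":
            return result
        # Other terminal states are not documented here, so keep waiting.
        sleep(interval)
    raise TimeoutError(f"clone job not completed after {max_attempts} polls")
```

A typical call would be `wait_for_clone(lambda: fetch_status(api_key, run_id))`. Injecting `get_status` and `sleep` keeps the loop testable without network access.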
Qwen Voice Clone (1.7B)
Submit a Clone Job
Supported content types:
- `multipart/form-data` (upload an audio file)
- `application/json` (provide `audioUrl`)
| Field | Type | Required | Notes |
|---|---|---|---|
| `audio` | file | Yes (if no `audioUrl`) | MP3, OGG, WAV, M4A, AAC |
| `audioUrl` / `audio_url` | string | Yes (if no `audio`) | Hosted audio URL |
| `referenceText` / `reference_text` | string | No | Optional transcript |
Poll Job Status
Status responses may include an `X-Poll-After` header indicating how many seconds to wait before polling again.
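
One way to honor `X-Poll-After` is to read it defensively and fall back to a fixed interval. In this sketch the 5-second default and 60-second cap are arbitrary choices, and the header value is assumed to be a plain number of seconds.

```python
from typing import Mapping, Optional


def poll_delay(headers: Mapping[str, str], default: float = 5.0,
               max_delay: float = 60.0) -> float:
    """Return seconds to wait before the next poll, based on X-Poll-After."""
    raw: Optional[str] = headers.get("X-Poll-After")
    try:
        seconds = float(raw)  # raises TypeError if the header is missing
    except (TypeError, ValueError):
        return default
    if not seconds >= 0:  # rejects negative values and NaN
        return default
    return min(seconds, max_delay)
```

Response header objects from `urllib.request` look names up case-insensitively; with a plain `dict`, the key must match `X-Poll-After` exactly.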
Response (completed)
Using Cloned Voices with TTS
MiniMax cloned voice (customVoiceId)
Use your `customVoiceId` as the normal `voice` on `POST /api/tts` with a compatible MiniMax Speech TTS model.
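
A minimal sketch of the request body for this call. Only the `voice` field is taken from this page; the `model` and `text` field names are assumptions, and the model ID is left as a parameter because compatible IDs are not listed here.

```python
def minimax_tts_payload(text: str, custom_voice_id: str, model: str) -> dict[str, str]:
    """Build a JSON body for POST /api/tts that uses a cloned MiniMax voice."""
    if not text:
        raise ValueError("text must be non-empty")
    return {
        "model": model,            # assumed field name; pass a compatible MiniMax Speech TTS model
        "text": text,              # assumed field name for the input text
        "voice": custom_voice_id,  # the cloned voice ID goes where a normal voice would
    }
```

Send the payload as JSON with the same API key header used for the clone endpoints.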
Qwen cloned voice (speakerEmbeddingUrl)
Use `speakerEmbeddingUrl` as `speaker_voice_embedding_file_url` on `POST /api/tts` with `Qwen-3-TTS-1.7B`.
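
A minimal sketch of the corresponding request body. The `speaker_voice_embedding_file_url` field and the model name come from this page; the `model` and `text` field names are assumptions.

```python
def qwen_tts_payload(text: str, speaker_embedding_url: str,
                     model: str = "Qwen-3-TTS-1.7B") -> dict[str, str]:
    """Build a JSON body for POST /api/tts that uses a cloned Qwen voice."""
    if not speaker_embedding_url.startswith(("http://", "https://")):
        raise ValueError("speaker_embedding_url must be an http(s) URL")
    return {
        "model": model,  # assumed field name
        "text": text,    # assumed field name for the input text
        "speaker_voice_embedding_file_url": speaker_embedding_url,
    }
```

The URL can be the `speakerEmbeddingUrl` returned by the clone job or a copy of the file hosted in your own storage.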
Saving MiniMax Voice IDs (Web App)
If you use the NanoGPT web app, you can save and list your MiniMax `customVoiceId` values.
These endpoints are session-authenticated only (they do not support API key auth).
List Saved Voice IDs
Save a Voice ID
Voice Clone Storage and Retention
Last verified: February 21, 2026. Retention depends on the provider behind each voice clone model:
- `minimax-voice-clone` (WaveSpeed + MiniMax): New cloned voice IDs are temporary. If a cloned voice is not used in a real TTS synthesis call within 7 days (168 hours), it is deleted. If it is used at least once in TTS within that window, it is kept long-term. The preview generated during clone creation does not activate or persist the voice.
- `qwen-voice-clone` (fal.ai): The returned speaker embedding file URL is hosted by fal. fal guarantees hosted generated files for at least 7 days; after that they may be removed at any time. Download and store the embedding yourself immediately for long-term reuse.
- `inworld-voice-clone` (Inworld Voice API, if enabled in your workspace): Inworld does not publish a fixed auto-delete window for cloned voices in its public docs. Treat cloned voices as persistent until explicitly deleted from your workspace. Note: Inworld’s Zero Data Retention mode explicitly does not apply to voice-cloning audio samples.
How to Keep and Reuse Voice Clones
MiniMax / WaveSpeed (customVoiceId)
- Save the returned voice ID (`customVoiceId`; provider docs may also call this `voice_id`).
- Run at least one real TTS synthesis with that voice ID within 7 days.
- Reuse the same voice ID in later TTS requests.
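
To keep track of the 7-day (168-hour) activation window, you could record when each voice was cloned and compute the deadline yourself. This sketch assumes the window starts at the time you record as `created_at`; the exact moment the provider starts counting is not specified here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

ACTIVATION_WINDOW = timedelta(hours=168)  # 7 days, per the retention notes


def activation_deadline(created_at: datetime) -> datetime:
    """Latest time by which the voice should be used in a real TTS call."""
    return created_at + ACTIVATION_WINDOW


def time_left_to_activate(created_at: datetime,
                          now: Optional[datetime] = None) -> timedelta:
    """Remaining time in the window, clamped to zero once it has passed."""
    now = now or datetime.now(timezone.utc)
    return max(activation_deadline(created_at) - now, timedelta(0))
```

Use timezone-aware datetimes for `created_at` so the subtraction against the UTC default for `now` is valid.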
Qwen (speakerEmbeddingUrl)
- Save the returned `speakerEmbeddingUrl` (`speaker_embedding_url` in some provider docs).
- Download the embedding file right away.
- Store it in your own durable storage (S3, R2, etc.).
- Use your stored URL later as `speaker_voice_embedding_file_url`.
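
The download step can be sketched with the standard library alone. The helper names and the fallback filename are illustrative; uploading the saved file to S3, R2, or similar is left to your storage client.

```python
import shutil
import urllib.parse
import urllib.request
from pathlib import Path, PurePosixPath


def embedding_filename(url: str, fallback: str = "speaker_embedding.bin") -> str:
    """Derive a local filename from the URL path, ignoring any query string."""
    name = PurePosixPath(urllib.parse.urlparse(url).path).name
    return name or fallback


def download_embedding(url: str, dest_dir: str) -> Path:
    """Stream the embedding file to dest_dir and return the saved path."""
    dest = Path(dest_dir) / embedding_filename(url)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=60) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest
```

After uploading the saved file to your own storage, pass that storage URL as `speaker_voice_embedding_file_url` instead of the provider-hosted one.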
Inworld (voice_id, if enabled)
- Save the returned `voice_id`.
- Reuse it directly for Inworld TTS.
- If deleted from Inworld, it must be re-cloned.
Can I Download the Clone if It Gets Deleted?
- MiniMax / WaveSpeed: no portable voice embedding download is documented; keep the voice ID active by using it in time.
- Qwen: yes, download the speaker embedding file from `speakerEmbeddingUrl` / `speaker_embedding_url`.
- Inworld: no documented voice-embedding export endpoint; keep the `voice_id` and avoid accidental deletion.
Warning: Provider retention policies may change. This page reflects provider docs as of February 21, 2026.
Provider Source Links
- MiniMax voice cloning intro: https://platform.minimax.io/docs/api-reference/voice-cloning-intro
- MiniMax voice clone endpoint: https://platform.minimax.io/docs/api-reference/voice-cloning-clone
- MiniMax FAQ (voice ID validity, activation, preview behavior): https://platform.minimax.io/docs/faq/about-apis
- WaveSpeed MiniMax voice clone persistence notes: https://wavespeed.ai/docs/docs-api/minimax/minimax-voice-clone
- fal FAQ (file retention): https://fal-d8505a2e.mintlify.app/model-apis/faq
- fal Queue API (`X-Fal-Object-Lifecycle-Preference`): https://fal-d8505a2e.mintlify.app/model-apis/mndpoints/queue
- Inworld clone voice API: https://docs.inworld.ai/api-reference/voiceAPI/voiceservice/clone-voice
- Inworld list voices: https://docs.inworld.ai/api-reference/voiceAPI/voiceservice/list-voices
- Inworld delete voice: https://docs.inworld.ai/api-reference/voiceAPI/voiceservice/delete-voice
- Inworld zero data retention (voice cloning samples excluded): https://docs.inworld.ai/docs/tts/zero-data-retention
Pricing
Clone runs are charged as a flat per-run fee:
- MiniMax voice clone: $1.00 per run
- Qwen voice clone (1.7B): $0.25 per run
Responses include `cost` and `paymentSource` for the run.
Limitations
- MiniMax and Qwen clone endpoints are asynchronous; clients must poll status until completion.
- MiniMax `customVoiceId` must match `^[A-Za-z][A-Za-z0-9]{7,}$`.