Overview
Synthesize speech with a single, low-latency request. The `POST /v1/speech` endpoint returns audio bytes directly in the HTTP response, which suits simple request/response flows, UI playback, and short prompts that do not need job polling.
Endpoint
- Method/Path: `POST https://api.nano-gpt.com/v1/speech`
- Auth: `Authorization: Bearer <API_KEY>`
- Required header: `Content-Type: application/json`
- Optional header: `Accept` to prefer a MIME type (for example `audio/mpeg`, `audio/wav`, `audio/ogg`). If omitted, the response `Content-Type` follows the requested `format` in the body.
Request Body (JSON)
- `model` (string, required): TTS-capable model ID (for example `nano-tts-1`).
- `input` (string, required): The text to synthesize. Plain text is supported.
- `voice` (string, required): Voice preset (for example `luna`, `verse`, `sonic`). See Voices below.
- `format` (string, optional): Output container/codec. Common values: `mp3`, `wav`, `ogg`, `opus`, `aac`, `flac`, `pcm16`.
- `sample_rate` (number, optional): Required when `format=pcm16` (for example `16000`).
- `speed` (number, optional): Speaking rate multiplier (for example `0.5`–`2.0`).
- `language` (string, optional): BCP-47 tag (for example `en-US`).
- `pitch` (number, optional): Pitch shift or style value; model-specific range.
- `emotion` (string, optional): Style tag (for example `neutral`, `friendly`, `excited`).
- `stability` (number, optional): 0–1; voice stability (provider-specific).
- `similarity` (number, optional): 0–1; similarity boost (provider-specific).
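The field rules above can be sketched as a small payload builder. This is an illustrative Python helper, not part of any official SDK: the function name `build_speech_payload` and its argument names are assumptions, and only the JSON keys come from the field list.

```python
from typing import Any, Dict, Optional


def build_speech_payload(
    model: str,
    text: str,
    voice: str,
    audio_format: Optional[str] = None,
    sample_rate: Optional[int] = None,
    **extras: Any,
) -> Dict[str, Any]:
    """Assemble the JSON body for POST /v1/speech (hypothetical helper)."""
    for name, value in (("model", model), ("input", text), ("voice", voice)):
        if not value:
            raise ValueError(f"{name} is required")
    # Per the field list, pcm16 output needs an explicit sample rate.
    if audio_format == "pcm16" and sample_rate is None:
        raise ValueError("sample_rate is required when format is pcm16")

    payload: Dict[str, Any] = {"model": model, "input": text, "voice": voice}
    if audio_format is not None:
        payload["format"] = audio_format
    if sample_rate is not None:
        payload["sample_rate"] = sample_rate
    # Optional, model-specific controls (speed, language, pitch, ...) are
    # passed through only when the caller actually sets them.
    payload.update({k: v for k, v in extras.items() if v is not None})
    return payload
```

For example, `build_speech_payload("nano-tts-1", "Hello", "luna", audio_format="mp3", speed=1.25)` yields a body with `model`, `input`, `voice`, `format`, and `speed`. Catching the missing `sample_rate` client-side avoids a round trip that would presumably end in `invalid_sample_rate`.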
Response
- Success: `200 OK`; the body contains binary audio.
- `Content-Type`: based on `format`/`Accept` (for example `audio/mpeg`, `audio/wav`, `audio/ogg`).
- Error codes: `invalid_model`, `invalid_voice`, `unsupported_format`, `invalid_sample_rate`, `input_too_long`, `rate_limit_exceeded`.
Examples
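A minimal Python sketch of one request, using only the standard library. The URL, headers, and body keys follow the Endpoint and Request Body sections; the helper names `build_speech_request` and `synthesize_to_file` and the 30-second timeout are assumptions made for illustration.

```python
import json
import urllib.request

API_URL = "https://api.nano-gpt.com/v1/speech"


def build_speech_request(api_key, payload, accept=None):
    """Build (but do not send) the POST /v1/speech request."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if accept:
        headers["Accept"] = accept  # optional MIME type preference
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


def synthesize_to_file(api_key, payload, out_path, timeout=30.0):
    """Send the request and write the raw audio bytes to out_path."""
    request = build_speech_request(api_key, payload)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return len(audio)
```

A call such as `synthesize_to_file(api_key, {"model": "nano-tts-1", "input": "Hello", "voice": "luna", "format": "mp3"}, "hello.mp3")` would save the returned bytes unchanged, with the key loaded server-side rather than hard-coded.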
All examples write audio to disk.
Notes & Limits
- Max input length: depends on model; measured in characters or tokens. For short, interactive prompts, prefer under ~1–2k characters.
- Language support: varies by model. Specify `language` to force selection; otherwise, language may be auto-detected.
- Typical latency: scales with input length and selected `format`; compressed formats like `mp3` are often faster than `wav`.
- Usage metering: billed by characters/tokens or generated audio seconds (provider-specific). See Pricing.
Audio Formats
Mapping between `format` and response `Content-Type`:
| format | Content-Type | Notes |
|---|---|---|
| mp3 | audio/mpeg | Widely supported in browsers |
| wav | audio/wav | PCM; larger payloads |
| ogg | audio/ogg | Container; may include Opus |
| opus | audio/opus or audio/ogg | Streaming‑friendly |
| aac | audio/aac | Safari‑friendly |
| flac | audio/flac | Lossless |
| pcm16 | application/octet-stream | Little‑endian mono; requires sample_rate |
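Because `pcm16` arrives as headerless samples, most players cannot open it directly. The sketch below wraps such bytes in a WAV container with Python's standard `wave` module. It assumes 16-bit samples (inferred from the format name) and the mono, little-endian layout listed in the table; the function name `pcm16_to_wav` is hypothetical.

```python
import io
import wave


def pcm16_to_wav(pcm: bytes, sample_rate: int, channels: int = 1) -> bytes:
    """Wrap raw 16-bit little-endian PCM in a WAV container."""
    if len(pcm) % (2 * channels) != 0:
        raise ValueError("PCM length is not a whole number of 16-bit frames")
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)     # the table lists pcm16 as mono
        wav.setsampwidth(2)            # 16 bits = 2 bytes per sample
        wav.setframerate(sample_rate)  # must match the requested sample_rate
        wav.writeframes(pcm)
    return buffer.getvalue()
```

The `sample_rate` passed here should be the same value sent in the request body; a mismatch changes playback speed and pitch.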
Voices
- Voice IDs vary by model/provider. See model-specific voices on Text-to-Speech: `api-reference/text-to-speech.mdx`.
- If a voices listing endpoint is available (for example `GET /v1/voices`), it returns available voice IDs and metadata (language coverage, gender/pitch, sample links).
Errors & Troubleshooting
- `invalid_model`, `invalid_voice`, `unsupported_format`, `invalid_sample_rate`: Verify `model`, `voice`, `format`, and the required `sample_rate` for `pcm16`.
- `input_too_long`: Reduce length; split long text into chunks and stitch audio client-side.
- `rate_limit_exceeded`: Use exponential backoff; retry after the window resets.
- Network/client tips: Set `Accept` to your preferred audio type and write the raw response body directly to a file.
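One way to handle `input_too_long` is to split the text at sentence boundaries before sending each piece as its own request. The sketch below is an assumption-laden illustration: the helper name `chunk_text`, the 1000-character default, and the sentence-splitting rule are not defined by the API, and the real limit depends on the model.

```python
import re
from typing import List


def chunk_text(text: str, max_chars: int = 1000) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    # Split after ., ! or ? followed by whitespace; punctuation stays attached.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap any single sentence that exceeds the limit on its own.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Splitting on sentence ends tends to keep prosody natural at the joins. How the resulting audio segments are stitched depends on the format: raw `pcm16` segments at the same sample rate can simply be concatenated, while container formats generally need an audio tool.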
Security
- Do not expose API keys in browsers. Proxy via your server.
- Redact PII in logs; avoid logging raw text/audio in production.
- Rate‑limit public routes.
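As a hedged illustration of the logging advice, a server-side proxy might record only metadata about each request. The logger name `tts_proxy` and both helper names are hypothetical. The hash prefix is meant for correlating repeated requests, not as an anonymization guarantee, since short inputs can be guessed.

```python
import hashlib
import logging

logger = logging.getLogger("tts_proxy")  # hypothetical logger name


def summarize_for_log(text: str) -> dict:
    """Describe a synthesis request without recording the text itself."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return {"chars": len(text), "sha256_prefix": digest[:12]}


def log_request(model: str, voice: str, text: str) -> None:
    # Only metadata is logged; the raw input never reaches the log stream.
    logger.info("tts request model=%s voice=%s %s", model, voice, summarize_for_log(text))
```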
Pricing, Quotas, and Rate Limits
- Billing: per characters/tokens or generated seconds depending on provider/model; minimum billing increments may apply.
- Rate limits: per-minute/day caps; contact support to request increases. See `api-reference/miscellaneous/pricing.mdx` and `api-reference/miscellaneous/rate-limits.mdx`.
Migration from Job‑based TTS
Already using the async `POST /tts` + `GET /tts/status` flow?
- When to switch: choose `v1/speech` for short prompts, low latency, and direct playback; keep job-based TTS for long/batch generation and webhook workflows.
- Parameter mapping: `text` → `input`, `voice` stays `voice`, `output_format` → `format`; speed/controls map directly when supported.
- Retries/timeouts: `v1/speech` returns inline; implement client-side timeouts and simple retries on 5xx.
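The parameter mapping and retry advice above can be sketched as two small helpers. Both names (`map_job_params`, `post_with_retries`) are hypothetical, and `send` stands for any caller-supplied function that performs one request with its own timeout and returns `(status_code, body)`. Treating HTTP 429 as the status behind `rate_limit_exceeded` is an assumption.

```python
import random
import time
from typing import Any, Callable, Dict, Tuple

# Field renames taken from the parameter mapping; other keys pass through.
_RENAMES = {"text": "input", "output_format": "format"}


def map_job_params(job_params: Dict[str, Any]) -> Dict[str, Any]:
    """Translate a job-based TTS body into a /v1/speech body."""
    return {_RENAMES.get(key, key): value for key, value in job_params.items()}


def post_with_retries(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        # Assumption: 429 signals rate limiting; 5xx signals a server fault.
        retryable = status == 429 or 500 <= status < 600
        if not retryable or attempt == max_attempts - 1:
            return status, body
        # Exponential backoff with jitter: roughly 0.5s, 1s, 2s, ...
        sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise ValueError("max_attempts must be at least 1")
```

Client errors such as `invalid_voice` are returned immediately, since retrying an unchanged request would not help. Controls that the target model does not support pass through `map_job_params` unchanged and may still be rejected by the endpoint.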
Streaming
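A minimal Python sketch of consuming the response incrementally rather than buffering it whole, for accounts where chunked streaming applies. The helper names are hypothetical, and the sketch assumes no extra request parameter is needed to opt in, which this page does not specify.

```python
import json
import urllib.request
from typing import BinaryIO

API_URL = "https://api.nano-gpt.com/v1/speech"


def copy_stream(source: BinaryIO, sink: BinaryIO, chunk_size: int = 4096) -> int:
    """Copy source to sink in chunks; returns the number of bytes written."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read means the stream has ended
            break
        sink.write(chunk)
        total += len(chunk)
    return total


def stream_speech_to_file(api_key: str, payload: dict, out_path: str) -> int:
    """POST the payload and write audio to disk as it arrives."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # http.client decodes chunked transfer encoding transparently, so each
    # read() returns audio bytes without waiting for the full body.
    with urllib.request.urlopen(request, timeout=60) as response:
        with open(out_path, "wb") as sink:
            return copy_stream(response, sink)
```

The same `copy_stream` loop could feed an audio player or a websocket instead of a file, as long as the sink exposes `write`.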
If chunked audio streaming is enabled for your account, you can request streaming with compatible formats (for example Opus or MP3) and consume the response as a stream.
See Also
- Async/job-based TTS: `api-reference/endpoint/tts.mdx`
- TTS Status polling: `api-reference/endpoint/tts-status.mdx`