
Voices that Sound like India
Real-time speech synthesis for multilingual Indian conversations.
<250ms latency
10 Indian languages
4.23 MOS

India is not for beginners
India is not for generic model providers.
6 things that break low-corpus, non-Indic voice models and how Gnani.ai handles each of them.

Text to Speech
that doesn't make you wait.
WebSocket streaming. First byte in <250ms.
A single API call. Streaming PCM audio in chunks. Optimized for real-time conversational pipelines.
import asyncio
import io
import json

import websockets

async def synthesize(text, lang="hi-IN"):
    """Stream synthesized speech and return the raw PCM bytes."""
    uri = "wss://api.gnani.ai/timbre/v2/stream"
    audio_buffer = io.BytesIO()  # accumulates PCM chunks as they arrive
    async with websockets.connect(uri) as ws:
        # Send the synthesis request as a single JSON message.
        await ws.send(json.dumps({
            "text": text,
            "language": lang,
            "voice": "priya-v2",
            "prosody": {
                "style": "conversational",
                "speed": 1.0
            },
            "encoding": "pcm_16000"
        }))
        async for chunk in ws:
            # PCM chunks stream progressively; keep only binary frames.
            if isinstance(chunk, bytes):
                audio_buffer.write(chunk)
    return audio_buffer.getvalue()

pcm_audio = asyncio.run(synthesize(
    "Namaste Rahul ji, aapka EMI kal due hai."
))

# REST: batch synthesis
curl -X POST https://api.gnani.ai/timbre/v2/synthesize \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Namaste Rahul ji, aapka EMI kal due hai.",
    "language": "hi-IN",
    "voice": "priya-v2",
    "prosody": { "style": "conversational", "speed": 1.0 },
    "encoding": "mp3_22050",
    "streaming": false
  }' \
  --output output.mp3

# X-Latency-Ms: 242
# X-Voice-Id: priya-v2
# X-Language: hi-IN

{
  "stream_id": "tmb_9f4b2a81c",
  "voice": "priya-v2",
  "language": "hi-IN",
  "first_byte_ms": 156,
  "p95_latency_ms": 242,
  "encoding": "pcm_16000",
  "prosody": {
    "style": "conversational",
    "pauses": 3,
    "emphasis_points": 2
  },
  "chunks_total": 8,
  "duration_ms": 1840,
  "mos_estimate": 4.23,
  "status": "complete"
}

FAQs
Everything you need to know about Gnani's Voice AI platform, models, and deployment.
Which Indian languages are supported?
Gnani Timbre v2.5 supports Hindi, Tamil, Telugu, Kannada, Malayalam, Marathi, Bengali, Gujarati, Odia, and Punjabi — along with Hinglish, Tanglish, and code-switched combinations. Each is a native model, not a fine-tune on an English base.
What is p95 latency and why does it matter for IVR?
p95 latency is the 95th-percentile response time: 95% of requests complete at or below it, so it captures the slow tail that an average hides. Gnani Timbre v2.5 delivers p95 under 250ms, the threshold for live IVR, real-time agent assist, and outbound voice agent pipelines, where any perceptible delay breaks the conversation.
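To make the percentile concrete, here is a minimal sketch that computes p95 from a list of measured latencies using the nearest-rank method. The sample values are hypothetical and chosen only for illustration; they are not Gnani measurements, and the `percentile` helper is not part of any Gnani SDK.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical per-request latencies in milliseconds (illustration only).
latencies_ms = [120, 135, 150, 156, 160, 170, 180, 190, 200, 210,
                215, 220, 225, 230, 235, 238, 240, 242, 245, 900]

p95 = percentile(latencies_ms, 95)
mean = sum(latencies_ms) / len(latencies_ms)
print(f"p95 = {p95} ms, mean = {mean:.1f} ms")
```

With these 20 samples, the single 900 ms outlier falls above the 95th percentile, so p95 stays under 250 ms even though one caller in twenty waited far longer. That is why a p95 figure should be read alongside the maximum or p99 when every call matters.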
Does Gnani Timbre v2.5 support Hinglish and code-switching?
Yes. Gnani Timbre v2.5 is trained on real enterprise call center audio with natural Hinglish patterns. The prosody model handles language boundary transitions without acoustic seams or degradation at the switch point.
Can responses be streamed in real time?
Yes. The WebSocket API begins delivering PCM audio chunks before the full sentence is synthesized. Average first-byte latency is 156ms — enabling genuine real-time conversational AI pipelines.
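First-byte latency can be checked on the client side by timing how long the first audio chunk takes to arrive. The sketch below assumes only that the stream is an async iterable of `bytes` chunks. `time_to_first_chunk` and `fake_stream` are hypothetical names, and `fake_stream` stands in for a real connection so the example runs offline.

```python
import asyncio
import time


async def time_to_first_chunk(chunks):
    """Consume an async iterable of bytes chunks and return
    (first_chunk_ms, total_ms, total_bytes)."""
    start = time.perf_counter()
    first_ms = None
    total_bytes = 0
    async for chunk in chunks:
        if first_ms is None:
            # Elapsed time until the first chunk is available.
            first_ms = (time.perf_counter() - start) * 1000
        total_bytes += len(chunk)
    total_ms = (time.perf_counter() - start) * 1000
    return first_ms, total_ms, total_bytes


async def fake_stream():
    """Stand-in for a streaming connection: three 320-byte chunks,
    each arriving after a short delay."""
    for _ in range(3):
        await asyncio.sleep(0.01)
        yield b"\x00" * 320


first_ms, total_ms, total_bytes = asyncio.run(time_to_first_chunk(fake_stream()))
print(f"first chunk after {first_ms:.1f} ms, {total_bytes} bytes in {total_ms:.1f} ms")
```

Against a live endpoint, the same function could be called right after the request is sent, passing the open `websockets` connection as `chunks`, since that library's connections can be iterated with `async for`. The measured value then includes network round-trip time, so it will differ from any server-reported figure.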
Is Gnani Timbre v2.5 optimized for telephony and IVR channels?
Yes. Gnani Timbre v2.5 is tuned specifically for 8 kHz telephony environments. The mean opinion score (MOS) of 4.23 is measured on 8 kHz audio, not under studio conditions.
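To listen to or inspect raw PCM output, the samples usually need a container such as WAV. The following sketch uses only Python's standard `wave` module and assumes 16-bit mono PCM at 8 kHz. Those parameters are assumptions for illustration; the sample format and any 8 kHz encoding name should be confirmed against the API documentation. `pcm_to_wav` and `pcm_duration_ms` are hypothetical helper names.

```python
import io
import wave


def pcm_to_wav(pcm_bytes, sample_rate=8000, channels=1, sample_width=2):
    """Wrap raw PCM samples in a WAV container without altering them.
    Defaults assume 16-bit mono audio at 8 kHz (an assumption)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
    return buf.getvalue()


def pcm_duration_ms(pcm_bytes, sample_rate=8000, channels=1, sample_width=2):
    """Playback duration implied by the byte count and audio format."""
    return len(pcm_bytes) * 1000 / (sample_rate * channels * sample_width)


# One second of 16-bit mono silence at 8 kHz, used as placeholder audio.
silence = b"\x00\x00" * 8000
wav_bytes = pcm_to_wav(silence)
print(len(wav_bytes), pcm_duration_ms(silence))
```

The helper only adds a header. It does not resample, so the `sample_rate` argument must match the rate the audio was actually synthesized at, or playback will be too fast or too slow.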
Can Gnani Timbre v2.5 be deployed on-premise?
Yes. Enterprise customers can deploy Gnani Timbre v2.5 fully on-premise for data sovereignty. Air-gapped deployments are available for BFSI and insurance organizations under RBI and IRDAI compliance requirements.


