Text to Speech

Voices that Sound like India

Real-time speech synthesis for multilingual Indian conversations.

See How it Works

Explore API

<250ms p95 latency

10 Indian languages

4.23 MOS score

Generate human-like speech

Explore our Audio Library

हिन्दी

BFSI

Collections Reminder

Northern accent · BFSI

தமிழ்

Banking

Banking IVR

Chennai accent · Retail

Hi+हि

Finance

Loan Services

Urban Hinglish · Finance

తెలుగు

Telecom

Telecom IVR

Andhra accent · Activation

मराठी

NBFC

Loan Reminder

Pune accent · NBFC

বাংলা

Health

Healthcare Reminder

Kolkata accent · Healthcare

ગુજરાતી

Fintech

Payment Alert

Ahmedabad accent · Fintech

India is not for beginners

India is not for generic model providers.

Six things that break low-corpus, non-Indic voice models, and how Gnani.ai handles each one.

Challenge 1

Text to Speech that doesn't make you wait.

MOS · Indic Languages · Telephony environment

Gnani Timbre v2.5: 4.2
Provider B: 3.9
Provider C: 3.6
Provider D: 3.8

Streaming audio chunks (first byte): init, auth, header, chunk_02, chunk_03, chunk_04, chunk_05, fin

Uptime SLA

99.9%

WebSocket streaming. First byte in <250ms.

A single API call. Streaming PCM audio in chunks. Optimized for real-time conversational pipelines.

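import asyncio
import io
import json

import websockets

async def synthesize(text, lang="hi-IN"):
    """Stream synthesized audio for `text` and return the raw PCM bytes."""
    uri = "wss://api.gnani.ai/timbre/v2/stream"
    audio_buffer = io.BytesIO()  # collects PCM chunks in arrival order
    async with websockets.connect(uri) as ws:
        # Send a single synthesis request as one JSON message
        await ws.send(json.dumps({
            "text":     text,
            "language": lang,
            "voice":    "priya-v2",
            "prosody": {
                "style": "conversational",
                "speed": 1.0
            },
            "encoding": "pcm_16000"
        }))
        async for chunk in ws:
            # PCM chunks stream progressively; binary frames arrive as bytes,
            # so any text frames are skipped rather than written as audio
            if isinstance(chunk, bytes):
                audio_buffer.write(chunk)
    return audio_buffer.getvalue()

pcm = asyncio.run(synthesize(
    "Namaste Rahul ji, aapka EMI kal due hai."
))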
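The streamed chunks are raw PCM with no file header, so most players will not open them directly. The following is a minimal sketch of wrapping the collected bytes in a WAV container using Python's standard `wave` module. It assumes that `pcm_16000` means 16-bit mono samples at 16 kHz; the sample width and channel count are not stated on this page, and the helper names `pcm_to_wav_bytes` and `pcm_duration_ms` are illustrative only.

```python
import io
import wave


def pcm_to_wav_bytes(pcm, sample_rate=16000, channels=1, sample_width=2):
    """Wrap headerless PCM samples in a WAV container and return the file bytes.

    Assumption: 16-bit (2-byte) mono samples; adjust if the API differs.
    """
    out = io.BytesIO()
    with wave.open(out, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return out.getvalue()


def pcm_duration_ms(pcm, sample_rate=16000, channels=1, sample_width=2):
    """Playback duration implied by the byte count of raw PCM audio."""
    bytes_per_second = sample_rate * channels * sample_width
    return len(pcm) / bytes_per_second * 1000


# One second of silence at the assumed format
silence = b"\x00\x00" * 16000
wav_bytes = pcm_to_wav_bytes(silence)
```

Writing `wav_bytes` to a `.wav` file gives something a standard audio player can open, and `pcm_duration_ms` offers a quick sanity check against the duration the service reports.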

# REST — batch synthesis
curl -X POST https://api.gnani.ai/timbre/v2/synthesize \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text":     "Namaste Rahul ji, aapka EMI kal due hai.",
    "language": "hi-IN",
    "voice":    "priya-v2",
    "prosody":  { "style": "conversational", "speed": 1.0 },
    "encoding": "mp3_22050",
    "streaming": false
  }' \
  --output output.mp3

# X-Latency-Ms: 242
# X-Voice-Id:   priya-v2
# X-Language:   hi-IN
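For callers working in Python rather than a shell, the same batch request can be sketched with the standard library alone. The URL, headers, and payload fields are copied from the curl example; the function names `build_synthesis_request` and `synthesize_to_file` are hypothetical, and the assumption that the response body is the MP3 itself follows from the `--output output.mp3` flag rather than from any published API reference.

```python
import json
import urllib.request

SYNTHESIZE_URL = "https://api.gnani.ai/timbre/v2/synthesize"  # from the curl example


def build_synthesis_request(text, api_key, language="hi-IN", voice="priya-v2"):
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "text": text,
        "language": language,
        "voice": voice,
        "prosody": {"style": "conversational", "speed": 1.0},
        "encoding": "mp3_22050",
        "streaming": False,
    }
    return urllib.request.Request(
        SYNTHESIZE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(text, api_key, path="output.mp3"):
    """Send the request and save the body; returns the X-Latency-Ms header if present."""
    request = build_synthesis_request(text, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        latency = response.headers.get("X-Latency-Ms")
        with open(path, "wb") as audio_file:
            audio_file.write(response.read())  # assumed: body is the MP3 audio
    return latency
```

Separating request construction from the network call keeps the payload easy to inspect or unit test without contacting the service.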

Response

{
  "stream_id": "tmb_9f4b2a81c",
  "voice": "priya-v2",
  "language": "hi-IN",
  "first_byte_ms": 156,
  "p95_latency_ms": 242,
  "encoding": "pcm_16000",
  "prosody": {
    "style": "conversational",
    "pauses": 3,
    "emphasis_points": 2
  },
  "chunks_total": 8,
  "duration_ms": 1840,
  "mos_estimate": 4.23,
  "status": "complete"
}
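A client might inspect this metadata before handing audio to the next stage of a pipeline. The sketch below parses a response of the shape shown above and checks the first-byte time against a 250 ms budget. The field names come from the sample response; the function `summarize_stream`, the budget default, and the derived `avg_chunk_ms` figure are illustrative assumptions, not part of the API.

```python
import json


def summarize_stream(raw, budget_ms=250):
    """Extract a few health indicators from a stream metadata JSON string."""
    meta = json.loads(raw)
    return {
        "stream_id": meta["stream_id"],
        "complete": meta["status"] == "complete",
        "within_budget": meta["first_byte_ms"] < budget_ms,
        # Average audio duration carried per chunk (derived, not reported by the API)
        "avg_chunk_ms": meta["duration_ms"] / meta["chunks_total"],
    }


sample = json.dumps({
    "stream_id": "tmb_9f4b2a81c",
    "first_byte_ms": 156,
    "chunks_total": 8,
    "duration_ms": 1840,
    "status": "complete",
})
summary = summarize_stream(sample)
```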

Explore APIs

Read Documentation

FAQs

Everything you need to know about Gnani's Voice AI platform, models, and deployment.

Which Indian languages are supported?

Gnani Timbre v2.5 supports Hindi, Tamil, Telugu, Kannada, Malayalam, Marathi, Bengali, Gujarati, Odia, and Punjabi — along with Hinglish, Tanglish, and code-switched combinations. Each is a native model, not a fine-tune on an English base.

What is p95 latency and why does it matter for IVR?

p95 latency is the response time that 95% of requests meet or beat; only the slowest 5% take longer. Gnani Timbre v2.5 delivers p95 under 250ms, the threshold for live IVR, real-time agent assist, and outbound voice agent pipelines, where any perceptible delay breaks the conversation.
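To make the definition concrete, here is a small sketch of computing a percentile from measured latencies with the nearest-rank method. This is one common convention among several and is not necessarily how Gnani computes its published figure; the function name `percentile` and the sample values are illustrative.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# 20 requests: one slow outlier does not move p95, two do
mostly_fast = [100] * 19 + [900]
two_slow = [100] * 18 + [900] * 2
p95_one_outlier = percentile(mostly_fast, 95)
p95_two_outliers = percentile(two_slow, 95)
```

This is why p95 is a stricter promise than an average: a handful of slow responses can leave the mean looking healthy while pushing the 95th percentile well past the budget.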

Does Gnani Timbre v2.5 support Hinglish and code-switching?

Yes. Gnani Timbre v2.5 is trained on real enterprise call center audio with natural Hinglish patterns. The prosody model handles language boundary transitions without acoustic seams or degradation at the switch point.

Can responses be streamed in real time?

Yes. The WebSocket API begins delivering PCM audio chunks before the full sentence is synthesized. Average first-byte latency is 156ms — enabling genuine real-time conversational AI pipelines.
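First-byte latency can be measured on the client by timing the arrival of the first chunk separately from the end of the stream. The sketch below uses a simulated async generator, `fake_stream`, as a stand-in for a real WebSocket connection so that it runs without network access; the chunk size and delays are arbitrary, and `consume` is a hypothetical helper name.

```python
import asyncio
import time


async def fake_stream(n_chunks=4, delay_s=0.01):
    """Stand-in for a streaming connection: yields byte chunks with a delay."""
    for _ in range(n_chunks):
        await asyncio.sleep(delay_s)
        yield b"\x00\x00" * 160  # 320 bytes of silence per chunk


async def consume(stream):
    """Return (first_byte_ms, total_ms, total_bytes) for an async chunk stream."""
    start = time.perf_counter()
    first_byte_ms = None
    total_bytes = 0
    async for chunk in stream:
        if first_byte_ms is None:
            # Time until the first audio chunk arrived
            first_byte_ms = (time.perf_counter() - start) * 1000
        total_bytes += len(chunk)
    total_ms = (time.perf_counter() - start) * 1000
    return first_byte_ms, total_ms, total_bytes


first_ms, total_ms, n_bytes = asyncio.run(consume(fake_stream()))
```

With a real connection, playback can begin as soon as the first chunk arrives, so `first_ms` rather than `total_ms` is what a caller actually perceives as delay.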

Is Gnani Timbre v2.5 optimized for telephony and IVR channels?

Gnani Timbre v2.5 is tuned specifically for 8kHz telephony environments. MOS of 4.23 is measured on 8kHz audio, not studio conditions.

Can Gnani Timbre v2.5 be deployed on-premise?

Yes. Enterprise customers can deploy Gnani Timbre v2.5 fully on-premise for data sovereignty. Air-gapped deployments are available for BFSI and insurance organizations under RBI and IRDAI compliance requirements.