
Under 4% WER for Indian Speech Recognition
Sovereign Voice AI trusted by the global ecosystem
Why Indian ASR is fundamentally different
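Generic speech recognition models often fail on Indian audio, which mixes languages within a sentence, spans many regional accents and dialects, and frequently arrives over narrowband 8kHz telephone lines. Accurate Indic speech recognition requires training data that reflects this conversational diversity, across languages, dialects, and real-world recording conditions.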
Outperforming on Indic benchmarks
Get production-ready with our APIs
import json
import wave

import websocket  # from the websocket-client package

# Connect to Gnani Prisma v3 WebSocket
ws = websocket.WebSocket()
ws.connect("wss://api.gnani.ai/prisma/v3/stream")

# Send configuration
ws.send(json.dumps({
    "config": {
        "language": "hi-IN",
        "encoding": "LINEAR16",
        "sample_rate": 8000,
        "enable_entities": True
    }
}))

# Stream audio as raw 16-bit PCM. wave.readframes() skips the WAV header,
# so only samples are sent: 2048 frames = 4096 bytes (~256 ms at 8 kHz mono).
with wave.open("audio.wav", "rb") as wav:
    while chunk := wav.readframes(2048):
        ws.send_binary(chunk)

# Receive transcription
result = json.loads(ws.recv())
print(result["transcript"])
# Output: "mera policy renewal amount kitna hai"
ws.close()

# REST API - Transcribe audio file
curl -X POST https://api.gnani.ai/prisma/v3/transcribe \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F "audio=@recording.wav" \
  -F "language=hi-IN" \
  -F "enable_entities=true" \
  -F "enable_diarization=true"
# Response
{
  "transcript": "mera policy renewal amount kitna hai",
  "language": "hi-IN",
  "confidence": 0.984,
  "latency_ms": 156,
  "entities": [
    {"type": "PRODUCT", "value": "policy", "start": 5, "end": 11},
    {"type": "INTENT", "value": "renewal_query", "confidence": 0.96}
  ],
  "words": [
    {"word": "mera", "start_time": 0.0, "end_time": 0.3, "confidence": 0.99},
    {"word": "policy", "start_time": 0.3, "end_time": 0.7, "confidence": 0.98}
  ]
}

{
  "transcript": "mera policy renewal amount kitna hai",
  "language": {
    "detected": "hi-IN",
    "confidence": 0.97,
    "secondary": "en-IN"
  },
  "confidence": 0.984,
  "latency_ms": 156,
  "entities": [
    {
      "type": "PRODUCT",
      "value": "policy",
      "start": 5,
      "end": 11
    },
    {
      "type": "QUERY_TYPE",
      "value": "renewal_amount",
      "confidence": 0.96
    }
  ],
  "diarization": {
    "speakers": 1,
    "segments": [
      {
        "speaker": "SPEAKER_0",
        "start": 0,
        "end": 2.1,
        "text": "mera policy renewal amount kitna hai"
      }
    ]
  }
}

FAQs
Everything you need to know about Gnani's Voice AI platform, models, and deployment.
What is WER (Word Error Rate)?
Word Error Rate (WER) is the standard metric for measuring speech recognition accuracy. It counts the word-level edits needed to turn the system's transcript into the reference transcript and divides by the number of words in the reference: WER = (S + D + I) / N, where S is substitutions, D is deletions, I is insertions, and N is the reference word count. Lower WER means higher accuracy. Gnani Prisma v2.5 STT achieves under 4% WER on clean Indian audio, which represents state-of-the-art performance for multilingual Indian speech.
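WER as described above can be computed with a word-level edit distance (Levenshtein distance over words). The sketch below is a minimal illustration, not Gnani's evaluation code: `word_error_rate` is a hypothetical helper, and it assumes plain whitespace tokenization with no text normalization (casing, punctuation, numerals), which benchmark pipelines typically apply first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N using word-level edit distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words (a rolling single-row DP table).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "mera policy renewal amount kitna hai"
    hyp = "mera policy renewal amount kitni hai"
    print(f"WER: {word_error_rate(ref, hyp):.2%}")  # WER: 16.67%
```

Because insertions count as errors, WER can exceed 100% when the hypothesis contains many extra words.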
Does Gnani Prisma v2.5 support telephony-grade audio?
Yes. Gnani Prisma v2.5 is specifically optimized for real-world telephony audio. Unlike most ASR systems, which are trained on clean studio recordings, it is trained on enterprise call-center audio sampled at 8kHz, so it handles the compression artifacts, background noise, and network degradation common in telephony environments.
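Because both the model and the streaming configuration target 8kHz audio, it can help to check a recording's format before sending it. This minimal sketch uses Python's standard `wave` module; `check_telephony_wav` is a hypothetical helper, and the expected format (8 kHz, mono, 16-bit PCM) is an assumption drawn from the `LINEAR16` / `sample_rate: 8000` config in the WebSocket example, not a documented API requirement.

```python
import wave

# Assumed target format, mirroring the "LINEAR16" / 8000 Hz streaming config:
# 16-bit signed PCM, 8 kHz, single channel.
EXPECTED_RATE = 8000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit


def check_telephony_wav(source) -> dict:
    """Return basic WAV parameters, raising ValueError unless 8 kHz mono 16-bit.

    `source` may be a file path or a binary file-like object. The stdlib `wave`
    module only reads uncompressed PCM WAV files; other encodings raise wave.Error.
    """
    with wave.open(source, "rb") as wav:
        params = {
            "sample_rate": wav.getframerate(),
            "channels": wav.getnchannels(),
            "sample_width": wav.getsampwidth(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }

    problems = []
    if params["sample_rate"] != EXPECTED_RATE:
        problems.append(f"sample rate {params['sample_rate']} Hz (expected {EXPECTED_RATE})")
    if params["channels"] != EXPECTED_CHANNELS:
        problems.append(f"{params['channels']} channels (expected mono)")
    if params["sample_width"] != EXPECTED_SAMPLE_WIDTH:
        problems.append(f"{8 * params['sample_width']}-bit samples (expected 16-bit)")
    if problems:
        raise ValueError("audio is not 8 kHz mono LINEAR16: " + "; ".join(problems))
    return params
```

If the check fails, a tool such as ffmpeg can convert the file first, for example `ffmpeg -i input.wav -ar 8000 -ac 1 -c:a pcm_s16le output.wav`.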
How does code-switching work?
Code-switching refers to speakers mixing multiple languages within a single sentence, such as Hinglish (Hindi + English) or Tanglish (Tamil + English). Gnani Prisma v2.5's models are trained on millions of real Indian conversations with natural code-switching patterns, enabling accurate transcription without requiring speakers to stick to a single language.
Is on-premise deployment available?
Yes, Gnani Prisma v2.5 supports cloud, hybrid, and fully on-premise deployments. On-premise deployment ensures complete data sovereignty with air-gapped options for organizations with strict compliance requirements. Our enterprise team provides dedicated support for on-premise installations, including custom hardware configurations.
Which Indian languages are supported?
Gnani Prisma v2.5 supports 40+ Indian languages and dialects including Hindi, Tamil, Telugu, Bengali, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia, Assamese, and many regional variants. We also support code-switched combinations like Hinglish, Tanglish, Benglish, and more.
Can audio be streamed in real-time?
Yes. Gnani Prisma v2.5 provides real-time streaming transcription over WebSocket connections. Audio can be streamed as it is captured, and transcription results are returned with sub-500ms latency. This enables real-time use cases such as live agent assistance, compliance monitoring, and instant transcription.
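To simulate streaming "as it is captured" from a recorded file, a client can cut the audio into fixed-duration PCM frames and optionally pace them at real-time speed. This is a hedged sketch: `pcm_frames` is a hypothetical helper, the 250 ms default frame size is an arbitrary choice, and it assumes the stream expects headerless 16-bit PCM, as the `LINEAR16` encoding in the WebSocket example suggests.

```python
import time
import wave


def pcm_frames(source, frame_ms: int = 250, realtime: bool = False):
    """Yield raw PCM byte strings of about `frame_ms` milliseconds from a WAV file.

    wave.readframes() returns sample data only (no WAV header). With
    realtime=True the generator sleeps for one frame duration after each
    frame, approximating a live microphone or call feed.
    """
    with wave.open(source, "rb") as wav:
        frames_per_chunk = max(1, wav.getframerate() * frame_ms // 1000)
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:
                break
            yield chunk
            if realtime:
                time.sleep(frame_ms / 1000)
```

In the WebSocket example it could drive the send loop, for instance `for chunk in pcm_frames("audio.wav", realtime=True): ws.send_binary(chunk)`. Smaller frames reduce client-side buffering delay at the cost of sending more messages.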
Does Gnani Prisma v2.5 support named entity recognition?
Yes, Gnani Prisma v2.5 includes built-in named entity recognition (NER) optimized for Indian domains. It can automatically detect and extract entities like PAN numbers, Aadhaar IDs, policy numbers, account numbers, amounts, dates, and domain-specific terms relevant to BFSI, insurance, and telecom sectors.
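To use extracted entities downstream, a client can walk the `entities` array of a transcription response. The sketch below uses the sample REST response from this page, trimmed to the fields it reads; `extract_entities` is a hypothetical helper, and it assumes `start`/`end` are 0-based character offsets into `transcript` with an exclusive end. That matches the sample ("policy" at 5 to 11) but is an assumption, not documented behavior.

```python
# Sample response copied from the REST example on this page (trimmed).
SAMPLE_RESPONSE = {
    "transcript": "mera policy renewal amount kitna hai",
    "entities": [
        {"type": "PRODUCT", "value": "policy", "start": 5, "end": 11},
        {"type": "INTENT", "value": "renewal_query", "confidence": 0.96},
    ],
}


def extract_entities(response: dict) -> list:
    """Flatten the `entities` array of a transcription response into simple dicts.

    Entities with `start`/`end` offsets also get the matching transcript text
    under "span"; label-style entities (e.g. intents) keep their confidence.
    """
    transcript = response.get("transcript", "")
    results = []
    for entity in response.get("entities", []):
        item = {"type": entity["type"], "value": entity["value"]}
        if "start" in entity and "end" in entity:
            # Assumption: 0-based character offsets with an exclusive end.
            item["span"] = transcript[entity["start"]:entity["end"]]
        if "confidence" in entity:
            item["confidence"] = entity["confidence"]
        results.append(item)
    return results


if __name__ == "__main__":
    for entity in extract_entities(SAMPLE_RESPONSE):
        print(entity)
```

With a live response, the same function would take the parsed JSON, e.g. `extract_entities(json.loads(ws.recv()))`.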
How does latency compare to cascaded ASR systems?
Gnani Prisma v2.5's end-to-end architecture delivers sub-500ms latency, significantly faster than traditional cascaded systems that chain multiple models together. Cascaded systems typically add 1-3 seconds of latency because each stage must wait for the previous one to finish. Gnani Prisma v2.5's single-pass inference enables real-time applications that are impractical with multi-second delays.
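One way to check latency figures like these against your own traffic is to time, on the client, the gap between sending the last audio chunk and receiving the final result. This is a sketch under stated assumptions: `measure_final_latency_ms` is a hypothetical helper, it assumes the server replies once the audio has been sent (as the WebSocket example implies), and the measured value includes network round-trip time, so it will not match the server-reported `latency_ms` exactly.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_final_latency_ms(
    send_chunk: Callable[[bytes], None],
    receive_final: Callable[[], str],
    chunks: Iterable[bytes],
) -> Tuple[str, float]:
    """Send all audio chunks, then time how long the final result takes to arrive.

    Returns (result, milliseconds between sending the last chunk and receiving
    the result). The figure includes network round-trip time.
    """
    last_sent = None
    for chunk in chunks:
        send_chunk(chunk)
        last_sent = time.perf_counter()  # monotonic clock, suited to intervals
    if last_sent is None:
        raise ValueError("no audio chunks were sent")
    result = receive_final()
    elapsed_ms = (time.perf_counter() - last_sent) * 1000
    return result, elapsed_ms
```

With the WebSocket example, this might be called as `measure_final_latency_ms(ws.send_binary, ws.recv, chunks)`, where `chunks` is any iterable of PCM byte strings.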


