Speech to Text

Under 4% WER for Indian Speech Recognition

14M+

Real telephony audio

35,000+

Concurrency

200+

Enterprises

Sovereign Voice AI trusted by the global ecosystem

Why Indian ASR is fundamentally different

Generic speech recognition fails on Indian audio. Accurate Indic speech recognition requires training data that reflects India's conversational diversity: its many languages, regional accents and dialects, and everyday code-switching.

Transcription Comparison

Code Switching
Audio input: Mere account mein kitna balance hai?
Gnani output (+23% accuracy): Mere account mein kitna balance hai?
Generic ASR output: Mere a-count main kitna balance hey?

Regional Accents
Audio input: I am wanting to know my policy status
Gnani output (+18% accuracy): I am wanting to know my policy status
Generic ASR output: I am wanting to no my poly status

Telephony Audio Quality
Audio input: [Noisy call center audio]
Gnani output (+31% accuracy): Main apna loan EMI date change karna chahta hoon
Generic ASR output: Unable to transcribe... [garbled]

India-Specific Domain Vocabulary
Audio input: Please share your PAN and Aadhaar for KYC
Gnani output (+27% accuracy): Please share your PAN and Aadhaar for KYC
Generic ASR output: Please share your pan and other for KYC

10+

Languages Supported

96%

Code-Switch Accuracy

8kHz

Telephony Optimized

14M+

Training Data

Outperforming on Indic benchmarks

[Chart: word error rate (WER) on the Gramvaani dataset across all languages, comparing Gnani Prisma v2.5 with BharatGen, Sarvam, Whisper Large, and Google STT. Lower WER = better accuracy.]

Get production-ready with our APIs 

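```python
import json
import wave

import websocket  # websocket-client package: pip install websocket-client

API_KEY = "YOUR_API_KEY"

# Connect to the Gnani Prisma v3 streaming endpoint.
# Assumption: the WebSocket endpoint accepts the same bearer token as the REST API.
ws = websocket.WebSocket()
ws.connect(
    "wss://api.gnani.ai/prisma/v3/stream",
    header=[f"Authorization: Bearer {API_KEY}"],
)

# Send configuration: Hindi, 16-bit linear PCM, 8 kHz telephony audio
ws.send(json.dumps({
    "config": {
        "language": "hi-IN",
        "encoding": "LINEAR16",
        "sample_rate": 8000,
        "enable_entities": True,
    }
}))

# Stream audio chunks. wave.readframes() skips the WAV header, so only raw
# PCM samples are sent; 2048 frames = 4096 bytes for 16-bit mono audio.
with wave.open("audio.wav", "rb") as wav:
    while chunk := wav.readframes(2048):
        ws.send_binary(chunk)

# Receive transcription
result = json.loads(ws.recv())
print(result["transcript"])
# Output: "mera policy renewal amount kitna hai"
ws.close()
```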
```shell
# REST API: transcribe an audio file
# (with -F, curl sets the multipart/form-data Content-Type and boundary itself)
curl -X POST https://api.gnani.ai/prisma/v3/transcribe \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "audio=@recording.wav" \
  -F "language=hi-IN" \
  -F "enable_entities=true" \
  -F "enable_diarization=true"
```

Response (abbreviated):

```json
{
  "transcript": "mera policy renewal amount kitna hai",
  "language": "hi-IN",
  "confidence": 0.984,
  "latency_ms": 156,
  "entities": [
    {"type": "PRODUCT", "value": "policy", "start": 5, "end": 11},
    {"type": "INTENT", "value": "renewal_query", "confidence": 0.96}
  ],
  "words": [
    {"word": "mera", "start_time": 0.0, "end_time": 0.3, "confidence": 0.99},
    {"word": "policy", "start_time": 0.3, "end_time": 0.7, "confidence": 0.98}
  ]
}
```
Response with language detection and diarization:

```json
{
  "transcript": "mera policy renewal amount kitna hai",
  "language": {
    "detected": "hi-IN",
    "confidence": 0.97,
    "secondary": "en-IN"
  },
  "confidence": 0.984,
  "latency_ms": 156,
  "entities": [
    {
      "type": "PRODUCT",
      "value": "policy",
      "start": 5,
      "end": 11
    },
    {
      "type": "QUERY_TYPE",
      "value": "renewal_amount",
      "confidence": 0.96
    }
  ],
  "diarization": {
    "speakers": 1,
    "segments": [
      {
        "speaker": "SPEAKER_0",
        "start": 0,
        "end": 2.1,
        "text": "mera policy renewal amount kitna hai"
      }
    ]
  }
}
```
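The sample responses above differ in shape: in one, `language` is a plain string such as `"hi-IN"`; in the other, it is an object with `detected`, `confidence`, and `secondary` fields. A client may want to handle both. The sketch below reads only fields visible in those samples. `summarize_response` is a hypothetical helper for illustration, not part of any Gnani SDK, and the schema beyond these samples is an assumption.

```python
from __future__ import annotations

import json
from typing import Any


def summarize_response(raw: str) -> dict[str, Any]:
    """Pull the main fields out of a transcription response (hypothetical helper)."""
    data = json.loads(raw)

    # "language" appears either as a plain tag ("hi-IN") or as an object
    # with "detected" / "secondary" keys, depending on the sample.
    lang = data.get("language")
    if isinstance(lang, dict):
        primary, secondary = lang.get("detected"), lang.get("secondary")
    else:
        primary, secondary = lang, None

    # Group entity values by their "type" field (e.g. PRODUCT, INTENT).
    entities: dict[str, list[str]] = {}
    for ent in data.get("entities", []):
        entities.setdefault(ent["type"], []).append(ent["value"])

    # Speaker count is only present when diarization results are returned.
    speakers = data.get("diarization", {}).get("speakers")

    return {
        "transcript": data.get("transcript", ""),
        "primary_language": primary,
        "secondary_language": secondary,
        "entities": entities,
        "speakers": speakers,
    }


example = {
    "transcript": "mera policy renewal amount kitna hai",
    "language": {"detected": "hi-IN", "confidence": 0.97, "secondary": "en-IN"},
    "entities": [{"type": "PRODUCT", "value": "policy", "start": 5, "end": 11}],
    "diarization": {"speakers": 1, "segments": []},
}
summary = summarize_response(json.dumps(example))
print(summary["primary_language"], summary["secondary_language"])  # hi-IN en-IN
```

Returning `None` for absent fields, rather than raising, keeps the helper usable whether or not diarization or language detection was requested.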

FAQs

Everything you need to know about Gnani's Voice AI platform, models, and deployment.

What is WER (Word Error Rate)?

Word Error Rate (WER) is the standard metric for measuring speech recognition accuracy. It counts the word substitutions (S), deletions (D), and insertions (I) needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference (N): WER = (S + D + I) / N. Lower WER means higher accuracy. Gnani Prisma v2.5 STT achieves under 4% WER on clean Indian audio, which represents state-of-the-art performance for multilingual Indian speech.
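To make the definition concrete, here is a minimal WER computation using word-level Levenshtein (edit) distance. It only lowercases and splits on whitespace; real benchmark scoring usually applies more careful text normalization, so treat this as an illustrative sketch rather than the pipeline behind the figures on this page.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev holds the previous DP row: edit distances from the first i-1
    # reference words to each hypothesis prefix (row 0 = all insertions).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# Regional Accents example: "know" -> "no" and "policy" -> "poly" are 2 substitutions
print(word_error_rate("I am wanting to know my policy status",
                      "I am wanting to no my poly status"))  # 0.25
```

Because this sketch lowercases both sides, the "PAN" to "pan" difference in the domain-vocabulary example would not count as an error; whether casing matters depends on the normalization a benchmark specifies.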

Does Gnani Prisma v2.5 support telephony-grade audio?

Yes, Gnani Prisma v2.5 is specifically optimized for real-world telephony audio. Unlike most ASR systems, which are trained on clean studio recordings, Gnani Prisma v2.5 is built on enterprise call center audio sampled at 8kHz, so it handles the compression artifacts, background noise, and network degradation common in telephony environments.
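The streaming example on this page declares `LINEAR16` audio with `sample_rate: 8000`, so it can help to confirm that a recording matches that format before sending it. The sketch below uses Python's standard `wave` module and assumes mono, 16-bit PCM at 8 kHz as the target, taken from that config example. `check_telephony_wav` is a hypothetical helper name.

```python
from __future__ import annotations

import wave


def check_telephony_wav(path: str, expected_rate: int = 8000) -> list[str]:
    """Return a list of format problems; an empty list means the file is
    mono 16-bit PCM at the expected sample rate."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected mono audio, got {wav.getnchannels()} channels")
        if wav.getsampwidth() != 2:
            problems.append(f"expected 16-bit samples, got {8 * wav.getsampwidth()}-bit")
        if wav.getframerate() != expected_rate:
            problems.append(f"expected {expected_rate} Hz, got {wav.getframerate()} Hz")
    return problems
```

Calling `check_telephony_wav("audio.wav")` before streaming returns an empty list for a conforming file. The `wave` module only reads uncompressed PCM, so compressed telephony WAV variants (for example, mu-law or A-law) raise `wave.Error` on open and would need converting to linear PCM first.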

How does code-switching work?

Code-switching refers to speakers mixing multiple languages within a single sentence, such as Hinglish (Hindi + English) or Tanglish (Tamil + English). Gnani Prisma v2.5's models are trained on millions of real Indian conversations with natural code-switching patterns, enabling accurate transcription without requiring speakers to stick to a single language.

Is on-premise deployment available?

Yes, Gnani Prisma v2.5 supports cloud, hybrid, and fully on-premise deployments. On-premise deployment ensures complete data sovereignty with air-gapped options for organizations with strict compliance requirements. Our enterprise team provides dedicated support for on-premise installations, including custom hardware configurations.

Which Indian languages are supported?

Gnani Prisma v2.5 supports 40+ Indian languages and dialects including Hindi, Tamil, Telugu, Bengali, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia, Assamese, and many regional variants. We also support code-switched combinations like Hinglish, Tanglish, Benglish, and more.

Can audio be streamed in real-time?

Yes, Gnani Prisma v2.5 provides real-time streaming transcription over WebSocket connections. Audio can be streamed as it is captured, and transcription results are returned with sub-500ms latency. This enables real-time use cases such as live agent assistance, compliance monitoring, and instant transcription.
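Chunk size also affects how quickly results can come back, because audio cannot be transcribed before it has been sent. For uncompressed PCM, duration = bytes / (sample_rate × bytes_per_sample × channels). The sketch below applies that formula. Defaults of 8 kHz, 16-bit, mono are assumed from the streaming example's config, and the helper names are hypothetical. The service's preferred chunk size is not stated on this page.

```python
def pcm_chunk_ms(num_bytes: int, sample_rate: int = 8000,
                 sample_width: int = 2, channels: int = 1) -> float:
    """Duration in milliseconds of a chunk of uncompressed PCM audio."""
    bytes_per_second = sample_rate * sample_width * channels
    return 1000 * num_bytes / bytes_per_second


def chunk_bytes_for_ms(ms: float, sample_rate: int = 8000,
                       sample_width: int = 2, channels: int = 1) -> int:
    """Bytes needed for `ms` milliseconds of audio, rounded down to whole frames."""
    frame_size = sample_width * channels
    frames = int(sample_rate * ms / 1000)
    return frames * frame_size


print(pcm_chunk_ms(4096))       # 256.0
print(chunk_bytes_for_ms(100))  # 1600
```

At 8 kHz, 16-bit mono, a 4,096-byte chunk holds 256 ms of audio, about half of a 500 ms budget before any processing begins. Smaller chunks reduce that buffering delay at the cost of sending more messages.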

Does Gnani Prisma v2.5 support named entity recognition?

Yes, Gnani Prisma v2.5 includes built-in named entity recognition (NER) optimized for Indian domains. It can automatically detect and extract entities like PAN numbers, Aadhaar IDs, policy numbers, account numbers, amounts, dates, and domain-specific terms relevant to BFSI, insurance, and telecom sectors.
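Entity values taken from speech can still be misheard, so a client may want to sanity-check them before acting on them. The sketch below checks two well-known formats: a PAN is five letters, four digits, and one letter (for example `ABCDE1234F`), and an Aadhaar number is 12 digits, does not start with 0 or 1, and ends in a Verhoeff check digit. These are format checks only, the helpers are hypothetical, and the entity type labels the API uses for PAN and Aadhaar values are not shown on this page.

```python
import re

PAN_PATTERN = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")
AADHAAR_PATTERN = re.compile(r"[2-9][0-9]{11}")

# Verhoeff multiplication table (dihedral group D5) and permutation table
_D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6], [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8], [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2], [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4], [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
_P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2], [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0], [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5], [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]


def verhoeff_ok(digits: str) -> bool:
    """True if an ASCII digit string (check digit last) passes the Verhoeff checksum."""
    c = 0
    for i, ch in enumerate(reversed(digits)):
        c = _D[c][_P[i % 8][int(ch)]]
    return c == 0


def is_valid_pan(value: str) -> bool:
    """Format check: 5 letters, 4 digits, 1 letter (spaces and case ignored)."""
    return PAN_PATTERN.fullmatch(value.replace(" ", "").upper()) is not None


def is_valid_aadhaar(value: str) -> bool:
    """Format check: 12 digits, first digit 2-9, valid Verhoeff check digit."""
    digits = value.replace(" ", "")
    return AADHAAR_PATTERN.fullmatch(digits) is not None and verhoeff_ok(digits)


print(is_valid_pan("ABCDE1234F"))  # True
```

A failed check is a useful signal to ask the caller to repeat the value rather than silently passing a misheard ID downstream.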

How does latency compare to cascaded ASR systems?

Gnani Prisma v2.5's end-to-end architecture delivers sub-500ms latency, well below that of traditional cascaded systems, which chain multiple models together. Cascaded systems typically add 1-3 seconds of latency because each stage must wait for the previous one to finish. Gnani Prisma v2.5's single-pass inference enables real-time applications that were previously impossible.
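Latency figures are easiest to judge by measuring them on your own audio and network. The sketch below times any zero-argument callable with `time.perf_counter` and reports median (p50) and 95th-percentile (p95) wall-clock latency using `statistics.quantiles`. The `transcribe` function mentioned in the comment is a placeholder for whatever client call you use, not a Gnani API. For streaming, the figure that usually matters is the delay between sending the last audio chunk and receiving its transcript, which this simple wrapper does not isolate.

```python
from __future__ import annotations

import statistics
import time
from typing import Callable


def measure_latency_ms(call: Callable[[], object], runs: int = 20) -> dict[str, float]:
    """Time `call` repeatedly; return p50 and p95 wall-clock latency in ms."""
    if runs < 2:
        raise ValueError("need at least 2 runs to compute percentiles")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000)
    # n=100 yields the 99 cut points p1..p99; "inclusive" keeps them within
    # the observed min/max instead of extrapolating beyond the samples.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94]}


# Hypothetical usage, with transcribe() standing in for your client call:
# stats = measure_latency_ms(lambda: transcribe("recording.wav"))
```

Reporting p95 alongside the median matters for live use, since occasional slow responses are what callers notice.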