Speech to Text

Under 4% WER for Indian Speech Recognition


Real telephony audio: 14M+

Concurrency: 35,000+

Enterprises: 200+

Live Transcription


Sovereign Voice AI trusted by the global ecosystem

Why Indian ASR is fundamentally different

Generic speech recognition fails on Indian audio. Indic speech intelligence demands comprehensive training data that represents India’s conversational diversity, dialects, and depth.

Transcription Comparison: Code Switching

Audio Input: Mere account mein kitna balance hai?
Gnani Output (+23% accuracy): Mere account mein kitna balance hai?
Generic ASR Output: Mere a-count main kitna balance hey?

Transcription Comparison: Regional Accents

Audio Input: I am wanting to know my policy status
Gnani Output (+18% accuracy): I am wanting to know my policy status
Generic ASR Output: I am wanting to no my poly status

Transcription Comparison: Telephony Audio Quality

Audio Input: [Noisy Call Center Audio]
Gnani Output (+31% accuracy): Main apna loan EMI date change karna chahta hoon
Generic ASR Output: Unable to transcribe... [garbled]

Transcription Comparison: India-Specific Domain Vocabulary

Audio Input: Please share your PAN and Aadhaar for KYC
Gnani Output (+27% accuracy): Please share your PAN and Aadhaar for KYC
Generic ASR Output: Please share your pan and other for KYC

Languages Supported: 10+

Code-Switch Accuracy: 96%

Telephony Optimized: 8kHz

Training Data: 14M+

Outperforming on Indic benchmarks



Get production-ready with our APIs

import json

import websocket  # provided by the websocket-client package

# Connect to Gnani Prisma v3 WebSocket
ws = websocket.WebSocket()
ws.connect("wss://api.gnani.ai/prisma/v3/stream")

# Send configuration
ws.send(json.dumps({
    "config": {
        "language": "hi-IN",
        "encoding": "LINEAR16",
        "sample_rate": 8000,
        "enable_entities": True
    }
}))

# Stream audio chunks as binary frames
with open("audio.wav", "rb") as f:
    while chunk := f.read(4096):
        ws.send_binary(chunk)

# Receive transcription
result = json.loads(ws.recv())
print(result["transcript"])
# Output: "mera policy renewal amount kitna hai"

# Close the connection once the transcript has been read
ws.close()


# REST API - Transcribe audio file
# curl -F sets the multipart/form-data Content-Type (with its boundary) automatically
curl -X POST https://api.gnani.ai/prisma/v3/transcribe \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "audio=@recording.wav" \
  -F "language=hi-IN" \
  -F "enable_entities=true" \
  -F "enable_diarization=true"

# Response
{
  "transcript": "mera policy renewal amount kitna hai",
  "language": "hi-IN",
  "confidence": 0.984,
  "latency_ms": 156,
  "entities": [
    {"type": "PRODUCT", "value": "policy", "start": 5, "end": 11},
    {"type": "INTENT", "value": "renewal_query", "confidence": 0.96}
  ],
  "words": [
    {"word": "mera", "start_time": 0.0, "end_time": 0.3, "confidence": 0.99},
    {"word": "policy", "start_time": 0.3, "end_time": 0.7, "confidence": 0.98}
  ]
}


Response

{
  "transcript": "mera policy renewal amount kitna hai",
  "language": {
    "detected": "hi-IN",
    "confidence": 0.97,
    "secondary": "en-IN"
  },
  "confidence": 0.984,
  "latency_ms": 156,
  "entities": [
    {
      "type": "PRODUCT",
      "value": "policy",
      "start": 5,
      "end": 11
    },
    {
      "type": "QUERY_TYPE",
      "value": "renewal_amount",
      "confidence": 0.96
    }
  ],
  "diarization": {
    "speakers": 1,
    "segments": [
      {
        "speaker": "SPEAKER_0",
        "start": 0,
        "end": 2.1,
        "text": "mera policy renewal amount kitna hai"
      }
    ]
  }
}
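
To show how a client might consume a response shaped like the samples above, here is a short Python sketch. The field names are taken from those samples; the helper name `summarize_response` and the 0.9 confidence threshold are assumptions made for illustration, and the actual schema should be checked against the API documentation. Because the samples show `language` both as a plain string and as an object, the sketch accepts either form.

```python
import json

# Trimmed copy of the sample response shown above.
SAMPLE_RESPONSE = """
{
  "transcript": "mera policy renewal amount kitna hai",
  "language": {"detected": "hi-IN", "confidence": 0.97, "secondary": "en-IN"},
  "confidence": 0.984,
  "entities": [
    {"type": "PRODUCT", "value": "policy", "start": 5, "end": 11},
    {"type": "QUERY_TYPE", "value": "renewal_amount", "confidence": 0.96}
  ],
  "diarization": {
    "speakers": 1,
    "segments": [
      {"speaker": "SPEAKER_0", "start": 0, "end": 2.1,
       "text": "mera policy renewal amount kitna hai"}
    ]
  }
}
"""


def summarize_response(payload: dict, min_confidence: float = 0.9) -> dict:
    """Pull out the fields a downstream application is likely to need."""
    language = payload["language"]
    if isinstance(language, dict):  # object form: {"detected": ..., ...}
        language = language["detected"]

    entities = [
        (e["type"], e["value"])
        for e in payload.get("entities", [])
        # Assumption: entities without their own score are kept.
        if e.get("confidence", 1.0) >= min_confidence
    ]
    turns = [
        (s["speaker"], s["text"])
        for s in payload.get("diarization", {}).get("segments", [])
    ]
    return {
        "transcript": payload["transcript"],
        "language": language,
        "needs_review": payload["confidence"] < min_confidence,
        "entities": entities,
        "turns": turns,
    }


summary = summarize_response(json.loads(SAMPLE_RESPONSE))
print(summary["language"], summary["entities"])
```

In the sample, the `start` and `end` values of the `PRODUCT` entity line up with character offsets into the transcript (`transcript[5:11]` is `policy`), though whether that holds for every entity type is not stated here.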


FAQs

Everything you need to know about Gnani's Voice AI platform, models, and deployment.

What is WER (Word Error Rate)?

Word Error Rate (WER) is the standard metric for measuring speech recognition accuracy. It is the number of word substitutions, deletions, and insertions needed to turn the system's output into the reference transcript, divided by the number of words in the reference. Lower WER means higher accuracy. Gnani Prisma v2.5 STT achieves under 4% WER on clean Indian audio, which represents state-of-the-art performance for multilingual Indian speech.
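
As a worked illustration of the metric, here is a minimal Python sketch that computes WER using word-level edit distance. The function name `word_error_rate` and the plain whitespace tokenization are assumptions made for this example; it is not Gnani's evaluation code, and real benchmarks usually normalize case, punctuation, and numerals before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed so far
    # (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


reference = "Mere account mein kitna balance hai?"
hypothesis = "Mere a-count main kitna balance hey?"
print(f"WER: {word_error_rate(reference, hypothesis):.2f}")  # WER: 0.50
```

Applied to the code-switching comparison on this page, the generic output gets three of six words wrong, so its WER on that single utterance is 50%.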

Does Gnani Prisma v2.5 support telephony-grade audio?

Yes, Gnani Prisma v2.5 is specifically optimized for real-world telephony audio. Unlike most ASR systems, which are trained on clean studio recordings, Gnani Prisma v2.5 is trained on enterprise call center audio sampled at 8 kHz, so it handles the compression artifacts, background noise, and network degradation that are common in telephony environments.
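
Before sending a recording, a client may want to confirm that it matches the telephony format described here. The sketch below uses Python's standard `wave` module to check for 8 kHz, 16-bit, mono PCM. The helper names and the decision to require mono audio are assumptions for this example, not documented API requirements.

```python
import io
import wave


def is_telephony_wav(source, expected_rate: int = 8000) -> bool:
    """Return True if the WAV is mono, 16-bit PCM at the expected sample rate."""
    with wave.open(source, "rb") as wav:
        return (
            wav.getframerate() == expected_rate
            and wav.getsampwidth() == 2   # 2 bytes per sample = 16-bit
            and wav.getnchannels() == 1
        )


def make_silent_wav(rate: int, seconds: float = 0.1) -> io.BytesIO:
    """Build an in-memory silent WAV so the example runs without audio files."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))
    buffer.seek(0)
    return buffer


print(is_telephony_wav(make_silent_wav(8000)))   # True
print(is_telephony_wav(make_silent_wav(16000)))  # False
```

`is_telephony_wav` also accepts a file path, so it could be called as `is_telephony_wav("recording.wav")` before an upload.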

How does code-switching work?

Code-switching refers to speakers mixing multiple languages within a single sentence, such as Hinglish (Hindi + English) or Tanglish (Tamil + English). Gnani Prisma v2.5's models are trained on millions of real Indian conversations with natural code-switching patterns, enabling accurate transcription without requiring speakers to stick to a single language.

Is on-premise deployment available?

Yes, Gnani Prisma v2.5 supports cloud, hybrid, and fully on-premise deployments. On-premise deployment ensures complete data sovereignty with air-gapped options for organizations with strict compliance requirements. Our enterprise team provides dedicated support for on-premise installations, including custom hardware configurations.

Which Indian languages are supported?

Gnani Prisma v2.5 supports 40+ Indian languages and dialects including Hindi, Tamil, Telugu, Bengali, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia, Assamese, and many regional variants. We also support code-switched combinations like Hinglish, Tanglish, Benglish, and more.

Can audio be streamed in real-time?

Yes, Gnani Prisma v2.5 provides real-time streaming transcription over WebSocket connections. Audio can be streamed as it is captured, and transcription results are returned with sub-500 ms latency. This enables real-time use cases such as live agent assistance, compliance monitoring, and instant transcription.
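
For real-time streaming, audio is usually sent in small fixed-duration frames rather than arbitrary byte counts. The sketch below computes the frame size for 8 kHz LINEAR16 mono audio and yields chunks at roughly capture pace. The 100 ms frame duration, the function names, and the pacing with `time.sleep` are assumptions for this example; the service's recommended chunk size is not stated on this page.

```python
import time
from typing import Iterator


def frame_bytes(sample_rate: int = 8000, frame_ms: int = 100,
                bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Bytes of raw PCM in one frame of the given duration."""
    return sample_rate * frame_ms // 1000 * bytes_per_sample * channels


def paced_chunks(pcm: bytes, frame_ms: int = 100,
                 realtime: bool = True) -> Iterator[bytes]:
    """Yield PCM in frame-sized chunks, optionally sleeping to mimic live capture."""
    size = frame_bytes(frame_ms=frame_ms)
    for offset in range(0, len(pcm), size):
        yield pcm[offset:offset + size]
        if realtime:
            time.sleep(frame_ms / 1000)


one_second = b"\x00\x00" * 8000  # 1 s of silence at 8 kHz, 16-bit mono
chunks = list(paced_chunks(one_second, realtime=False))
print(len(chunks), len(chunks[0]))  # 10 1600
```

With the websocket-client package, each chunk could then be passed to `ws.send_binary(chunk)` on an open connection.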

Does Gnani Prisma v2.5 support named entity recognition?

Yes, Gnani Prisma v2.5 includes built-in named entity recognition (NER) optimized for Indian domains. It can automatically detect and extract entities like PAN numbers, Aadhaar IDs, policy numbers, account numbers, amounts, dates, and domain-specific terms relevant to BFSI, insurance, and telecom sectors.

How does latency compare to cascaded ASR systems?

Gnani Prisma v2.5's end-to-end architecture delivers sub-500ms latency, significantly faster than traditional cascaded systems that chain multiple models together. Cascaded systems typically add 1-3 seconds of latency due to sequential processing. Gnani Prisma v2.5's single-pass inference enables real-time applications that were previously impossible.