A Smarter Way to Measure Speech Recognition in India
Sarvam AI is rethinking how speech recognition is evaluated in India. With the launch of two open-source frameworks for Automatic Speech Recognition (ASR), the company is addressing a long-standing gap in the way AI understands multilingual and code-mixed speech across 22 Indian languages.
For years, the industry has relied on metrics like Word Error Rate (WER) and Character Error Rate (CER). While effective for English, these benchmarks often fall short in India’s complex linguistic landscape, where languages blend seamlessly, dialects vary widely, and English words are frequently woven into regional scripts.
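To see why WER struggles here, consider how it is computed: the word-level edit distance between a reference and a hypothesis, divided by the reference length. The sketch below is a minimal illustrative implementation; the code-mixed example sentence is hypothetical, chosen to show how a mere script difference (Roman "meeting" vs. Devanagari "मीटिंग") registers as a full error even though the meaning is identical.

```python
# Word Error Rate (WER): word-level edit distance between a reference and a
# hypothesis transcription, normalized by the reference length.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words
    # (substitutions, insertions, deletions).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Same sentence, same meaning; only the script of one word differs.
print(wer("मुझे कल meeting है", "मुझे कल मीटिंग है"))  # 0.25: one "error"
```

Under a conventional WER, this transcription is 25% "wrong" despite preserving the speaker's intent perfectly, which is precisely the mismatch the new frameworks target.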
Why Traditional Metrics Fall Short in Indic Languages
In real-world conversations across India, people rarely stick to one language or one script. A single sentence can shift between Hindi and English, or Tamil and English, without pause. Traditional metrics treat every deviation as an error, even when the meaning is perfectly clear.
This creates a mismatch between how speech recognition systems are evaluated and how people actually communicate. Minor spelling variations, colloquial expressions, or script differences can significantly impact scores, even though the intent remains unchanged.
Sarvam AI’s new approach challenges this rigidity by focusing on what truly matters: whether the system understands the speaker’s meaning.
From Exact Matching to Meaningful Understanding
At the core of these frameworks is a shift from surface-level accuracy to semantic evaluation. Instead of penalizing harmless variations, the new metrics assess whether the transcription preserves intent and key information.
Metrics like LLM-WER and LLM-CER evaluate meaning rather than exact word or character matches. Intent Score checks whether the purpose of a sentence is retained, while Entity Preservation Score ensures that critical details, such as names, phone numbers, or locations, remain accurate.
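The intuition behind an entity-preservation check can be sketched as follows. This is an illustrative assumption of how such a score might work, not Sarvam AI's published implementation: given a list of critical entities from the reference, it measures what fraction survive in the transcription after light normalization.

```python
import re

def normalize(text: str) -> str:
    # Lowercase and strip punctuation so harmless surface variation
    # (casing, stray hyphens) does not count against the score.
    return re.sub(r"[^\w\s]", "", text.lower())

def entity_preservation_score(entities: list[str], transcript: str) -> float:
    """Fraction of reference entities found in the transcript (illustrative)."""
    norm = normalize(transcript)
    kept = sum(1 for e in entities if normalize(e) in norm)
    return kept / max(len(entities), 1)

# Hypothetical example: a Hinglish transcript that keeps every critical detail.
entities = ["Priya", "98765 43210", "Bengaluru"]
hyp = "priya ka number 98765 43210 hai, woh Bengaluru mein rehti hai"
print(entity_preservation_score(entities, hyp))  # 1.0: all entities preserved
```

A transcription could score poorly on exact word matching yet still earn a perfect score here, capturing the idea that names, numbers, and locations matter more than surface form.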
This layered evaluation creates a more realistic and practical way to measure ASR performance, especially in environments where language is fluid and dynamic.
Built for India’s Linguistic Diversity
The frameworks are designed to work across 22 Indian languages, covering both Indo-Aryan and Dravidian language families. They account for everyday speech patterns like Hinglish and Tanglish, where speakers mix languages mid-sentence, as well as variations in spelling and pronunciation.
This makes the frameworks particularly relevant for real-world applications, where users expect systems to understand them regardless of how they speak. Sarvam AI’s Saaras V3 ASR platform supports these use cases, offering capabilities such as transcription, translation, and transliteration across multiple languages.