Sarvam AI Introduces Edge Models for On-Device AI in Indian Languages
In a significant move to democratize artificial intelligence in India, domestic startup Sarvam AI has announced the launch of Sarvam Edge, a comprehensive suite of on-device AI models. This initiative positions the company in direct competition with global giants like Google and OpenAI, specifically targeting the Indian-language AI market. Unlike the cloud-based solutions offered by these international players, Sarvam Edge operates entirely on consumer devices, eliminating the need for an internet connection.
How Sarvam Edge Works and Its Core Features
Based in Bengaluru, Sarvam AI detailed in a recent blog post that Sarvam Edge consists of compact AI models designed to run directly on consumer hardware rather than relying on remote servers. The primary goal is to bring advanced AI capabilities to users across India, including those in regions with poor or unreliable internet connectivity. The company is collaborating with global device manufacturers to integrate these models into various hardware platforms.
The speech recognition model supports 10 Indian languages within a single framework, utilizing 74 million parameters and occupying approximately 294 MB of device storage. It features automatic language identification, removing the need for users to manually select the language being spoken. On a Qualcomm Snapdragon 8 Gen 3 chip, it processes speech at about 8.5 times real-time speed, with a time-to-first-token of less than 300 milliseconds. Benchmark tests on the Vistaar dataset, which includes 59 environments across domains like news and education, show that the Edge model outperforms Google Cloud STT in languages such as Hindi, Gujarati, Kannada, Punjabi, and Telugu.
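To put those latency figures in perspective, the sketch below works out what an 8.5x real-time factor and a 300 ms time-to-first-token imply for transcribing a typical voice note. This is illustrative arithmetic only; the function name is hypothetical and not part of any Sarvam API.

```python
# Illustrative arithmetic: what the article's latency figures imply
# for end-to-end transcription time. Helper name is hypothetical.

def transcription_time(audio_seconds: float, rtf_speedup: float, ttft_ms: float) -> float:
    """Rough wall-clock seconds to transcribe a clip: time-to-first-token
    plus audio duration divided by the real-time speedup."""
    return ttft_ms / 1000.0 + audio_seconds / rtf_speedup

# A 60-second voice note at 8.5x real time with a 300 ms first-token delay:
print(round(transcription_time(60, 8.5, 300), 2))  # ~7.36 seconds
```

In other words, a minute of speech would be fully transcribed in well under ten seconds on the cited hardware, which is what makes fully offline dictation practical.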
Advanced Speech Synthesis and Translation Capabilities
The speech synthesis model is equally impressive, with a device footprint of about 60 MB and 24 million parameters. It supports eight speakers and ten languages within a single model, maintaining consistent voice identity across different languages. On a Samsung Galaxy S25 Ultra, it produces its first audio output in 260 milliseconds and synthesizes speech at roughly 5.2 times real-time speed. The model achieves a mean character error rate of 0.0173 on standard benchmarks, indicating high accuracy in matching synthesized speech to intended text. Additionally, it supports custom voice cloning, allowing new voices to be added using approximately one hour of audio data and deployed within the same 60 MB model file.
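For readers unfamiliar with the metric, character error rate (CER) is the edit distance between the reference text and a transcription of the synthesized audio, divided by the reference length. The sketch below implements the standard definition so the 0.0173 figure can be interpreted; it is not Sarvam's evaluation code, and the sample strings are made up.

```python
# Standard character error rate (CER): Levenshtein distance between
# reference and hypothesis, divided by the reference length.

def char_error_rate(reference: str, hypothesis: str) -> float:
    m, n = len(reference), len(hypothesis)
    # prev[j] holds the edit distance between reference[:i-1] and hypothesis[:j]
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,        # deletion
                          curr[j - 1] + 1,    # insertion
                          prev[j - 1] + cost) # substitution
        prev = curr
    return prev[n] / m

# One wrong character in a 10-character reference gives CER 0.1:
print(char_error_rate("namaste ji", "namaste je"))  # 0.1
```

A mean CER of 0.0173 therefore corresponds to fewer than two character-level errors per hundred characters of reference text.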
For translation, the model has 150 million parameters and an on-device footprint of around 334 MB. It handles bidirectional translation across 110 language pairs, covering 10 Indian languages plus English, without requiring an intermediate language step. On a Snapdragon 8 Gen 3 processor, it produces a first token in roughly 200 milliseconds and streams at about 30 tokens per second. Performance on the FLORES benchmark shows it outperforms Meta's NLLB-600M model, which is four times larger, across all tested Indian languages.
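The streaming figures translate directly into perceived responsiveness: total latency is roughly time-to-first-token plus output length divided by the streaming rate. A minimal back-of-the-envelope sketch, using the article's numbers (the function and its defaults are illustrative assumptions, not a Sarvam API):

```python
# Back-of-the-envelope streaming latency for the translation model,
# using the article's figures (~200 ms TTFT, ~30 tokens/s).

def streaming_latency(num_tokens: int, ttft_ms: float = 200.0,
                      tokens_per_sec: float = 30.0) -> float:
    """Seconds until the final token of the translation arrives."""
    return ttft_ms / 1000.0 + num_tokens / tokens_per_sec

# A 45-token translated sentence finishes streaming in about 1.7 s:
print(round(streaming_latency(45), 2))  # 1.7
```

Because the first token appears in about a fifth of a second, the translation starts rendering almost immediately even though the full sentence takes longer to complete.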
Benefits of On-Device Processing and Market Impact
A key advantage of Sarvam Edge is that all processing occurs locally on the device, ensuring no user data is transmitted to external servers. This enhances data privacy and security. Moreover, there is no per-query cost, making AI tools more accessible for applications in education, small businesses, and assistive technologies, where cloud-based pricing might otherwise be prohibitive. Sarvam AI's pitch emphasizes AI that works anywhere, is cost-effective, and keeps user data secure, potentially transforming how AI is adopted in diverse Indian contexts.
