New Delhi: A Bengaluru-based startup is quietly setting a new benchmark for success in a global artificial intelligence race dominated by US giants. On a series of India-focused benchmarks that measure document and speech capabilities rather than general conversational ability, Sarvam AI, founded in 2023, has outperformed Google's Gemini and OpenAI's ChatGPT.
The findings underscore a view gaining ground in the AI community: bigger models are not always better. By building systems trained specifically for Indian languages, documents and usage patterns, Sarvam AI has shown that locally trained models can outperform global platforms in the areas that matter most to Indian governments and companies.
A different approach to artificial intelligence
Sarvam AI is not chasing the general-purpose chatbot that international companies such as Google and OpenAI are pursuing. It is building what it calls 'sovereign AI': systems designed from the ground up around Indian languages, scripts, accents, and infrastructure realities.
Most international models are trained largely on clean, English-heavy online data. Documents in India, by contrast, are often scanned, handwritten, multilingual and poorly formatted, and Hindi is frequently mixed with English and regional languages. Sarvam AI treats this complexity as its starting point rather than an edge case.
Sarvam Vision and document intelligence
Sarvam AI's document understanding system, Sarvam Vision, is aimed at banking, logistics and public administration, where paper records remain common. On olmOCR-Bench, Sarvam Vision scored 84.3 per cent accuracy, ahead of Gemini 3 Pro and GPT-4o.
It also crossed 93 per cent accuracy on OmniDocBench v1.5, which tests a model's recognition of layouts, tables and structured forms. These benchmarks are designed to reflect real-world paperwork rather than ideal digital records, which makes them particularly relevant to Indian workflows.
Bulbul V3 and Indian speech systems
The company's other product, Bulbul V3, is a text-to-speech system built for Indian languages and telephony. Global speech systems often struggle with Indian accents, local names and code-mixed sentences. Bulbul V3 is optimised for these cases, supporting 11 Indian languages and low-bandwidth connections.
Sarvam AI says internal blind listening tests show that Bulbul V3 makes fewer pronunciation and comprehension errors in Indian speech settings, especially in call centre and IVR applications.
Smaller models, lower costs
Sarvam AI builds its core language models with roughly 2 to 3 billion parameters, far smaller than the leading global systems. The smaller size lowers costs, improves response times and allows deployment on local infrastructure.
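The arithmetic behind that claim is straightforward. The sketch below is a generic back-of-envelope estimate; the parameter counts, the fp16 assumption and the 70B comparison point are illustrative, not Sarvam AI's published serving setup.

```python
# Back-of-envelope sketch: why a 2-3 billion parameter model is cheap to host.
# All figures are generic assumptions, not Sarvam AI's published numbers.
def weight_memory_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed just to hold the model weights (fp16 by default)."""
    return params_billion * 1e9 * bytes_per_param / 1e9

for size in (2, 3, 70):  # small Sarvam-scale models vs. a typical large open-weight model
    print(f"{size}B params @ fp16 ≈ {weight_memory_gb(size):.0f} GB of accelerator memory")

# A 2-3B model fits comfortably on a single commodity GPU, while 70B-class
# models need multiple high-end accelerators, which is what drives the
# difference in serving cost and latency.
```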
Custom-built tokenisers also reduce the cost of processing Indian scripts, which can be considerably more expensive to handle on global platforms designed primarily around English.
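To illustrate why tokeniser design matters, the sketch below counts tokens for the same sentence in English and Hindi using OpenAI's open-source tiktoken library and its cl100k_base encoding. Sarvam's own tokeniser is not public, and the example sentence is the editor's, so the figures only illustrate the general effect.

```python
# Sketch: comparing token counts for English vs. Devanagari text under an
# English-centric BPE tokeniser (OpenAI's cl100k_base via tiktoken).
# Since inference cost scales with token count, scripts that fragment into
# many tokens are more expensive to process.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

english = "Please share your account number to continue."
hindi = "कृपया जारी रखने के लिए अपना खाता नंबर साझा करें।"  # the same sentence in Hindi

for label, text in [("English", english), ("Hindi", hindi)]:
    tokens = enc.encode(text)
    print(f"{label}: {len(text)} characters -> {len(tokens)} tokens")

# The Hindi sentence typically breaks into several times more tokens than its
# English equivalent, despite conveying the same content; a tokeniser trained
# around Indian scripts narrows that gap.
```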
Sarvam AI has raised $41 million in Series A funding from Lightspeed Venture Partners, Peak XV Partners and Khosla Ventures. It has also been selected under the IndiaAI Mission to help build an indigenous AI framework for India, which gives it access to large-scale GPU infrastructure.
Sarvam AI's emergence signals a shift in how AI leadership is defined. Rather than competing on scale alone, the company is winning on local data, language depth and a close fit with real use cases. For Indian government services, enterprises and consumers, this focused strategy may prove more effective than relying on imported AI systems alone.