Sarvam AI Unveils Bulbul-v2: India-Focused Text-to-Speech Model Supporting 11 Languages

Bengaluru-based AI startup Sarvam AI has released Bulbul-v2, a text-to-speech (TTS) model built specifically for India. The model supports 11 Indian languages and is designed to deliver speech with authentic regional accents, which the company describes as sounding “just like India.”

In a recent post on LinkedIn, Sarvam AI emphasized that Bulbul-v2 generates lifelike, expressive audio, avoiding the flat or robotic tone common in many TTS systems. The model also offers high-speed processing and customizable voice options, and the company says it is particularly suited to brands and enterprises looking to localize content at scale.

According to Sarvam, Bulbul-v2 sets new standards for speech AI in India in naturalness, responsiveness, and affordability. As part of its broader mission to democratize access to AI in India, the startup is offering low-latency API access at what it calls India-friendly pricing, with the aim of expanding the technology’s reach across industries.

Notably, Sarvam AI is the first Indian startup selected by the central government to develop India’s sovereign large language model (LLM) under the national IndiaAI initiative, which aims to build indigenous capabilities in artificial intelligence.

What is Bulbul-v2?

Bulbul-v2 is Sarvam AI’s flagship TTS model, engineered to mirror India’s linguistic diversity and speech patterns. It supports real-time synthesis, multi-language inputs, and code-mixed text, making it adept at handling natural conversations across different Indian languages. The model also includes multiple voice personas, giving users creative flexibility.

Key capabilities include:

  • Realistic voice prosody (rhythm, tone, and intonation)
  • Voice customization (adjust pitch, speed, and volume)
  • Language-aware text processing, including smart handling of numbers, dates, and mixed-language sentences
  • Sample rate options ranging from 8 kHz to 24 kHz for adaptable audio quality
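
To make the adjustable parameters above concrete, here is a minimal Python sketch of how a developer might group and validate them before sending a synthesis request. It is illustrative only: the class name `VoiceSettings`, its field names, and the default values are hypothetical and are not taken from Sarvam AI's API. Only the 8 kHz to 24 kHz sample-rate range comes from the article.

```python
from dataclasses import dataclass

# Sample-rate bounds as reported in the article (8 kHz to 24 kHz).
MIN_SAMPLE_RATE_HZ = 8_000
MAX_SAMPLE_RATE_HZ = 24_000


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical settings container; not Sarvam AI's actual API."""

    pitch: float = 0.0           # assumed: 0.0 means "no pitch shift"
    speed: float = 1.0           # assumed: 1.0 means normal speaking rate
    volume: float = 1.0          # assumed: 1.0 means normal loudness
    sample_rate_hz: int = 24_000

    def __post_init__(self) -> None:
        # Reject sample rates outside the range the article mentions.
        if not MIN_SAMPLE_RATE_HZ <= self.sample_rate_hz <= MAX_SAMPLE_RATE_HZ:
            raise ValueError(
                f"sample_rate_hz must be between {MIN_SAMPLE_RATE_HZ} "
                f"and {MAX_SAMPLE_RATE_HZ}, got {self.sample_rate_hz}"
            )
        # A non-positive speed or a negative volume makes no sense for audio.
        if self.speed <= 0:
            raise ValueError("speed must be positive")
        if self.volume < 0:
            raise ValueError("volume must not be negative")


# Example: telephony-style audio at a slightly faster speaking rate.
phone_settings = VoiceSettings(speed=1.2, sample_rate_hz=8_000)
```

Validating settings on the client side like this would catch an out-of-range value before any network call is made. The real service may accept different names, ranges, and defaults.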

What Can Bulbul-v2 Do?

The model converts text into natural-sounding speech in real time using preset or custom configurations. Users have fine-grained control over the audio output and can tailor speech style to specific use cases such as customer service, storytelling, or content localization.

The integrated text preprocessing system intelligently normalizes text inputs to enhance clarity and pronunciation, especially for numerical or hybrid linguistic inputs.
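
As a rough illustration of what this kind of normalization involves, the Python sketch below expands small integers into English words before text is passed to a speech engine. It is not Sarvam AI's preprocessing pipeline, whose internals the company has not described here. The function names are hypothetical, and a language-aware system would choose number words to match the surrounding language rather than always using English.

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 999 in English."""
    if not 0 <= n <= 999:
        raise ValueError("only 0..999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def normalize_numbers(text: str) -> str:
    """Replace each run of digits with words; leave larger numbers as-is."""

    def expand(match: re.Match) -> str:
        value = int(match.group())
        # Numbers above 999 are left untouched rather than guessed at.
        return number_to_words(value) if value <= 999 else match.group()

    return re.sub(r"\d+", expand, text)


print(normalize_numbers("Your order of 42 items ships in 3 days."))
```

A production normalizer would also need to handle decimals, dates, currency, and phone numbers, each of which is read aloud differently.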

Bulbul-v2 follows Bulbul-v1, which launched in August 2024 with six voice presets. The new version adds more nuanced voice personalities and greater scalability.

Given its speed, affordability, and Indian linguistic orientation, Bulbul-v2 is being positioned as a competitive alternative to global TTS models, especially for developers, educators, and businesses aiming for localized engagement.
