
Enterprise artificial intelligence company Cohere has launched its first voice model, Transcribe, marking its entry into the fast-growing speech recognition market. Announced in March 2026, the model is designed specifically for automatic speech recognition tasks such as transcription, note-taking, and speech analysis.
Transcribe is an open-source model, allowing developers and organizations to access, modify, and deploy it according to their needs. Unlike many existing solutions that rely heavily on cloud-based infrastructure, Cohere’s model is optimized to run on consumer-grade GPUs, enabling companies to host the system on their own hardware. This approach is expected to appeal particularly to enterprises that prioritize data privacy and control over sensitive audio information.
The model supports 14 languages and is built to handle a wide range of real-world use cases, from meeting transcription and customer service analytics to legal documentation and healthcare records. With a relatively lightweight architecture of around 2 billion parameters, Transcribe aims to balance performance and efficiency, making advanced speech recognition capabilities more accessible to smaller organizations and developers.
Cohere claims that the model delivers strong accuracy compared to other open-source alternatives, achieving competitive results on industry benchmarks. Early reports suggest that Transcribe could lower the cost of deploying speech-to-text systems while improving flexibility, as users are not locked into proprietary cloud ecosystems.
The launch positions Cohere in direct competition with established players in the speech recognition space, including offerings from major technology companies such as Google and OpenAI. As demand for voice-based AI solutions continues to rise across industries, the introduction of an open-source alternative reflects a broader industry shift toward more customizable and cost-efficient AI deployments.
With Transcribe, Cohere is expanding beyond its core focus on text-based AI models into multimodal capabilities, signaling its ambition to become a more comprehensive enterprise AI provider. The move also highlights the growing importance of voice data in shaping the next phase of artificial intelligence innovation.
