Mistral has introduced Voxtral, a new family of state-of-the-art speech understanding models. The lineup includes a large 24-billion-parameter model for production-scale applications and a lightweight 3-billion-parameter model designed for local and edge deployments.
Both Voxtral models are released under the Apache 2.0 license, can be downloaded from Hugging Face for independent use, and are available through Mistral's API. Mistral also offers a highly optimized transcription-only endpoint for use cases where cost-efficiency matters most.
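As a rough illustration of what a call to the transcription-only endpoint might look like, the sketch below builds (but does not send) a multipart upload request using only the Python standard library. The endpoint URL, the model identifier, the form field names, and the bearer-token header are assumptions not stated in this article; check them against Mistral's API documentation before relying on them. The helper name `build_transcription_request` is hypothetical.

```python
import uuid
import urllib.request

# Assumptions (not taken from this article; verify against Mistral's API docs):
API_URL = "https://api.mistral.ai/v1/audio/transcriptions"  # assumed endpoint path
MODEL = "voxtral-mini-latest"  # assumed model identifier


def build_transcription_request(audio_bytes, filename, api_key, model=MODEL):
    """Build (without sending) a multipart/form-data POST for a transcription call."""
    boundary = uuid.uuid4().hex
    parts = [
        # Plain form field carrying the model name (field name is an assumption).
        (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="model"\r\n\r\n'
            f"{model}\r\n"
        ).encode(),
        # File field carrying the raw audio bytes (field name is an assumption).
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode() + audio_bytes + b"\r\n",
        # Closing boundary that terminates the multipart body.
        f"--{boundary}--\r\n".encode(),
    ]
    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To actually send the request, the returned object can be passed to `urllib.request.urlopen`. The shape of the response is deliberately not assumed here.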
Beyond standard transcription, Voxtral supports long-form context, built-in question answering, native summarization, multilingual processing, and function calling directly from voice input. These capabilities let spoken audio drive real-world interactions and downstream actions, including generating summaries, answering questions, performing analysis, and extracting insights.
For budget-conscious organizations, the Voxtral Mini Transcribe model outperforms OpenAI Whisper at less than half the cost, while Voxtral Small performs comparably to ElevenLabs Scribe, also at less than half the price. Voxtral can also be tried in the voice mode of Mistral’s Le Chat on both web and mobile.