| Voxtral Mini Transcribe V2 | Mistral AI | API-Optimized | Proprietary | ~3B | ~4.0% | $0.003 | Diarization, word timestamps, context biasing, 13 languages, 3hr audio |
| Voxtral Realtime | Mistral AI | Open Weights | Apache 2.0 | 4B | ~4.0% | $0.006 | Streaming, sub-200ms latency, 13 languages, edge-deployable |
| Voxtral Small (24B) | Mistral AI | Open Weights | Apache 2.0 | 24B | ~4.9% | $0.003 | Audio understanding, Q&A, summarization, function calling, 32k context |
| Voxtral Mini Transcribe | Mistral AI | API-Optimized | Proprietary | ~3B | ~5.3% | $0.001 | Cheapest option, transcription-optimized |
| Voxtral Mini (3B) | Mistral AI | Open Weights | Apache 2.0 | 3B | ~6.9% | $0.001 | Audio understanding, Q&A, summarization, edge-friendly, 32k context |
| Whisper large-v3 | OpenAI | Open Weights | MIT | 1.5B | ~8.3% | Self-hosted | Word timestamps, 99 languages, mature ecosystem, whisper.cpp, faster-whisper |
| Whisper large-v3-turbo | OpenAI | Open Weights | MIT | 809M | ~8.5% | Self-hosted | 2x faster than v3, word timestamps, 99 languages, great for fine-tuning |
| GPT-4o mini Transcribe | OpenAI | Proprietary | Proprietary | N/A | ~5.7% | $0.003 | OpenAI API, easy integration |
| Gemini 2.5 Flash | Google | Proprietary | Proprietary | N/A | ~7.0% | ~$0.003 | Multimodal, long context, audio understanding |
| ElevenLabs Scribe v2 | ElevenLabs | Proprietary | Proprietary | N/A | ~4.9% | $0.010 | Diarization, word timestamps, 99 languages |
| Deepgram Nova | Deepgram | Proprietary | Proprietary | N/A | N/A | ~$0.008 | Diarization, streaming, custom vocabulary |
| AssemblyAI Universal | AssemblyAI | Proprietary | Proprietary | N/A | N/A | ~$0.002 | Diarization, sentiment, topic detection |
| Kyutai STT (1B / 2.6B) | Kyutai | Open Weights | CC-BY 4.0 | 1B / 2.6B | N/A | Self-hosted | Streaming, word timestamps, voice prompting, Rust server |