Voice typing accuracy explained

Modern voice typing tools advertise "high accuracy," but the actual quality you see depends on five factors you can largely control: microphone, environment, accent fit, custom vocabulary, and choice of cleanup model.

How accuracy is measured

Word Error Rate (WER)

The standard metric for speech recognition is Word Error Rate (WER): the percentage of words wrong in a transcript compared to a perfect reference. Lower is better.

Modern English speech recognition models report WER in the 4–8% range on clean conversational audio. For comparison, human transcribers typically achieve 4–6% WER on the same audio.
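Concretely, WER is the word-level edit distance (substitutions + deletions + insertions) between the transcript and the reference, divided by the number of reference words. A minimal sketch in Python (a from-scratch illustration; production evaluation usually uses a tested library):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("the" -> "a") out of six reference words: WER = 1/6 ≈ 16.7%.
print(wer("the cat sat on the mat", "the cat sat on a mat"))
```

Note that insertions can push WER above 100%, which is why it is an error rate rather than an accuracy percentage.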

WER varies dramatically by:

  • Language: English, Spanish, and Mandarin tend to have the lowest WER; less-resourced languages can run 15–25%.
  • Accent: Accents well represented in the training data perform best; regional or non-native accents may add 2–10 points to WER.
  • Audio quality: WER can differ by 5–15 points between clean studio audio, a noisy open office, and phone-quality audio.
  • Vocabulary: General conversation has the lowest WER; technical, medical, or legal jargon adds 3–8 points without a custom dictionary.

Microphone matters more than you think

The biggest single accuracy lever

Built-in laptop microphones pick up keyboard noise, fan noise, and reverb. A USB headset or lapel mic typically reduces WER by 2–5 points just from cleaner input.

For most users, the highest-impact accuracy upgrade isn't a different software tool — it's a $40 USB microphone or AirPods Pro with the integrated mic.

Custom dictionaries close most gaps

For words the model consistently mishears

If a speech model consistently transcribes "Cypher" as "cipher," "kubectl" as "cube cuddle," or your colleague's name "Aanya" as "on you" — adding those terms to a custom dictionary fixes the problem permanently.

Tools like FluidVox auto-learn from your corrections: when you fix a transcription, the system remembers and applies the correction across future sessions. That's how accuracy compounds over weeks of use.
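The mechanism can be sketched as a phrase-substitution pass over the raw transcript. The correction map and function below are illustrative assumptions, not FluidVox's actual storage format or API:

```python
import re

# Hypothetical correction map, built up from user fixes over time.
CORRECTIONS = {
    "cube cuddle": "kubectl",
    "cipher": "Cypher",
    "on you": "Aanya",
}

def apply_corrections(text: str, corrections: dict[str, str]) -> str:
    """Replace known mis-hearings, longest phrase first, case-insensitively."""
    for wrong in sorted(corrections, key=len, reverse=True):
        pattern = r"\b" + re.escape(wrong) + r"\b"
        text = re.sub(pattern, corrections[wrong], text, flags=re.IGNORECASE)
    return text

print(apply_corrections("run cube cuddle get pods", CORRECTIONS))
# → "run kubectl get pods"
```

A blind substitution like "cipher" → "Cypher" would also rewrite legitimate uses of the word, which is why real tools weigh context before applying a learned correction.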

On-device vs cloud accuracy

A modest gap, narrowing fast

Cloud speech recognition models (Deepgram, Azure Speech) are still slightly more accurate than on-device alternatives in 2026, but the gap has narrowed significantly. Whisper large v2 running on Apple Silicon achieves WER within 1–3 points of cloud APIs on most languages.

For most users, on-device is now the right default — the privacy gains are real and the accuracy delta is small.

AI cleanup helps perceived accuracy

Even when raw WER is the same

Two tools with identical raw WER can produce very different output quality if one applies AI cleanup and the other doesn't. Cleanup fixes:

  • Filler words ("uh," "um," "like")
  • Restarted sentences
  • Missing punctuation
  • Casing on names and acronyms
  • Common transcription mistakes for specific phrases

This is the practical difference between Apple's built-in Dictation (no AI cleanup) and a tool like FluidVox (full LLM cleanup). The raw WER may be similar; the output quality is markedly different.
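To make the distinction concrete, here is a deliberately naive rule-based version of just the first cleanup step, filler-word removal. This is a sketch, not how FluidVox works; an LLM is needed for the context-dependent cases (restarts, punctuation, whether "like" is filler or meaning):

```python
FILLERS = {"uh", "um"}  # "like" is omitted: deciding if it is filler needs context

def strip_fillers(text: str) -> str:
    """Drop standalone filler words, ignoring case and trailing punctuation."""
    words = [w for w in text.split() if w.lower().strip(",.") not in FILLERS]
    return " ".join(words)

print(strip_fillers("um so I think, uh, we should ship it"))
# → "so I think, we should ship it"
```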

How to improve your accuracy

Practical levers you control

  1. Use a real microphone. A $40 USB headset gets you most of the way.
  2. Reduce ambient noise. Turn off the fan, close the window.
  3. Speak at conversational pace. Not too fast, not robotically slow.
  4. Build your custom dictionary. Add the 50 terms you use most that get mistranscribed.
  5. Pick the right tool. See our 2026 Mac comparison or Windows comparison.

Try FluidVox free for 14 days

Full access, no credit card required. Then $2.99/month or $39 one-time.

Start free trial