The average WER of these providers across the English datasets was 8.61%. Supplying a vocabulary reduced it to 8.10%.
On the English datasets, streaming transcription had a higher WER (10.9%) than batch transcription (9.37%).
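For readers who want to reproduce this kind of comparison, the following is a minimal sketch of how a WER figure is commonly computed: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It assumes plain whitespace tokenization and no text normalization, which may differ from the scoring used for the numbers above. The function names `word_error_rate` and `relative_change` are illustrative, not taken from any evaluation toolkit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words.

    Assumption: tokens are split on whitespace with no normalization
    (casing, punctuation); real evaluations usually normalize first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_change(baseline: float, new: float) -> float:
    """Relative change of `new` with respect to `baseline`."""
    return (new - baseline) / baseline


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    # Relative changes implied by the reported averages.
    print(relative_change(8.61, 8.10))   # vocabulary vs. no vocabulary
    print(relative_change(9.37, 10.9))   # streaming vs. batch
```

Applied to the reported averages, the vocabulary corresponds to a relative WER reduction of roughly 5.9%, and streaming transcription shows a WER roughly 16% higher, in relative terms, than batch transcription.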
Speakers can enhance ASR accuracy by speaking more clearly and slowly.