WhisperNER: Unified Open Named Entity and Speech Recognition

In this paper, we present WhisperNER, a model that combines named entity recognition (NER) with automatic speech recognition (ASR) to improve transcription accuracy and information extraction. WhisperNER supports open-type NER, in which the entity types are not restricted to a fixed label set, allowing it to identify a wide range of entities in speech. Trained on a large synthetic dataset with diverse NER tags, WhisperNER outperforms natural baselines in both out-of-domain open-type NER and supervised fine-tuning. These results show the potential of integrating NER with ASR to enhance transcription capabilities.

https://arxiv.org/abs/2409.08107
