OpenAI’s Whisper can Reach Human-Level Robustness in ASR

OpenAI's Whisper could enable speech recognition apps to reach new levels of accuracy

Speech recognition, or voice recognition, technology has come a long way since the concept first emerged, but one problem has persisted for users: accuracy. Over the past couple of years, researchers have been building AI algorithms that can process voice input more accurately, keeping a steady focus on speech research and development. Recently, OpenAI's Whisper has been making headlines as an avant-garde open-source ML model that can perform automatic speech recognition (ASR) on a wide selection of global languages. Built on a Transformer model trained on 680,000 hours of weakly supervised, multilingual audio data, OpenAI's Whisper approaches human-level robustness and accuracy in ASR without the need for fine-tuning or any intermediaries. The model is fully open-source, with several weight sizes available to the public.
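For readers who want to try this themselves, the Whisper GitHub repository documents a simple Python API. Below is a minimal sketch based on that documented usage; the file name audio.mp3 is a placeholder, and "base" is just one of the published weight sizes:

```python
# pip install -U openai-whisper   (ffmpeg must also be installed on the system)
import whisper

# Load one of the published weight sizes: "tiny", "base", "small",
# "medium", or "large". Larger checkpoints are more accurate but slower.
model = whisper.load_model("base")

# Transcribe an audio file; "audio.mp3" is a placeholder for your own file.
result = model.transcribe("audio.mp3")
print(result["text"])
```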

Over the years, countless big tech companies have been trying to reach an efficient level of accuracy in the ASR systems that sit at the core of speech recognition apps, and services from tech giants like Google, Amazon, and Meta have contributed greatly to the growth and development of the speech recognition domain. OpenAI mentions in the GitHub repository for Whisper that the model has shown successful results in over 10 languages and demonstrates additional capabilities in tasks like voice activity detection, speaker classification, and speaker diarization, which weren't actively addressed previously.
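As a small illustration of these multilingual capabilities, the repository also documents a lower-level API that detects the spoken language before decoding. The sketch below follows that documented flow; the file name is again a placeholder:

```python
import whisper

model = whisper.load_model("base")

# Load the audio and pad/trim it to the 30-second window the model expects.
audio = whisper.load_audio("audio.mp3")  # placeholder file name
audio = whisper.pad_or_trim(audio)

# Compute the log-Mel spectrogram and move it to the model's device.
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# Detect the spoken language from the spectrogram.
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# Decode the 30-second window into text.
result = whisper.decode(model, mel, whisper.DecodingOptions())
print(result.text)
```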

Is Whisper Limitless?

No, Whisper does have its limitations, particularly in the area of text prediction. Because the system was trained on a large amount of noisy data, its transcriptions sometimes contain words that were not actually spoken, largely because the model is simultaneously trying to predict the next word and transcribe the audio itself. Furthermore, this open-source ML model does not perform equally well across languages: it suffers from higher error rates for speakers of languages that are not well-represented in the training data.
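One practical way to cope with such hallucinations is to inspect the per-segment confidence scores that transcribe() returns. The cutoffs below mirror the library's own default no_speech_prob and avg_logprob thresholds, but using them as a post-hoc filter like this is an assumption of this sketch, not an official recommendation:

```python
import whisper

model = whisper.load_model("base")
result = model.transcribe("audio.mp3")  # placeholder file name

# Each segment carries confidence signals; flag ones that look hallucinated.
# The cutoffs (0.6 and -1.0) mirror transcribe()'s defaults, but applying
# them after the fact as a filter is this sketch's assumption.
for seg in result["segments"]:
    suspect = seg["no_speech_prob"] > 0.6 or seg["avg_logprob"] < -1.0
    flag = " [low confidence]" if suspect else ""
    print(f"[{seg['start']:.1f}s-{seg['end']:.1f}s]{flag} {seg['text'].strip()}")
```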

Bias has been one of the major obstacles to streamlining machine learning models. Studies conducted by some of the biggest tech companies in the world, like Google, IBM, and Amazon, have helped reduce such errors. Meanwhile, Whisper's transcription capabilities are already being used to improve existing accessibility tools.

Bottom Line

Whisper does not reflect the full potential of OpenAI, nor the full scope of its plans. The company's efforts have fueled the growing popularity of DALL-E 2 and GPT-3, and it is definitely pursuing several other AI research projects.
