Amazon Web Services (AWS) recently announced that its transcription platform, Amazon Transcribe, is now powered by generative AI. This means that Amazon Transcribe can now produce more accurate and natural-sounding transcripts, even in noisy environments or when speakers have accents.
In a blog post, Amazon wrote: “Transcribe’s speech foundation model is trained using best-in-class, self-supervised algorithms to learn the inherent universal patterns of human speech across languages and accents.
“It is trained on millions of hours of unlabeled audio data from over 100 languages. The training recipes are optimized through smart data sampling to balance the training data between languages, ensuring that traditionally under-represented languages also reach high accuracy levels,” the post adds.
In addition to being more accurate, the new generative AI model also makes Amazon Transcribe faster. This is because the model can produce transcripts in real time, without the need for a human reviewer.
The Amazon blog states: “By leveraging speech foundation model, Amazon Transcribe delivers significant accuracy improvement between 20% and 50% across most languages. On telephony speech, which is a challenging and data-scarce domain, accuracy improvement is between 30% and 70%. In addition to substantial accuracy improvement, this large ASR model also delivers improvements in readability with more accurate punctuation and capitalization.”
Of course, AWS is not the only company offering AI-powered transcription services. Otter has been providing AI transcriptions to consumers and enterprises for a while and released a summarization tool in June. Meta, though its offering is not a direct equivalent, announced that it is working on a generative AI-powered translation model that recognizes nearly 100 spoken languages.
The use of generative AI in Amazon Transcribe is a significant step forward for the field of transcription. It has the potential to make transcription more accurate, affordable, and accessible to a wider range of people.
Amazon Transcribe was first introduced in November 2017 and has continually evolved since then. In April 2018, it gained the ability to support custom vocabularies, allowing users to tailor the service to their specific needs. Over the years, the service has expanded its language support to dozens of languages, from accented English (added in November 2018) to more recent additions (in 2019) such as Tamil, Gulf Arabic, and Swiss German.
In September 2021, Amazon Transcribe added the ability to generate subtitles for video files, making it a more versatile tool for multimedia content. In May 2022, Amazon unveiled batch language identification, a feature that enables the service to identify multiple languages within a single audio file, enhancing its ability to handle multilingual content.
These continuous advancements demonstrate Amazon's commitment to developing a comprehensive and user-friendly transcription service. With its expanding language support, diverse feature set, and ability to handle various multimedia formats, Amazon Transcribe has established itself as a valuable tool for businesses and individuals alike.