Meta has unveiled a new speech-to-text and text-to-speech AI model that can identify more than 4,000 spoken languages and transcribe speech in more than 1,100 languages, in an effort to preserve languages at risk of disappearing.
The move reflects Meta's view that equipping machines with the ability to recognise and produce speech can make information accessible to far more people, particularly those who rely on voice interfaces. However, building high-quality machine learning models for these tasks requires large amounts of labelled data, which exists for only a small number of languages. Existing speech recognition models cover about 100 languages — a tiny fraction of the world's 7,000 known languages.
Called Massively Multilingual Speech (MMS), the project combines wav2vec 2.0, Meta's self-supervised learning approach, with a new dataset that provides labelled data for over 1,100 languages and unlabelled data for nearly 4,000 languages. The project even covers languages that had no prior speech technology and are spoken by only a few hundred people.
Meta is making its models available to the public through the code hosting service GitHub. It says that open-sourcing them will help developers working across languages build new speech applications, such as messaging services that understand everyone or virtual-reality systems usable in any language.
“The Massively Multilingual Speech models outperform existing models and cover ten times the number of languages. Today, we are publicly sharing our models and code so that others in the research community can build upon our work. Through this work, we hope to make a small contribution to preserve the incredible language diversity of the world,” Meta said in a statement.
In the process of creating the model, Meta even turned to religious texts such as the Bible, which have been translated into numerous languages and whose translations have been extensively studied for text-based language translation research. These translations include publicly accessible audio recordings of people reading the texts in various languages.
According to Meta, the Massively Multilingual Speech project is an important step in preserving endangered languages with speech recognition and generation technology. Looking forward, Meta plans to expand language coverage to support even more languages, as well as to address the challenge of handling dialects, which is frequently difficult for existing speech technology. Its goal is to make it easier for people to access information and use devices in the language of their choice.
Separately, Meta recently unveiled new generative AI-powered ad tools and features on its AI Sandbox to help advertisers build ads more efficiently and improve campaign results.
The Sandbox acts as Meta's "testing playground" for early versions of new tools and features. Its text variation feature lets brands generate multiple versions of copy that highlight an advertiser's key points, giving them options to test different messages with different audiences.
Meanwhile, the background generation feature creates background images from text inputs, allowing advertisers to try various backgrounds more quickly and diversify their creative assets. The image cropping feature adjusts creative assets to fit different aspect ratios across multiple surfaces such as Stories or Reels, allowing advertisers to spend less time and resources on repurposing creative assets.