
Alibaba Cloud has announced two new artificial intelligence models designed for generating and simulating voices from written text, aimed at specialized audio applications and content creation.
The first model lets users create voices from detailed specifications covering emotion, pitch, speech rate, age, and style, giving fine-grained control over the final result. The model is said to outperform the OpenAI interface.
The second model specializes in voice cloning: it can replicate a voice from an audio clip of no more than three seconds and reproduce it in ten languages, with a lower error rate than competitors such as ElevenLabs and MiniMax, according to the company.
These tools are accessible through the Alibaba Cloud API, with trial versions available on the Hugging Face platform. The announcement comes amid increasing competition in the AI-powered audio technology market, whose uses span advertising, multilingual dubbing, gaming, online education, and call centers, where such tools significantly reduce time and costs compared with traditional methods.