Xiaomi open-sources its first native end-to-end large speech model, Xiaomi-MiMo-Audio
Xiaomi has open-sourced Xiaomi-MiMo-Audio, its first native end-to-end large speech model. Built on a new pre-training architecture and trained on more than 100 million hours of data, the model achieves few-shot generalization through in-context learning (ICL) for the first time in the speech domain, and it also demonstrates cross-modal alignment capabilities. Across multiple benchmark evaluations, Xiaomi-MiMo-Audio outperforms open-source models of comparable parameter size as well as closed-source models from Google and OpenAI.