Microsoft open-sources TTS model VibeVoice, capable of generating up to 90 minutes of speech

AI Daily News updated 2m ago dongdong


Microsoft has open-sourced the text-to-speech (TTS) model VibeVoice-1.5B, which can generate up to 90 minutes of natural speech with up to four speakers and supports cross-lingual synthesis and singing. The model is built on the 1.5B-parameter Qwen2.5 language model and integrates both an acoustic and a semantic tokenizer, which operate at a low frame rate of 7.5 Hz.
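To give a sense of scale, the two stated figures (7.5 Hz and 90 minutes) imply the frame count below. This is a back-of-the-envelope sketch. It assumes one tokenizer frame per 1/7.5 s of audio across the full 90 minutes. The helper `frame_count` is hypothetical and is not part of VibeVoice.

```python
# Back-of-the-envelope arithmetic only (hypothetical helper, not a VibeVoice API):
# how many frames a tokenizer running at a fixed frame rate emits for a given duration.
FRAME_RATE_HZ = 7.5   # frames per second, as stated for VibeVoice's tokenizers
MAX_MINUTES = 90      # maximum stated generation length


def frame_count(minutes: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Return the number of frames covering `minutes` of audio at `frame_rate_hz`."""
    seconds = minutes * 60
    return round(seconds * frame_rate_hz)


print(frame_count(MAX_MINUTES))  # 40500 frames for 90 minutes
```

At 7.5 Hz, the full 90 minutes corresponds to roughly 40,500 frames (5,400 s × 7.5 frames/s). A low frame rate keeps very long audio to a comparatively short sequence.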

© Copyright Notice

The copyright of the article belongs to the author. Please do not reprint without permission.

