Microsoft open-sources TTS model VibeVoice, capable of generating up to 90 minutes of speech

AI Daily News updated 6h ago dongdong

Microsoft has open-sourced VibeVoice-1.5B, a text-to-speech (TTS) model that can generate up to 90 minutes of natural speech with up to four speakers and supports cross-lingual and singing synthesis. The model is built on the 1.5B-parameter Qwen2.5 language model and integrates acoustic and semantic tokenizers that operate at a low frame rate of 7.5 Hz.
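To put the 7.5 Hz frame rate in perspective, the short sketch below works out roughly how many frames a 90-minute recording corresponds to. The helper name `frames_for_duration` and the 50 Hz comparison rate are illustrative assumptions, not part of VibeVoice or any figure from Microsoft; only the 7.5 Hz rate and the 90-minute length come from the announcement.

```python
def frames_for_duration(minutes: float, frame_rate_hz: float) -> int:
    """Return the number of frames needed to cover `minutes` of audio."""
    seconds = minutes * 60
    return round(seconds * frame_rate_hz)


# Figures from the announcement: 90 minutes of speech at 7.5 frames per second.
vibevoice_frames = frames_for_duration(90, 7.5)

# Hypothetical comparison rate (an assumption for illustration only).
comparison_frames = frames_for_duration(90, 50)

print(vibevoice_frames)   # 40500
print(comparison_frames)  # 270000
```

At 7.5 Hz, a full 90-minute session comes to about 40,500 frames, which suggests why a low frame rate helps keep such long sequences manageable for a language model.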
