The Tongyi Lab has open-sourced its first audio generation model, ThinkSound.

AI Daily News updated 3w ago dongdong

Tongyi Lab has open-sourced its first audio generation model, ThinkSound, which is designed to overcome the limitations of “silent visuals.” By introducing Chain-of-Thought (CoT) reasoning, the model learns to reason in a structured way about the relationship between visuals and sound, producing high-fidelity, tightly synchronized spatial audio. It was trained on 2,531.8 hours of high-quality multimodal data, including object-level and instruction-level samples, and it supports interactive editing.
