The Tongyi Lab has open-sourced its first audio generation model, ThinkSound.

AI Daily News updated 4m ago dongdong

Tongyi Lab has open-sourced ThinkSound, its first audio generation model, designed to overcome the limitation of “silent visuals.” By applying Chain-of-Thought (CoT) reasoning, the model learns to reason in a structured way about the relationship between visuals and sound, producing high-fidelity, tightly synchronized spatial audio. It was trained on 2,531.8 hours of high-quality multimodal data, including object-level and instruction-level samples, and it supports interactive editing.

© Copyright Notice

The copyright of the article belongs to the author. Please do not reprint without permission.
