Apple’s Self-Developed Multimodal AI Model Manzano: Combining Understanding and Generative Capabilities

AI Daily News updated 3d ago dongdong

Apple is developing Manzano, a multimodal AI model that combines image understanding and image generation in a single system, aiming to reduce the trade-off existing models face between the two kinds of visual task. Manzano uses a hybrid image tokenizer in which one shared encoder produces both continuous tokens and discrete tokens, which reduces conflict between the tasks. The architecture has three parts: the hybrid tokenizer, a unified language model, and a separate image decoder. The reported parameter scale ranges from 900 million to 3.52 billion, and the model supports multiple resolutions.
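
To make the hybrid tokenizer idea concrete, here is a minimal toy sketch in Python of one shared encoder feeding two output branches, one continuous and one discrete. It is not Apple's implementation: the function names, the tiny dimensions, the random weights, and the use of nearest-neighbor codebook lookup for the discrete branch are all assumptions chosen for illustration.

```python
import random


def shared_encoder(patches, weights):
    """Toy stand-in for a vision encoder: one linear map applied to each patch."""
    return [
        [sum(w * x for w, x in zip(row, patch)) for row in weights]
        for patch in patches
    ]


def continuous_tokens(features):
    """Continuous branch: keep the encoder features as float vectors."""
    return [list(f) for f in features]


def discrete_tokens(features, codebook):
    """Discrete branch (assumed method): index of the nearest codebook vector."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(f, codebook[k]))
        for f in features
    ]


# Hypothetical sizes, far smaller than anything a real model would use.
rng = random.Random(0)
DIM_IN, DIM_OUT, CODEBOOK_SIZE, NUM_PATCHES = 4, 3, 8, 5

weights = [[rng.uniform(-1, 1) for _ in range(DIM_IN)] for _ in range(DIM_OUT)]
codebook = [[rng.uniform(-1, 1) for _ in range(DIM_OUT)] for _ in range(CODEBOOK_SIZE)]
patches = [[rng.uniform(0, 1) for _ in range(DIM_IN)] for _ in range(NUM_PATCHES)]

# Both branches read the same features, so they share one representation.
features = shared_encoder(patches, weights)
cont = continuous_tokens(features)
disc = discrete_tokens(features, codebook)

print("continuous:", len(cont), "vectors of size", len(cont[0]))
print("discrete ids:", disc)
```

The point of the sketch is only that both token types are derived from the same encoder output, which is the property the article credits with reducing task conflict; how the real model produces its discrete tokens is not stated here.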
