Apple’s In-House Multimodal AI Model Manzano: Combining Image Understanding and Generation
Apple is developing a multimodal AI model called Manzano that integrates image understanding and image generation in a single system, aiming to avoid the trade-offs existing models face when handling both kinds of visual task. Manzano uses a hybrid image tokenizer that produces both continuous and discrete tokens from a shared encoder, reducing conflict between the two tasks. The architecture has three parts: the hybrid tokenizer, a unified language model, and a separate image decoder. The model is offered at parameter scales ranging from 900 million to 3.52 billion and supports multiple output resolutions.
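To make the hybrid-tokenizer idea concrete, here is a minimal PyTorch-style sketch. It is not Apple's implementation; the module names, dimensions, and the nearest-codebook quantization step are illustrative assumptions. The point it shows is the shared encoder feeding two paths: a continuous adapter for the understanding path and a discrete codebook lookup that yields token IDs for the generation path.

```python
import torch
import torch.nn as nn

class HybridTokenizer(nn.Module):
    """Illustrative hybrid image tokenizer: one shared encoder, two outputs.

    - continuous embeddings -> understanding path (fed to the language model)
    - discrete codebook IDs -> generation path (later decoded by an image decoder)
    """
    def __init__(self, patch_dim=768, dim=512, codebook_size=1024):
        super().__init__()
        # Stand-in for a shared vision backbone (e.g. a ViT); here just an MLP.
        self.encoder = nn.Sequential(
            nn.Linear(patch_dim, dim), nn.GELU(), nn.Linear(dim, dim)
        )
        # Continuous adapter: projects shared features for the understanding path.
        self.cont_adapter = nn.Linear(dim, dim)
        # Codebook for the discrete (generation) path.
        self.codebook = nn.Embedding(codebook_size, dim)

    def forward(self, patches):
        feats = self.encoder(patches)                 # (B, N, dim) shared features
        continuous = self.cont_adapter(feats)         # understanding tokens
        # Quantize: index of the nearest codebook vector for each patch feature.
        codebook = self.codebook.weight.unsqueeze(0).expand(feats.size(0), -1, -1)
        dists = torch.cdist(feats, codebook)          # (B, N, codebook_size)
        discrete_ids = dists.argmin(dim=-1)           # generation token IDs
        return continuous, discrete_ids

# Toy usage: 2 images, 196 patches each, 768-dim patch features.
tokenizer = HybridTokenizer()
patches = torch.randn(2, 196, 768)
cont, ids = tokenizer(patches)
print(cont.shape, ids.shape)  # torch.Size([2, 196, 512]) torch.Size([2, 196])
```

Because both token types come from the same encoder, the language model sees a single visual feature space for understanding and generation, which is the mechanism the article credits with reducing task conflict.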