Miras – a general-purpose framework for deep learning architecture design developed by Google.
What is Miras?
Miras is a general-purpose framework for deep learning architecture design developed by Google, particularly tailored for sequence modeling tasks. Built on the concepts of associative memory and attentional bias, Miras redefines models like Transformers and modern linear RNNs as associative memory modules with internal optimization objectives. It introduces four key building blocks—associative memory architecture, attentional bias objectives, retention gates, and memory learning algorithms—to guide the design of new sequence models.
Miras enables the creation of novel architectures such as Moneta, Yaad, and Memora, which the paper reports outperform existing Transformer and linear RNN models on tasks such as language modeling and commonsense reasoning.
Key Features of Miras
- Unifying Existing Architectures: Miras provides a common framework that encompasses various sequence models like Transformers, RetNet, and Mamba (a toy reduction is sketched right after this list).
- Optimized Memory Management: By incorporating attentional bias and retention gates, Miras improves the model’s ability to balance learning new information with retaining past data.
- Novel Model Design: Enables the design of new sequence models with customized attentional bias and retention mechanisms, such as Moneta, Yaad, and Memora.
- Improved Long-Sequence Performance: Enhances performance on long-sequence tasks while maintaining efficient parallel training capabilities.
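To make the unification point concrete, here is a minimal NumPy sketch (an assumption-laden toy, not the paper's own derivation): with a matrix-valued memory S, a plain Hebbian (dot-product) objective recovers the additive state update of linear attention, while one gradient step on an ℓ2 regression objective gives the delta-rule style update used by several linear RNNs.

```python
import numpy as np

def linear_attention_update(S, k, v):
    # Additive (Hebbian) write, S = S + v k^T:
    # a unit gradient step on an objective that rewards the dot product of S k with v.
    return S + np.outer(v, k)

def delta_rule_update(S, k, v, lr=1.0):
    # One gradient-descent step on the l2 objective (1/2) * ||S k - v||^2.
    return S - lr * np.outer(S @ k - v, k)
```

Different choices of objective, retention gate, and learning rule then yield different named architectures; the paper works out the exact correspondences.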
Technical Foundations of Miras
- Associative Memory: Miras views sequence models as associative memory modules, mapping inputs (keys) to outputs (values). This mechanism determines how the model stores and retrieves information from sequential data.
- Attentional Bias: An internal optimization objective within associative memory, attentional bias dictates how the model prioritizes attention to certain inputs. It uses different loss functions (e.g., ℓ2 regression, ℓ1 regression, Huber loss) to control sensitivity and robustness in learning key-value relationships.
- Retention Gate: A regularization mechanism that governs how old information is retained when new information is learned. It introduces retention regularizers (e.g., ℓ2 regularization, KL divergence) to balance learning and memory retention, which is crucial for long-sequence tasks.
- Memory Learning Algorithms: Algorithms like gradient descent and momentum gradient descent are employed to optimize the associative memory’s objective functions, improving training efficiency and convergence. A toy sketch combining these four components follows this list.
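To see how the four components fit together, here is a minimal NumPy sketch of one possible instantiation: a matrix-valued memory M is written to once per token by a gradient (or momentum) step on an attentional-bias loss plus a simple decay-style retention term. The function names, hyperparameters, and the specific ℓ2-decay form of the retention gate are illustrative assumptions, not the paper's exact formulations.

```python
import numpy as np

def attentional_bias_grad(M, k, v, kind="l2", delta=1.0):
    """Gradient w.r.t. the memory M of a per-token attentional-bias loss."""
    err = M @ k - v                      # retrieval error for this key/value pair
    if kind == "l2":                     # squared-error regression
        g = err
    elif kind == "l1":                   # l1 regression (more robust to outlier tokens)
        g = np.sign(err)
    elif kind == "huber":                # Huber loss: quadratic near zero, linear in the tails
        g = np.clip(err, -delta, delta)
    else:
        raise ValueError(f"unknown attentional bias: {kind}")
    return np.outer(g, k)

def memory_step(M, k, v, lr=0.1, retention=0.05, momentum=None, beta=0.9, bias="l2"):
    """One write into the associative memory.

    The retention term acts here as a simple l2-style retention gate that decays
    the previous memory a little on every write, trading plasticity against
    forgetting. Passing a momentum buffer switches the memory learning algorithm
    from plain gradient descent to momentum gradient descent.
    """
    grad = attentional_bias_grad(M, k, v, kind=bias) + retention * M
    if momentum is None:
        return M - lr * grad, None
    momentum = beta * momentum + grad
    return M - lr * momentum, momentum

# Usage: stream random key/value pairs through a matrix-valued memory.
rng = np.random.default_rng(0)
d_k, d_v = 8, 8
M = np.zeros((d_v, d_k))
mom = np.zeros_like(M)
for _ in range(200):
    k = rng.normal(size=d_k)
    v = rng.normal(size=d_v)
    M, mom = memory_step(M, k, v, bias="huber", momentum=mom)
print("retrieval error on the last pair:", np.linalg.norm(M @ k - v))
```

Swapping the loss (the bias argument), the retention term, or the update rule is exactly the design space the four building blocks describe; architectures such as Moneta, Yaad, and Memora correspond to particular choices within it.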
Project Link
- arXiv Technical Paper: https://arxiv.org/pdf/2504.13173
Application Scenarios for Miras
- Language Modeling: Ideal for NLP researchers and text generation developers handling long-form text and complex dependencies.
- Commonsense Reasoning: Enhances reasoning and inference capabilities in AI systems and virtual assistants.
- Long Text Processing: Improves processing efficiency and reduces resource usage for long-form content in text analytics and information retrieval.
- Multimodal Tasks: Boosts cross-modal inference by integrating multiple types of input, which benefits researchers and engineers in multimedia content analysis.