VeOmni – ByteDance’s Open-Source Fully-Modal PyTorch-Native Training Framework

AI Tools · by dongdong

What is VeOmni?

VeOmni is an open-source, fully-modal distributed training framework from ByteDance’s Seed team, built natively on PyTorch. It takes a model-centric approach that decouples distributed parallelism logic from model computation, so multiple parallel strategies, such as FSDP, sequence parallelism (SP), and expert parallelism (EP), can be combined flexibly and scaled efficiently to ultra-long sequences and large Mixture-of-Experts (MoE) models. It also provides lightweight fully-modal interfaces that simplify integrating multi-modal encoders and decoders, and it incorporates optimizations such as dynamic batching and efficient operators to improve training efficiency and stability. VeOmni has been used in several cutting-edge projects to support research and development of fully-modal large models.
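
As a rough illustration of how parallel strategies can be combined: a common pattern, and the one PyTorch's `DeviceMesh` is built around, is to arrange the GPUs as an n-D grid with one axis per strategy, so each rank belongs to exactly one communication group per axis. The pure-Python sketch below computes those groups for an assumed 8-GPU layout with a 2-way data-parallel (FSDP) axis and a 4-way sequence-parallel axis. The function `mesh_groups` and the axis names are hypothetical; this is not VeOmni's parallel_state API, and real layouts (including where EP sits) may be arranged differently.

```python
from itertools import product
from math import prod


def mesh_groups(shape, names):
    """Return {axis_name: [rank groups]} for a row-major n-D device mesh.

    Ranks 0..N-1 are laid out like torch.arange(N).view(shape); a group along
    one axis is the set of ranks whose coordinates differ only on that axis.
    """
    if len(shape) != len(names):
        raise ValueError("need one name per mesh dimension")
    # Row-major strides: the last axis varies fastest.
    strides = [prod(shape[i + 1:]) for i in range(len(shape))]
    groups = {}
    for axis, name in enumerate(names):
        other_axes = [d for d in range(len(shape)) if d != axis]
        axis_groups = []
        # Fix every coordinate except `axis`, then sweep along `axis`.
        for fixed in product(*(range(shape[d]) for d in other_axes)):
            base = sum(c * strides[d] for c, d in zip(fixed, other_axes))
            axis_groups.append([base + i * strides[axis] for i in range(shape[axis])])
        groups[name] = axis_groups
    return groups


# Hypothetical 8-GPU layout: 2-way data parallel (FSDP) x 4-way sequence parallel.
for name, gs in mesh_groups((2, 4), ("dp", "sp")).items():
    print(name, gs)
# dp [[0, 4], [1, 5], [2, 6], [3, 7]]
# sp [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Inside an initialized distributed job, `torch.distributed.device_mesh.init_device_mesh("cuda", (2, 4), mesh_dim_names=("dp", "sp"))` builds a mesh with the same row-major layout and creates the underlying process groups; indexing it as `mesh["sp"]` returns the current rank's sequence-parallel sub-mesh.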


Key Features of VeOmni

  • Support for Full-Modal Model Training: VeOmni can train models on any combination of modalities (text, image, audio, video, etc.), covering everything from single-modal to fully-modal tasks.

  • Efficient Distributed Training: Supports flexible combinations of parallel strategies (FSDP, SP, EP) and scales efficiently across large GPU clusters.

  • Ultra-Long Sequence Support: Handles sequences of up to 192K tokens, which suits high-resolution images, long videos, and other complex multi-modal data.

  • Lightweight Interfaces & Usability: Lightweight interfaces let multi-modal encoders and decoders be integrated quickly, simplifying the model development workflow.

  • System-Level Optimizations: Integrates dynamic batching, efficient operators, activation recomputation, memory optimizations, and ByteCheckpoint for checkpointing, improving training efficiency and stability.

  • Training Stability: Demonstrates stable convergence in complex multi-modal tasks, suitable for practical applications.

  • Flexible Model Extension: Supports a range of architectures (e.g., Transformer and MoE models) and allows model components to be customized for diverse research and development needs.
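
Dynamic batching is listed above without detail. One common form groups variable-length samples into batches capped by a token budget rather than a fixed sample count, so steps with short and long sequences do similar amounts of work. The sketch below is a greedy, order-preserving version of that idea; `pack_by_token_budget` and `max_tokens` are hypothetical names, and VeOmni's own batching logic may differ (for example, by buffering and sorting samples first).

```python
def pack_by_token_budget(lengths, max_tokens):
    """Greedily group sample indices (in order) so that each batch's total
    token count stays within max_tokens. Hypothetical helper, not VeOmni's API."""
    batches, current, used = [], [], 0
    for idx, n in enumerate(lengths):
        if n > max_tokens:
            raise ValueError(f"sample {idx} has {n} tokens > budget {max_tokens}")
        # Close the current batch if adding this sample would overflow the budget.
        if current and used + n > max_tokens:
            batches.append(current)
            current, used = [], 0
        current.append(idx)
        used += n
    if current:
        batches.append(current)
    return batches


lengths = [3000, 500, 4000, 1200, 800, 7000, 600]
for batch in pack_by_token_budget(lengths, max_tokens=8192):
    print(batch, sum(lengths[i] for i in batch))
# [0, 1, 2] 7500
# [3, 4] 2000
# [5, 6] 7600
```

Keeping samples in order makes the data stream reproducible but can leave a batch underfilled (the second batch uses only 2,000 of 8,192 tokens); buffering samples and assigning them first-fit trades strict ordering for tighter packing.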


Technical Principles of VeOmni

  • Model-System Decoupling: Model definitions are kept separate from distributed training logic, so model code does not depend on any particular parallel strategy. Users configure parallelism through high-level APIs without modifying model code.

  • Distributed Parallel Strategies: FSDP shards model parameters, gradients, and optimizer states across devices, reducing the memory load on each GPU. Sequence parallelism (SP) splits activation tensors along the sequence dimension, with optimized communication, to support ultra-long sequences. Expert parallelism (EP) distributes MoE experts across devices to improve training efficiency. A parallel_state module built on PyTorch's DeviceMesh simplifies the management of n-D parallel layouts, so these strategies can be combined flexibly.

  • Lightweight Full-Modal Interface: Uses HuggingFace-style interfaces, so users can integrate multi-modal encoders and decoders quickly by implementing a small set of unified functions (e.g., lm_encode, lm_generate).

  • System-Level Optimizations: Combines dynamic batching, efficient operators, activation recomputation, memory optimizations, and ByteCheckpoint to improve both efficiency and stability.
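
The article names hook functions such as lm_encode and lm_generate but not their signatures. The sketch below illustrates the general shape of such a contract under stated assumptions: each encoder exposes `lm_encode`, returning embeddings in the language model's hidden size; each decoder exposes `lm_generate`; and the model splices encoder output into the text sequence at a placeholder position. The `Protocol` classes, the toy encoder and decoder, and `splice` are all hypothetical and use plain lists instead of tensors; VeOmni's real method signatures may differ.

```python
from typing import List, Protocol, Sequence

Embedding = List[float]


class ModalityEncoder(Protocol):
    """Hypothetical contract: map raw modality input to LM-space embeddings."""
    def lm_encode(self, raw: Sequence[float]) -> List[Embedding]: ...


class ModalityDecoder(Protocol):
    """Hypothetical contract: map LM hidden states back to a modality output."""
    def lm_generate(self, hidden: List[Embedding]) -> List[float]: ...


class ToyPatchEncoder:
    """Toy 'image' encoder: one embedding of width `dim` per patch of `patch`
    values, every component set to the patch mean."""
    def __init__(self, patch: int, dim: int):
        self.patch, self.dim = patch, dim

    def lm_encode(self, raw: Sequence[float]) -> List[Embedding]:
        out = []
        for i in range(0, len(raw), self.patch):
            chunk = raw[i:i + self.patch]
            out.append([sum(chunk) / len(chunk)] * self.dim)
        return out


class ToyMeanDecoder:
    """Toy decoder: emits the first component of each hidden state."""
    def lm_generate(self, hidden: List[Embedding]) -> List[float]:
        return [h[0] for h in hidden]


def splice(text_embeds, placeholder_idx, modality_embeds):
    """Replace one placeholder position in the text sequence with encoder output."""
    return text_embeds[:placeholder_idx] + modality_embeds + text_embeds[placeholder_idx + 1:]


enc: ModalityEncoder = ToyPatchEncoder(patch=2, dim=3)
dec: ModalityDecoder = ToyMeanDecoder()
text = [[0.1] * 3, [9.9] * 3, [0.2] * 3]  # position 1 stands in for an <image> placeholder
seq = splice(text, 1, enc.lm_encode([1.0, 3.0, 5.0, 7.0]))
print(len(seq), dec.lm_generate(seq))
# 4 [0.1, 2.0, 6.0, 0.2]
```

The point of such a contract is that the backbone only sees the uniform method names, so adding a new modality means writing one new encoder or decoder class rather than touching the training loop or parallelism code.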


Application Scenarios

  • Multi-Modal Content Generation: Generate images or videos from text descriptions, or produce textual descriptions for images or videos, widely used in creative design and content creation.

  • Multi-Modal Understanding & Q&A: Answer visual questions by combining image and text inputs, or handle complex multi-modal question-answering tasks, enhancing intelligent interaction experiences.

  • Multi-Modal Agents: Develop virtual assistants and multi-modal robots that interact with users and perform tasks using voice, text, and visual information.

  • Content Creation & Editing: Generate creative design elements from text descriptions, assist content review, and improve efficiency in content creation and editing.

  • Education & Training: Provide virtual training platforms, enhancing interactivity and effectiveness in educational and training scenarios.
