LongCat-Video – Meituan’s Open-Source Video Generation Model

AI Tools updated 4h ago dongdong

What is LongCat-Video?

LongCat-Video is an open-source video generation model with 13.6 billion parameters, developed by Meituan’s LongCat team. It handles Text-to-Video, Image-to-Video, and Video-Continuation tasks, and is particularly efficient at generating high-quality long videos. After multi-reward reinforcement learning with Group Relative Policy Optimization (GRPO), the model performs comparably to leading open-source video generation models and the latest commercial solutions on both internal and public benchmarks.


Main Features of LongCat-Video

1. Long Video Generation:
Pretrained on video continuation tasks, LongCat-Video can generate videos lasting several minutes without noticeable color drift or quality degradation.

2. Unified Multi-Task Architecture:
Text-to-Video, Image-to-Video, and Video-Continuation tasks are unified under a single video generation framework, enabling all tasks to be handled by one model.

3. Efficient Inference:
Using a coarse-to-fine generation strategy together with Block Sparse Attention, the model can generate 720p, 30 fps video within minutes.

4. Multi-Reward Reinforcement Learning Optimization:
Using the Group Relative Policy Optimization (GRPO) approach with multiple reward signals, LongCat-Video achieves performance comparable to leading open-source and commercial video generation systems in both internal and public evaluations.
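
The Block Sparse Attention named under Efficient Inference can be illustrated with a generic sketch. The code below is not LongCat-Video's kernel; it is a minimal NumPy illustration of the general idea, assuming a single attention head, a hypothetical block-selection rule (mean-pooled block scores, keep the top few key blocks per query block), and hypothetical names such as `block_size` and `keep_blocks`.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dense_attention(q, k, v):
    # Standard scaled dot-product attention over all tokens.
    scale = 1.0 / np.sqrt(q.shape[-1])
    return softmax(q @ k.T * scale) @ v

def block_sparse_attention(q, k, v, block_size, keep_blocks):
    # q, k: (n, d); v: (n, dv). Tokens are grouped into contiguous blocks.
    n, d = q.shape
    dv = v.shape[-1]
    assert n % block_size == 0, "sequence length must be a multiple of block_size"
    nb = n // block_size
    assert 1 <= keep_blocks <= nb
    scale = 1.0 / np.sqrt(d)

    qb = q.reshape(nb, block_size, d)
    kb = k.reshape(nb, block_size, d)
    vb = v.reshape(nb, block_size, dv)

    # Coarse block-to-block relevance from mean-pooled queries and keys
    # (an assumed selection rule, chosen only for illustration).
    block_scores = qb.mean(axis=1) @ kb.mean(axis=1).T  # (nb, nb)

    out = np.empty((nb, block_size, dv))
    for i in range(nb):
        # Keep only the highest-scoring key blocks for this query block.
        top = np.sort(np.argsort(block_scores[i])[-keep_blocks:])
        k_sel = kb[top].reshape(-1, d)
        v_sel = vb[top].reshape(-1, dv)
        out[i] = softmax(qb[i] @ k_sel.T * scale) @ v_sel
    return out.reshape(n, dv)

rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, 16, 8))
sparse_out = block_sparse_attention(q, k, v, block_size=4, keep_blocks=2)
```

With `keep_blocks` equal to the number of blocks, the function reduces to dense attention; smaller values skip most query-key pairs, which is where the savings for long, high-resolution token sequences would come from.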


Technical Principles of LongCat-Video

1. Unified Architecture:
LongCat-Video employs a single, unified framework that integrates Text-to-Video, Image-to-Video, and Video-Continuation tasks. By sharing the same architecture and parameters, it efficiently handles multiple generation tasks.

2. Long Video Generation Technology:
Through pretraining on video continuation tasks and adopting specialized training strategies, the model can generate coherent, high-quality videos lasting several minutes without loss of consistency or visual fidelity.

3. Efficient Inference Strategy:
The model follows a coarse-to-fine generation process: it first creates a rough version of the video and then refines the details. Block Sparse Attention further speeds up inference and shortens generation time for high-resolution videos.

4. Multi-Reward Reinforcement Learning Optimization:
With the multi-reward Group Relative Policy Optimization (GRPO) method, the model is optimized along several dimensions at once, including text-video alignment, visual fidelity, and motion quality, which improves the overall quality of the generated videos.
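
The group-relative part of GRPO can be sketched in a few lines. The example below is an illustration under stated assumptions, not the LongCat team's training code: it assumes several videos are sampled for the same prompt, each scored by three reward models (text-video alignment, visual fidelity, motion quality), and that the scores are combined with a hypothetical weighted sum before being normalized within the group. The function name, weights, and reward values are made up for the example.

```python
from statistics import fmean, pstdev

def group_relative_advantages(reward_rows, weights, eps=1e-8):
    """Turn per-sample reward vectors into group-relative advantages.

    reward_rows: one list of reward scores per sampled video (same prompt).
    weights:     one weight per reward dimension (assumed weighting scheme).
    """
    # Collapse the reward dimensions into one scalar per sample.
    combined = [sum(w * r for w, r in zip(weights, row)) for row in reward_rows]
    # Normalize against the group itself: no learned value function is needed.
    mean = fmean(combined)
    std = pstdev(combined)
    return [(c - mean) / (std + eps) for c in combined]

# Hypothetical scores for four samples: [alignment, fidelity, motion].
rewards = [
    [0.9, 0.8, 0.7],
    [0.5, 0.6, 0.4],
    [0.7, 0.7, 0.9],
    [0.3, 0.4, 0.5],
]
advantages = group_relative_advantages(rewards, weights=[1 / 3, 1 / 3, 1 / 3])
```

Samples that score above the group average receive a positive advantage and are reinforced; those below receive a negative one. Here the first sample gets the largest advantage and the last the smallest.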



Application Scenarios of LongCat-Video

  • Content Creation:
    Enables creators to quickly generate video materials such as advertisements, short videos, and animations, greatly improving production efficiency.

  • Video Continuation:
    Generates follow-up content for existing video clips, useful for storytelling, video editing, and creative extensions.

  • Education and Training:
    Produces instructional or demonstration videos to enhance teaching and training experiences.

  • Entertainment and Gaming:
    Generates dynamic scenes or character animations to enrich visual effects and immersion in games.

  • Intelligent Customer Service and Virtual Assistants:
    Creates video-based responses for more intuitive and engaging user interactions.

  • Creative Design:
    Assists designers in concept visualization and rapid video prototyping to express creative ideas efficiently.
