SkyReels-V2: Kunlun Wanwei’s Open-Source Infinite-Length Film Generative Model

AI Tools posted 2m ago dongdong

What is SkyReels-V2

SkyReels-V2 is an infinite-length film generative model developed by Kunlun Wanwei’s SkyReels team. Built on the Diffusion Forcing framework, it combines Multi-modal Large Language Models (MLLM), multi-stage pretraining, and reinforcement learning to generate high-quality video of theoretically unlimited length. SkyReels-V2 targets a problem that existing models struggle with: reconciling prompt adherence, visual quality, motion dynamics, and video duration in a single model. It supports story generation, image-to-video synthesis, camera directing, and multi-subject consistent video generation. The model and its code have been open-sourced, making it available as a tool for creative content production and virtual simulation.


Key Features of SkyReels-V2

  • Infinite-Length Video Generation: Supports the generation of theoretically infinite-length video content, breaking the duration limitations of traditional video generation models.

  • Story Generation: Arranges complex multi-action sequences based on narrative text prompts to achieve dynamic storytelling.

  • Image-to-Video Synthesis: Offers two methods for turning static images into coherent videos: fine-tuning a full-sequence text-to-video diffusion model (SkyReels-V2-I2V), and combining the Diffusion Forcing model with frame conditions (SkyReels-V2-DF).

  • Camera Directing Function: Supports the generation of smooth and diverse camera motion effects, enhancing the cinematic feel of videos.

  • Elements-to-Video Generation: Combines arbitrary visual elements (such as characters, objects, and backgrounds) into coherent videos guided by text prompts, suitable for applications like short dramas, music videos, and virtual e-commerce content creation.


Technical Principles of SkyReels-V2

  • Multi-modal Large Language Model (MLLM): Uses MLLMs to generate initial video descriptions, supplemented by sub-expert models (e.g., for shot type, angle, position, expressions, and camera motion) that add detailed shot-language descriptions. Human annotation and model training further improve the understanding of cinematic grammar, significantly improving prompt adherence in generated videos.

  • Multi-Stage Pretraining and Post-Training:

    • Progressive Resolution Pretraining: Gradually increases the training resolution from low (256p) to high (720p) to strengthen the model’s generative capabilities.

    • Multi-Stage Post-Training Optimization: Comprises initial concept-balanced supervised fine-tuning (SFT), motion-specific reinforcement learning (RL) training, Diffusion Forcing (DF) training, and high-quality SFT, each stage targeting a different aspect of output quality.

  • Reinforcement Learning (RL): Optimizes motion quality to address shortcomings in motion dynamics, smoothness, and physical plausibility. A semi-automated data collection pipeline produces preference pairs, which are used to train a reward model and to run Direct Preference Optimization (DPO).

  • Diffusion Forcing Framework: Assigns an independent noise level to each frame, which is what enables infinite video generation. A non-decreasing noise schedule reduces the search space of denoising schedules for consecutive frames from O(1e48) to O(1e32), significantly improving generation efficiency.

  • Efficient Data Processing and Optimization: Integrates general datasets, self-collected media, and art resource libraries, with multi-stage filtering and annotation to ensure training data quality. Techniques such as FP8 quantization, multi-GPU parallelism, and model distillation significantly reduce inference time and computational cost, making the model more practical to run.
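
To make the preference-optimization step in the Reinforcement Learning item above more concrete, here is a minimal sketch of the standard DPO loss for a single preference pair, written in its log-probability form. It is not taken from the SkyReels-V2 code: the function name dpo_loss, the beta value, and the example numbers are assumptions for illustration, and a diffusion-based video model may well use a variant of this objective that the article does not describe.

```python
import math


def dpo_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) sample pair.

    logp_*     : log-probabilities under the model being trained
    ref_logp_* : log-probabilities under the frozen reference model
    beta       : strength of the implicit KL constraint (illustrative value)
    """
    # How much more the trained model favors the preferred sample
    # over the rejected one, relative to the reference model.
    margin = beta * ((logp_win - ref_logp_win) - (logp_lose - ref_logp_lose))
    # -log(sigmoid(margin)) computed as softplus(-margin) for stability.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Hypothetical numbers: the trained model already leans toward the
# preferred clip a little more than the reference model does.
neutral = dpo_loss(-10.0, -10.0, -10.0, -10.0)
improved = dpo_loss(-9.0, -11.0, -10.0, -10.0)
print(neutral, improved)
```

When the trained model and the reference model agree, the loss equals log 2; it falls as the trained model shifts probability toward the preferred sample, which is the signal used to improve motion quality from preference pairs.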

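The per-frame noise levels and the non-decreasing schedule described under the Diffusion Forcing item can be illustrated with a toy example. The sketch below is an assumption-laden illustration, not the SkyReels-V2 scheduler: staggered_schedule, its lag parameter, and count_rows are hypothetical names, and the tiny sizes used here do not reproduce the O(1e48) and O(1e32) figures quoted above.

```python
from math import comb


def staggered_schedule(num_frames: int, num_steps: int, lag: int) -> list[list[int]]:
    """Return one row per denoising iteration; row[i] is frame i's noise level.

    Level num_steps means pure noise and 0 means clean. Frame i starts
    denoising `lag` iterations after frame i - 1, so within every row the
    levels never decrease from the first frame to the last.
    """
    total_iters = num_steps + lag * (num_frames - 1)
    rows = []
    for it in range(1, total_iters + 1):
        row = []
        for i in range(num_frames):
            steps_done = min(max(it - lag * i, 0), num_steps)
            row.append(num_steps - steps_done)
        rows.append(row)
    return rows


def is_non_decreasing(row: list[int]) -> bool:
    return all(a <= b for a, b in zip(row, row[1:]))


def count_rows(num_frames: int, num_levels: int) -> tuple[int, int]:
    """Count per-row level assignments: unconstrained vs. non-decreasing."""
    unconstrained = num_levels ** num_frames
    non_decreasing = comb(num_frames + num_levels - 1, num_frames)
    return unconstrained, non_decreasing


schedule = staggered_schedule(num_frames=4, num_steps=3, lag=1)
for row in schedule:
    print(row)
# [2, 3, 3, 3]
# [1, 2, 3, 3]
# [0, 1, 2, 3]
# [0, 0, 1, 2]
# [0, 0, 0, 1]
# [0, 0, 0, 0]

print(count_rows(num_frames=4, num_levels=4))  # (256, 35)
```

Earlier frames finish denoising first and can then serve as clean context for later ones, which is what lets generation keep extending. The count at the end shows, on a small scale, why restricting noise levels to be non-decreasing across frames shrinks the space of schedules that has to be considered.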

Project Addresses for SkyReels-V2


Application Scenarios for SkyReels-V2

  • Film Production: Generates infinite-length coherent videos for complex storytelling and long-shot creation.

  • Advertising Creation: Transforms static images into dynamic videos, enhancing the appeal and expressiveness of advertisements.

  • Video Shooting Assistance: Generates smooth camera motion effects to aid in designing and implementing complex shots.

  • Short Dramas and Music Videos: Quickly produces high-quality videos, reducing shooting costs and time.

  • Virtual Reality and Game Development: Generates realistic virtual scenes and character animations, enhancing user experience and immersion.
