3DV-TON: A Video Virtual Try-on Framework Proposed by Alibaba DAMO Academy in Collaboration with Zhejiang University and Others

AI Tools updated 5d ago dongdong

What is 3DV-TON?

3DV-TON (Textured 3D-Guided Consistent Video Try-on via Diffusion Models) is a video virtual try-on framework developed jointly by Alibaba DAMO Academy, Lakeside Labs, and Zhejiang University. It addresses the limitations of existing methods in handling complex clothing patterns and diverse human poses. The framework utilizes animatable textured 3D meshes as explicit frame-level guidance to ensure high visual quality and temporal consistency in the generated try-on videos. 3DV-TON also introduces a high-resolution benchmark dataset, HR-VVT, to advance research in the video try-on field.


Key Features of 3DV-TON

  • High-Fidelity Visual Effects:
    Accurately reproduces clothing details, generating realistic virtual try-on results.

  • Temporal Consistency:
    Keeps clothing textures coherent across video frames, reducing flicker, artifacts, and distortions.

  • Robust to Complex Scenarios:
    Capable of handling diverse clothing styles, complex human poses, and dynamic scenes.

  • Benchmark Dataset:
    Introduces the high-resolution HR-VVT benchmark dataset to support research and evaluation in video try-on technologies.


Technical Principles of 3DV-TON

  • Textured 3D Guidance:
    Uses single-image 3D reconstruction to create animatable textured 3D meshes. These meshes are animated to follow the poses in the source video, giving the diffusion model explicit frame-level guidance that keeps appearance and motion consistent.

  • Dynamic 3D Guidance Pipeline:
    Key frames are selected and first processed with 2D image try-on; animatable textured 3D meshes are then reconstructed from these try-on results. SMPL-X parameters are optimized so that the 3D mesh aligns precisely with the human pose in the video.

  • Rectangular Masking Strategy:
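    Masks the try-on region with rectangles rather than garment-shaped masks, which prevents information about the original clothing from leaking into the result and mitigates artifacts caused by body and clothing motion. Both the garment image and the try-on image are used as references, providing extra context and improving generation quality.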

  • Diffusion Model Architecture:
    Built on Stable Diffusion, with the UNet extended into a pseudo-3D structure. Integrated temporal modules model motion across frames, producing realistic movement while reducing reliance on explicit optical flow or warping operations.

  • Training Strategy:
    Trains on both image and video data, randomly alternating between the two data types to balance per-frame image quality against temporal consistency. A classifier-free guidance (CFG) strategy randomly drops some conditional inputs during training to make the model more robust.
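
To make the rectangular masking idea concrete, here is a minimal Python sketch. It assumes a binary garment mask per frame is already available from some segmentation step, and it replaces the garment-shaped masks with a single padded rectangle that covers the garment in every frame of the clip. Sharing one rectangle across the clip, the `pad` margin, and all function names are illustrative assumptions rather than details from the 3DV-TON paper.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]  # binary mask: rows of 0/1 values
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def bounding_box(mask: Mask) -> Optional[Box]:
    """Tight box around all nonzero pixels of one frame, or None if the mask is empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]


def union_box(boxes: List[Optional[Box]]) -> Optional[Box]:
    """Smallest box containing every non-empty per-frame box."""
    valid = [b for b in boxes if b is not None]
    if not valid:
        return None
    return (min(b[0] for b in valid), min(b[1] for b in valid),
            max(b[2] for b in valid), max(b[3] for b in valid))


def rectangular_masks(frames: List[Mask], pad: int = 0) -> List[Mask]:
    """Hypothetical helper: replace garment-shaped masks with one padded
    rectangle shared by all frames of the clip."""
    if not frames:
        return []
    height, width = len(frames[0]), len(frames[0][0])
    box = union_box([bounding_box(m) for m in frames])
    if box is None:  # no garment pixels anywhere: nothing to mask
        return [[[0] * width for _ in range(height)] for _ in frames]
    # Grow the box by `pad` pixels, clipped to the image bounds.
    top, left = max(box[0] - pad, 0), max(box[1] - pad, 0)
    bottom, right = min(box[2] + pad, height - 1), min(box[3] + pad, width - 1)
    rect = [[1 if top <= r <= bottom and left <= c <= right else 0
             for c in range(width)] for r in range(height)]
    return [[row[:] for row in rect] for _ in frames]
```

Because the rectangle no longer follows the outline of the garment being replaced, the mask itself carries no hint of that garment's shape, which is one plausible reading of the leakage problem described above.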
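
The pseudo-3D design can be sketched in NumPy: spatial layers process every frame as an independent image (frames folded into the batch axis), while a temporal layer attends across frames at each spatial position. The sketch below assumes a `(batch, frames, channels, height, width)` layout and uses single-head self-attention without learned projections; the function names and these simplifications are assumptions for illustration, not 3DV-TON's actual temporal module.

```python
from typing import Callable

import numpy as np


def per_frame(x: np.ndarray, spatial_fn: Callable[[np.ndarray], np.ndarray]) -> np.ndarray:
    """Apply an image-level (2D) layer to every frame by folding frames into the batch axis."""
    b, f, c, h, w = x.shape
    y = spatial_fn(x.reshape(b * f, c, h, w))  # (B*F, C', H', W')
    return y.reshape(b, f, *y.shape[1:])


def temporal_self_attention(x: np.ndarray) -> np.ndarray:
    """Mix information across frames independently at each spatial position.

    x has shape (B, F, C, H, W). Each pixel location becomes a sequence of
    F tokens with C features, and softmax attention runs over the F axis.
    """
    b, f, c, h, w = x.shape
    # (B, F, C, H, W) -> (B, H, W, F, C) -> (B*H*W, F, C)
    tokens = x.transpose(0, 3, 4, 1, 2).reshape(b * h * w, f, c)
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(c)  # (B*H*W, F, F)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    mixed = weights @ tokens  # (B*H*W, F, C)
    # Undo the folding: (B*H*W, F, C) -> (B, H, W, F, C) -> (B, F, C, H, W)
    out = mixed.reshape(b, h, w, f, c).transpose(0, 3, 4, 1, 2)
    return x + out  # residual connection around the temporal layer
```

Attention over F frames at each of the H*W positions costs on the order of H*W*F^2 rather than (H*W*F)^2 for joint attention over every position in the clip, a common reason for factorizing spatial and temporal processing this way.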
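
The training recipe described above can be sketched with the Python standard library: each step randomly picks image or video data, each conditional input is independently replaced by a null value with some probability, and at inference the conditional and unconditional predictions are combined with the usual classifier-free guidance formula. The probabilities, the choice to drop each condition independently, and the function names are assumptions; the paper's actual settings are not reproduced here.

```python
import random
from typing import Dict, List, Optional, TypeVar

T = TypeVar("T")


def choose_batch_type(rng: random.Random, p_image: float = 0.5) -> str:
    """Decide whether the next training step uses still images or video clips."""
    return "image" if rng.random() < p_image else "video"


def drop_conditions(conds: Dict[str, T], rng: random.Random,
                    p_drop: float = 0.1) -> Dict[str, Optional[T]]:
    """Replace each condition with None (the 'null' condition) with probability p_drop."""
    return {name: (None if rng.random() < p_drop else value)
            for name, value in conds.items()}


def cfg_combine(eps_uncond: List[float], eps_cond: List[float],
                scale: float) -> List[float]:
    """Classifier-free guidance: extrapolate from the unconditional prediction
    toward the conditional one by `scale`."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]
```

At sampling time the model would be evaluated once with all conditions and once with the null conditions, and `cfg_combine` merges the two noise predictions; a `scale` above 1 strengthens adherence to the conditions.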


Application Scenarios of 3DV-TON

  • Online Shopping:
    Enables users to virtually try on clothes, enhancing the shopping experience and reducing return rates.

  • Fashion Design:
    Allows rapid visualization of clothing designs, aiding in design and marketing processes.

  • Virtual Fitting Rooms:
    Offers virtual try-on in physical stores, saving shoppers time and effort.

  • Film and Gaming:
    Assists in designing and customizing character outfits, improving production efficiency.

  • Social Media:
    Provides users with fun tools for creating and sharing try-on videos.
