LongCat-Video

Video generation model (Released: October 27, 2025)

Overview

LongCat-Video is a video generation model built on the Diffusion Transformer (DiT) architecture. A single model supports text-to-video, image-to-video, and video continuation. It generates coherent videos up to 5 minutes long at 720p resolution and 30 fps. Its design emphasizes long temporal sequences, consistency across frames, and physically plausible motion.
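To show the scale of a full-length output, the sketch below multiplies out the stated figures (5 minutes, 30 fps, 720p). It assumes that "720p" means the common 16:9 frame of 1280x720 pixels. It counts raw output frames and pixels only and says nothing about how the model represents or compresses frames internally. The helper names `frame_count` and `total_pixels` are hypothetical and exist only for this illustration.

```python
# Back-of-the-envelope size of one full-length output, using only the
# figures stated above. Assumption: "720p" = 1280x720 (16:9) frames.

DURATION_S = 5 * 60          # 5 minutes, in seconds
FPS = 30                     # frames per second
WIDTH, HEIGHT = 1280, 720    # assumed 16:9 720p frame size


def frame_count(duration_s: int, fps: int) -> int:
    """Number of frames in a clip of the given duration and frame rate."""
    return duration_s * fps


def total_pixels(frames: int, width: int, height: int) -> int:
    """Total number of output pixels across all frames."""
    return frames * width * height


frames = frame_count(DURATION_S, FPS)
pixels = total_pixels(frames, WIDTH, HEIGHT)

print(f"frames: {frames}")                      # frames: 9000
print(f"pixels per frame: {WIDTH * HEIGHT}")    # pixels per frame: 921600
print(f"total pixels: {pixels}")                # total pixels: 8294400000
```

A single 5-minute clip is therefore 9,000 frames, or roughly 8.3 billion output pixels. This is why long-form generation stresses cross-frame consistency far more than short clips do.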

Key Features

5-minute videos: Long-form coherent video generation
720p/30fps: High-resolution, high-frame-rate output
Unified interface: Text-to-video, image-to-video, and video continuation
Temporal consistency: Cross-frame coherence
Physical plausibility: Realistic motion

Resources

Documentation & Quick Start