LongCat-Video

Video generation model (Released: October 27, 2025)

Overview

LongCat-Video is a video generation model built on the Diffusion Transformer (DiT) architecture. A single model handles text-to-video, image-to-video, and video continuation. It generates coherent 5-minute videos at 720p resolution and 30 fps, with an emphasis on long temporal sequences, cross-frame consistency, and physically plausible motion.
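To give a sense of scale for the figures above, the following is a rough back-of-the-envelope sketch of how many frames a 5-minute, 30 fps clip contains and how large it would be as uncompressed RGB. The 1280x720 frame size is an assumption (720p taken as 16:9), and the helper names are illustrative only; none of this reflects how the model stores or processes video internally.

```python
# Rough size of one clip at the output settings quoted in the overview.
DURATION_S = 5 * 60          # 5 minutes, from the overview
FPS = 30                     # from the overview
WIDTH, HEIGHT = 1280, 720    # assumption: 720p interpreted as 16:9 (1280x720)


def frame_count(duration_s: int, fps: int) -> int:
    """Total number of frames in a clip of the given length."""
    return duration_s * fps


def raw_rgb_bytes(frames: int, width: int, height: int, channels: int = 3) -> int:
    """Uncompressed size in bytes, assuming 8 bits per channel."""
    return frames * width * height * channels


frames = frame_count(DURATION_S, FPS)
raw_bytes = raw_rgb_bytes(frames, WIDTH, HEIGHT)

print(f"frames: {frames}")                        # frames: 9000
print(f"raw RGB size: {raw_bytes / 1e9:.1f} GB")  # raw RGB size: 24.9 GB
```

Keeping 9,000 frames consistent with one another is why the overview stresses long temporal sequences and cross-frame consistency.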

Key Features

  • 5-minute videos: Long-form coherent video generation
  • 720p/30fps: Output at 720p resolution and 30 frames per second
  • Unified interface: Text-to-video, image-to-video, and video continuation
  • Temporal consistency: Cross-frame coherence
  • Physical plausibility: Realistic motion
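
One way to picture the unified interface is to treat the three tasks as the same generation call that differs only in how many conditioning frames the caller supplies. The sketch below illustrates that idea under stated assumptions: `GenerationRequest`, `condition_frames`, and `infer_task` are hypothetical names invented for this example, not part of LongCat-Video's actual API, and the frame-count rule is an assumption rather than a documented behavior.

```python
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass
class GenerationRequest:
    """Hypothetical request object; not LongCat-Video's real interface."""
    prompt: str
    # Frames supplied by the caller as conditioning (empty for pure text input).
    condition_frames: Sequence[Any] = ()


def infer_task(request: GenerationRequest) -> str:
    """Pick the task from the number of conditioning frames (assumed rule)."""
    n = len(request.condition_frames)
    if n == 0:
        return "text-to-video"
    if n == 1:
        return "image-to-video"
    return "video-continuation"


# Usage: the same request type covers all three tasks.
print(infer_task(GenerationRequest("a cat surfing")))
print(infer_task(GenerationRequest("a cat surfing", ["frame0"])))
print(infer_task(GenerationRequest("a cat surfing", ["frame0", "frame1"])))
```

The appeal of this kind of design is that a caller does not need a separate entry point per task; the inputs provided determine which task runs.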