# LongCat Models
Comprehensive overview of all LongCat AI model variants and their capabilities.
## Model Variants
### LongCat-Flash-Chat
Released: September 1, 2025
Foundation dialogue model with 560B parameters in a Mixture-of-Experts (MoE) architecture. Activates approximately 18.6B–31.3B parameters per token (averaging ~27B) through Zero-Computation Experts.
- Supports up to 128K context length
- Achieves 100+ tokens/s on H800 GPUs
- Strong instruction following, reasoning, and coding
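The variable activated-parameter count above comes from routing: besides ordinary feed-forward experts, the router can pick zero-computation (identity) experts that cost no parameters or FLOPs. A minimal Python sketch of this idea (illustrative expert names, scores, and parameter counts — not the actual LongCat implementation):

```python
# Conceptual sketch (not the actual LongCat code): a top-k router that may
# send a token to a zero-computation (identity) expert, so the number of
# activated parameters varies per token.

REAL_EXPERT_PARAMS = 1_000_000  # hypothetical parameter count per real expert
TOP_K = 2                       # hypothetical number of experts chosen per token

def route_token(scores):
    """Pick the TOP_K highest-scoring experts for one token.

    `scores` maps expert names to router scores; names starting with "zero"
    denote identity experts that add no parameters and no FLOPs.
    """
    chosen = sorted(scores, key=scores.get, reverse=True)[:TOP_K]
    activated = sum(REAL_EXPERT_PARAMS for e in chosen if not e.startswith("zero"))
    return chosen, activated

# An "easy" token leans on a zero-computation expert and activates fewer
# parameters than a "hard" token that needs two real experts.
easy = {"ffn_0": 0.9, "ffn_1": 0.1, "zero_0": 0.8}
hard = {"ffn_0": 0.9, "ffn_1": 0.8, "zero_0": 0.1}

_, easy_params = route_token(easy)
_, hard_params = route_token(hard)
print(easy_params, hard_params)  # the easy token activates fewer parameters
```

Averaged over many tokens, this kind of routing yields an activated-parameter count between the all-zero and all-real extremes, which is how a per-token range like 18.6B–31.3B arises.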
### LongCat-Flash-Thinking
Released: September 22, 2025
Enhanced reasoning model focusing on "Agentic Reasoning" and "Formal Reasoning". Features a dual-path reasoning framework and the DORA asynchronous training system.
- 64.5% token savings in tool-call scenarios
- Improved tool-call efficiency
- Formal and agentic reasoning capabilities
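Agentic reasoning of this kind is typically run as a loop in which the model alternates between emitting tool calls and incorporating tool results until it can answer. A hedged sketch of such a loop (the toy model, message schema, and tool registry here are hypothetical stand-ins, not the LongCat API):

```python
# Conceptual sketch of an agentic tool-call loop (all names illustrative):
# the model requests tools, the harness executes them, and the results are
# fed back until the model produces a final answer.

def toy_model(messages):
    """Stand-in for an LLM: requests a calculator once, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "calculator", "args": {"expr": "6*7"}}}
    result = next(m for m in messages if m["role"] == "tool")["content"]
    return {"answer": f"The result is {result}."}

TOOLS = {"calculator": lambda args: str(eval(args["expr"]))}

def agent_loop(question, max_steps=4):
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        out = toy_model(messages)
        if "answer" in out:
            return out["answer"]
        call = out["tool_call"]
        messages.append({"role": "tool", "content": TOOLS[call["name"]](call["args"])})
    return None

print(agent_loop("What is 6 * 7?"))  # -> "The result is 42."
```

Token savings in tool-call scenarios come from keeping each reasoning segment in such a loop short, since every model turn and tool result is appended to the growing context.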
### LongCat-Video
Released: October 27, 2025
Video generation model based on Diffusion Transformer (DiT) architecture. Unified support for text-to-video, image-to-video, and video continuation tasks.
- Generates 5-minute coherent videos at 720p/30fps
- Long temporal sequences and cross-frame consistency
- Physical motion plausibility
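One way a single DiT backbone can unify the three tasks is to treat them as the same denoising problem with different numbers of clean condition frames. A conceptual sketch under that assumption (illustrative only, not LongCat-Video's actual code):

```python
# Conceptual sketch: text-to-video, image-to-video, and video continuation
# differ only in how many clean condition frames precede the noisy frames
# the model is asked to denoise.

def build_task_input(text_prompt, condition_frames, frames_to_generate):
    """Return a unified (condition, target) layout covering all three tasks."""
    n_cond = len(condition_frames)
    if n_cond == 0:
        task = "text-to-video"
    elif n_cond == 1:
        task = "image-to-video"
    else:
        task = "video-continuation"
    return {
        "task": task,
        "text": text_prompt,
        "condition_frames": condition_frames,               # kept clean
        "target_frames": ["<noise>"] * frames_to_generate,  # to be denoised
    }

t2v = build_task_input("a cat running", [], 8)
i2v = build_task_input("a cat running", ["frame0"], 8)
cont = build_task_input("a cat running", ["frame0", "frame1", "frame2"], 8)
print(t2v["task"], i2v["task"], cont["task"])
```

Under this framing, long videos are produced by repeatedly continuing from previously generated frames, which is why cross-frame consistency over long temporal sequences matters.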
### LongCat-Flash-Omni
Released: November 2025
First open-source real-time all-modality interaction model. Unifies text, image, audio, and video with a single end-to-end ScMoE (Shortcut-connected MoE) backbone.
- Open-source SOTA on Omni-Bench and WorldSense
- Low-latency, streaming multi-modal IO
- 128K context with multi-turn dialogue
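Low-latency streaming IO implies the model consumes inputs incrementally rather than waiting for complete files. A minimal sketch of that ingestion pattern (illustrative only, not LongCat's implementation): chunks from separate modality streams are merged by timestamp into one arrival-ordered stream.

```python
import heapq

# Conceptual sketch of streaming multi-modal input: each modality produces
# (timestamp, modality, payload) chunks, and the consumer merges them by
# timestamp so processing can start before any stream is complete.

def interleave(*streams):
    """Merge timestamp-sorted chunk streams into one arrival-ordered stream."""
    return list(heapq.merge(*streams, key=lambda chunk: chunk[0]))

audio = [(0.0, "audio", "a0"), (0.2, "audio", "a1"), (0.4, "audio", "a2")]
video = [(0.0, "video", "v0"), (0.33, "video", "v1")]
merged = interleave(audio, video)
print([m[2] for m in merged])  # chunks interleaved across modalities
```

`heapq.merge` is lazy and stable, so in a real pipeline the consumer could iterate over it while the producers are still emitting chunks.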
### LongCat-Image
Released: Latest | Parameters: 6B
Open-source AI image generation and editing model. Achieves open-source SOTA on image editing benchmarks (GEdit-Bench, ImgEdit-Bench) and leading performance in Chinese text rendering (ChineseWord: 90.7). Covers all 8,105 standard Chinese characters.
- Image editing: Open-source SOTA (ImgEdit-Bench 4.50, GEdit-Bench 7.60/7.64)
- Chinese text rendering: 90.7 on ChineseWord, covering all 8,105 characters
- Text-to-image: GenEval 0.87, DPG-Bench 86.8
- Available on LongCat Web and LongCat APP (24 templates, image-to-image)
- Fully open-source: Hugging Face | GitHub
### LongCat-Audio-Codec
Audio processing module that provides low-bitrate, real-time streaming audio tokenization and detokenization for speech LLMs.
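To make the tokenize/detokenize round trip concrete, here is a toy uniform scalar quantizer (an illustration of the general idea only, not the LongCat-Audio-Codec algorithm): samples are mapped to a small set of discrete token ids and back, trading reconstruction fidelity for a low bitrate.

```python
# Toy audio tokenizer (not the LongCat-Audio-Codec algorithm): a uniform
# scalar quantizer over samples in [-1, 1]. Real neural codecs learn the
# codebook, but the discrete-token interface is the same.

LEVELS = 256  # hypothetical codebook size -> 8 bits per sample

def tokenize(samples):
    """Map each float sample in [-1, 1] to an integer token id."""
    return [round((s + 1.0) / 2.0 * (LEVELS - 1)) for s in samples]

def detokenize(tokens):
    """Map token ids back to approximate float samples."""
    return [t / (LEVELS - 1) * 2.0 - 1.0 for t in tokens]

samples = [0.0, 0.5, -0.5, 1.0]
round_trip = detokenize(tokenize(samples))
max_err = max(abs(a - b) for a, b in zip(samples, round_trip))
print(max_err)  # reconstruction error bounded by half the quantization step
```

A speech LLM consumes the integer token stream from `tokenize` as ordinary sequence input, and its generated tokens are turned back into audio by `detokenize`.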
## Model Comparison
| Model | Parameters / Architecture | Key Feature | Use Case |
|---|---|---|---|
| Flash-Chat | 560B (MoE) | High-throughput dialogue | General conversation, coding |
| Flash-Thinking | 560B (MoE) | Enhanced reasoning | Tool use, formal reasoning |
| Video | DiT-based | Video generation | Text/image-to-video, continuation |
| Flash-Omni | ScMoE | All-modality | Multi-modal interaction |
| Image | 6B (MM-DiT+Single-DiT) | Image generation & editing (Open-source SOTA) | Text-to-image, image editing, Chinese text rendering |
## Get Started
Choose a model to explore detailed documentation, benchmarks, and deployment guides: