LongCat AI
Next-generation multi-modal AI models by Meituan: Flash-Chat, Flash-Thinking, Video, and Audio-Codec. Open-source MoE LLMs with cutting-edge capabilities.
About the LongCat Models
The LongCat series is Meituan's evolving family of multi-modal AI models, with four specialized variants addressing different capabilities:
LongCat-Flash-Chat (September 1, 2025)
The foundation dialogue model with 560B parameters in a Mixture-of-Experts (MoE) architecture. It activates approximately 18.6B–31.3B parameters per token (averaging ~27B) through Zero-Computation Experts, achieving competitive quality with high throughput and low latency. Supports up to 128k context length and demonstrates strong instruction following, reasoning, and coding capabilities—especially in agentic tool-use scenarios. Achieves 100+ tokens/s on H800 GPUs.
LongCat-Flash-Thinking (September 22, 2025)
Enhanced reasoning model focusing on "Agentic Reasoning" and "Formal Reasoning". Features a dual-path reasoning framework combining agentic tool use with formal reasoning capabilities. Built with DORA (Dynamic Orchestration for Asynchronous rollout) for asynchronous large-scale training. Dramatically improved tool-call efficiency with 64.5% token savings in AIME25 tests (from 19,653 tokens down to 6,965).
LongCat-Video (October 27, 2025)
Video generation model based on Diffusion Transformer (DiT) architecture. Unified support for text-to-video, image-to-video, and video continuation tasks. Generates coherent 5-minute videos at 720p resolution and 30 fps, with emphasis on long temporal sequences, cross-frame consistency, and physical motion plausibility.
LongCat-Audio-Codec
Audio processing module providing low-bitrate, real-time streaming audio tokenization and detokenization for speech LLMs, enabling efficient audio encoding and decoding.
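To illustrate what such a codec enables, here is a minimal sketch of the streaming tokenize/detokenize loop a speech-LLM front-end would run. The `StreamingAudioCodec` interface and its method names are illustrative assumptions, not the actual LongCat-Audio-Codec API:

```python
from typing import Iterator, Protocol

import numpy as np


class StreamingAudioCodec(Protocol):
    """Hypothetical codec interface; the real LongCat-Audio-Codec API may differ."""

    def encode(self, pcm_frame: np.ndarray) -> list[int]:
        """Tokenize one fixed-size PCM frame into discrete audio tokens."""
        ...

    def decode(self, tokens: list[int]) -> np.ndarray:
        """Reconstruct a PCM frame from its audio tokens."""
        ...


def stream_roundtrip(codec: StreamingAudioCodec,
                     frames: Iterator[np.ndarray]) -> Iterator[np.ndarray]:
    """Frame-by-frame loop: the tokens would feed the speech LLM while
    decoded frames go to low-latency playback."""
    for frame in frames:
        tokens = codec.encode(frame)  # low-bitrate discrete representation
        yield codec.decode(tokens)    # streaming reconstruction
```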
This page summarizes highlights from official resources and presents a concise overview for new users.
Key Features & Technologies
- Zero-Computation Experts: Smart MoE routing that activates only 18.6B–31.3B parameters per token (averaging ~27B) from a 560B-parameter pool for cost efficiency (see the sketch after this list).
- Shortcut-connected MoE (ScMoE): Overlaps computation and communication for speed at scale, reducing latency.
- High-throughput inference: Achieves 100+ tokens/s on H800 GPUs with optimized architecture and efficient routing.
- DORA training system: Dynamic Orchestration for Asynchronous rollout enables efficient large-scale training across domains.
- Dual-path reasoning framework: Combines agentic tool use with formal reasoning for enhanced problem-solving capabilities.
- Enhanced tool-call efficiency: 64.5% token reduction in agentic scenarios, improving cost-effectiveness.
- Video generation capabilities: Diffusion Transformer (DiT) architecture producing 5-minute coherent videos at 720p/30fps.
- Multi-modal support: From dialogue and reasoning to video and audio processing.
- Extended context handling: Up to 128k tokens for complex, multi-document tasks.
- Training innovations: Hyperparameter transfer, model-growth initialization, variance alignment, and router balancing.
- Real-world applications: Deployed across Meituan's services including AI coding, meetings, and documentation tools.
- Open-source ecosystem: Released under MIT License with support for model distillation and secondary development.
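To make the Zero-Computation Experts idea concrete, below is a minimal, self-contained sketch of a top-k MoE layer in which the last few expert slots are identity ("zero-computation") experts: tokens routed to them bypass the FFN entirely, so the number of activated parameters varies per token. This is a toy illustration under our own naming, not Meituan's implementation:

```python
import torch
import torch.nn as nn


class ZeroComputeMoE(nn.Module):
    """Toy top-k MoE layer whose last `n_zero` expert slots are identity
    (zero-computation) experts: tokens routed there skip the FFN entirely."""

    def __init__(self, d_model: int, n_ffn: int, n_zero: int, top_k: int = 2):
        super().__init__()
        self.n_ffn, self.top_k = n_ffn, top_k
        self.router = nn.Linear(d_model, n_ffn + n_zero, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_ffn)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); pick the top-k experts for every token.
        weights, idx = self.router(x).softmax(-1).topk(self.top_k, dim=-1)
        weights = weights / weights.sum(-1, keepdim=True)  # renormalize over top-k
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in range(self.router.out_features):
                mask = idx[:, slot] == e
                if not mask.any():
                    continue
                w = weights[mask, slot].unsqueeze(-1)
                if e < self.n_ffn:
                    out[mask] = out[mask] + w * self.experts[e](x[mask])
                else:
                    # Zero-computation expert: identity pass-through, no extra FLOPs.
                    out[mask] = out[mask] + w * x[mask]
        return out


# Smoke test: 8 FFN experts plus 4 zero-computation slots.
layer = ZeroComputeMoE(d_model=64, n_ffn=8, n_zero=4)
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

In the full-scale model, the router learns to send "easy" tokens to the zero-computation slots, which is how the average activated parameter count stays near 27B out of the 560B total.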
Selected Benchmarks
Representative results reported by the authors (non-exhaustive):
| Category | Benchmark | Metric | LongCat-Flash |
|---|---|---|---|
| General Domains | MMLU | acc | 89.71 |
| Instruction Following | IFEval | acc | 89.65 |
| Math Reasoning | MATH500 | acc | 96.40 |
| General Reasoning | DROP | F1 | 79.06 |
| Coding | HumanEval+ | pass@1 | 88.41 |
| Agentic Tool Use | τ²-Bench (telecom) | avg@4 | 73.68 |
Values summarized from public reports; please consult the official resources for full details and conditions.
Latest Updates & Timeline
- October 27, 2025 - LongCat-Video Launch: Introduction of the video generation model based on Diffusion Transformer architecture. Capable of generating 5-minute coherent videos at 720p/30fps, supporting text-to-video, image-to-video, and video continuation tasks. Emphasizes physical motion plausibility and long temporal consistency.
- September 22, 2025 - LongCat-Flash-Thinking Release: Enhanced reasoning model focusing on agentic and formal reasoning capabilities. Features dual-path reasoning framework and DORA asynchronous training system. Achieves 64.5% token savings in tool-call scenarios (from 19,653 to 6,965 tokens in AIME25 tests).
- September 1, 2025 - LongCat-Flash-Chat Open Source: Meituan officially released the foundation dialogue model (560B-parameter MoE) with high throughput and competitive accuracy, using Zero-Computation Experts to keep activated parameters at ~27B per token on average. The release drew prominent media coverage from Sina Finance and other tech outlets.
- Training Innovations & Scale: Trained on more than 20T tokens in roughly 30 days on large GPU clusters, with training stability achieved through variance alignment, hyperparameter transfer, and router balancing. The team also demonstrated a viable training path on domestic accelerator cards.
- Real-World Deployment: Meituan integrates LongCat models across internal tools for AI coding, intelligent meetings, and documentation systems, demonstrating practical business applications.
- Benchmark Performance: Excellent results on MMLU, CEval, terminal command benchmarks, and agentic tool-call evaluations, showcasing strong general and domain-specific capabilities.
These highlights synthesize recent public reporting. For precise details and evolving numbers, consult official releases and model cards.
Quick Start
LongCat-Flash uses a chat template defined in tokenizer_config.json. Examples:
First Turn
```
[Round 0] USER:{query} ASSISTANT:
```
With System Prompt
```
SYSTEM:{system_prompt} [Round 0] USER:{query} ASSISTANT:
```
Multi-Turn
```
SYSTEM:{system_prompt} [Round 0] USER:{q} ASSISTANT:{r} ... [Round N-1] USER:{q} ASSISTANT:{r} [Round N] USER:{q} ASSISTANT:
```
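For illustration, a small helper of our own (not part of the official repo; in practice, prefer `tokenizer.apply_chat_template`, which reads the canonical template from `tokenizer_config.json`) can assemble these prompts:

```python
def build_longcat_prompt(turns: list[tuple[str, str]], query: str,
                         system_prompt: str | None = None) -> str:
    """Assemble a prompt following the template shown above.

    turns: completed (user, assistant) pairs; query: the new user message.
    """
    prompt = f"SYSTEM:{system_prompt} " if system_prompt else ""
    for i, (q, r) in enumerate(turns):
        prompt += f"[Round {i}] USER:{q} ASSISTANT:{r} "
    return prompt + f"[Round {len(turns)}] USER:{query} ASSISTANT:"


# Second turn of a conversation with a system prompt:
print(build_longcat_prompt([("Hi", "Hello!")], "What can you do?",
                           system_prompt="You are a helpful assistant."))
```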
Tool Call Envelope
```
{tool_description}

## Messages
SYSTEM:{system_prompt} [Round 0] USER:{query} ASSISTANT:
```
Tool calls emitted by the model are wrapped in `<longcat_tool_call>` tags:
```
<longcat_tool_call>{"name": <function-name>, "arguments": <args-dict>}</longcat_tool_call>
```
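A minimal parser for these tagged calls might look like the following sketch (ours, not an official utility; the `get_weather` payload is a made-up example):

```python
import json
import re

TOOL_CALL_RE = re.compile(
    r"<longcat_tool_call>\s*(\{.*?\})\s*</longcat_tool_call>", re.DOTALL
)


def extract_tool_calls(completion: str) -> list[dict]:
    """Return every parsed {"name": ..., "arguments": ...} payload."""
    return [json.loads(body) for body in TOOL_CALL_RE.findall(completion)]


calls = extract_tool_calls(
    '<longcat_tool_call>{"name": "get_weather", '
    '"arguments": {"city": "Beijing"}}</longcat_tool_call>'
)
print(calls[0]["name"])  # get_weather
```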
Deployment & Applications
LongCat models support various deployment scenarios with specialized optimizations:
Flash-Chat & Flash-Thinking
SGLang and vLLM adaptations enable high-throughput inference for LongCat-Flash models. Deployment guides cover environment setup, tensor parallelism, and inference configurations. Supports both single-user and multi-user scenarios with cost-efficient inference around $0.7 per 1M output tokens on H800 GPUs.
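Because both SGLang and vLLM expose OpenAI-compatible endpoints, a served LongCat-Flash model can be queried with the standard client; the base URL, API key, and served model name below are deployment-specific assumptions:

```python
from openai import OpenAI

# Point at your own SGLang/vLLM deployment; adjust base_url and model name.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meituan-longcat/LongCat-Flash-Chat",  # served model name (assumption)
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the LongCat model family."},
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```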
Video Generation
LongCat-Video provides unified interfaces for text-to-video, image-to-video, and video continuation tasks. Optimized for generating long-form videos (up to 5 minutes) with high temporal consistency and physical motion plausibility.
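Conceptually, such a unified interface amounts to one entry point that dispatches on which conditioning inputs are present. The `VideoRequest` shape and field names below are illustrative assumptions, not the actual LongCat-Video API:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VideoRequest:
    """Hypothetical request shape for a unified video-generation entry point."""
    prompt: str
    image: Optional[bytes] = None           # first frame => image-to-video
    video: Optional[bytes] = None           # prefix clip => video continuation
    resolution: tuple[int, int] = (1280, 720)
    fps: int = 30


def task_type(req: VideoRequest) -> str:
    # Dispatch on conditioning inputs: video prefix > first frame > text only.
    if req.video is not None:
        return "video-continuation"
    if req.image is not None:
        return "image-to-video"
    return "text-to-video"


print(task_type(VideoRequest(prompt="a cat chasing a laser pointer")))
# -> text-to-video
```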
Real-World Integration
Meituan has deployed LongCat models across multiple internal services: AI-assisted coding tools, intelligent meeting transcription and summarization, automated documentation generation, and customer service automation. The models are also tuned for Meituan's domain-specific use cases such as food delivery and travel booking.
Model Variants
- LongCat-Flash-Chat: Dialogue model optimized for conversation and instruction following (Released Sept 1, 2025)
- LongCat-Flash-Thinking: Enhanced reasoning model with dual-path framework (Released Sept 22, 2025)
- LongCat-Video: Video generation model with DiT architecture (Released Oct 27, 2025)
- LongCat-Audio-Codec: Audio processing module for speech LLMs
License & Usage
All LongCat models are released under the MIT License, allowing model distillation, fine-tuning, and secondary development. Evaluate and validate the models before use in sensitive or high-risk scenarios, and ensure compliance with applicable laws and regulations for your use case. The open-source ecosystem supports academic research, commercial applications, and community contributions.
Strategic Direction
Meituan positions LongCat as a core platform capability across three dimensions: "AI at work" (internal productivity tools), "AI in products" (customer-facing services), and "Building LLM" (infrastructure development). The expansion from dialogue to reasoning, video generation, and audio processing represents progress toward world modeling and physical world simulation capabilities.