LongCat AI

Next-generation multi-modal AI models by Meituan: Flash-Chat, Flash-Thinking, Video, and Audio-Codec. Open-source MoE LLMs with cutting-edge capabilities.

About the LongCat Models

The LongCat series represents Meituan's progressive evolution in multi-modal AI, with four specialized variants addressing different capabilities:

LongCat-Flash-Chat (September 1, 2025)

The foundation dialogue model with 560B parameters in a Mixture-of-Experts (MoE) architecture. It activates approximately 18.6B–31.3B parameters per token (averaging ~27B) through Zero-Computation Experts, achieving competitive quality with high throughput and low latency. Supports up to 128k context length and demonstrates strong instruction following, reasoning, and coding capabilities—especially in agentic tool-use scenarios. Achieves 100+ tokens/s on H800 GPUs.

LongCat-Flash-Thinking (September 22, 2025)

Enhanced reasoning model focusing on "Agentic Reasoning" and "Formal Reasoning". Features a dual-path reasoning framework combining agentic tool use with formal reasoning capabilities. Built with DORA (Dynamic Orchestration for Asynchronous rollout) for asynchronous large-scale training. Dramatically improved tool-call efficiency with 64.5% token savings in AIME25 tests (from 19,653 tokens down to 6,965).
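
As a quick sanity check, the reported savings follow directly from the two token counts (a one-line Python check; the small gap from 64.5% is presumably rounding in the published figure):

before, after = 19_653, 6_965
print(f"{1 - after / before:.1%}")   # 64.6%, consistent with the reported ~64.5%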

LongCat-Video (October 27, 2025)

Video generation model based on Diffusion Transformer (DiT) architecture. Unified support for text-to-video, image-to-video, and video continuation tasks. Generates coherent 5-minute videos at 720p resolution and 30 fps, with emphasis on long temporal sequences, cross-frame consistency, and physical motion plausibility.
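
To put the long-video claim in perspective, a back-of-the-envelope count (Python; "720p" assumed to mean 1280x720) shows how much content must stay consistent across frames:

frames = 5 * 60 * 30                    # 9,000 frames in a 5-minute, 30 fps clip
pixels = frames * 1280 * 720            # 1280x720 assumed for "720p"
print(frames, f"{pixels / 1e9:.1f}B")   # 9000 frames, ~8.3B pixels to keep coherent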

LongCat-Audio-Codec

Audio processing module providing low-bitrate, real-time streaming audio tokenization and detokenization for speech LLMs, enabling efficient audio encoding and decoding.
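
For intuition on what "low-bitrate tokenization" means, the standard bitrate formula for a token-based codec is token_rate × codebooks × log2(codebook_size). The numbers below are illustrative placeholders, not LongCat-Audio-Codec's actual configuration:

import math

token_rate = 50                                   # audio tokens per second (illustrative)
codebooks = 4                                     # parallel RVQ-style codebooks (illustrative)
codebook_size = 1024                              # entries per codebook (illustrative)
bits_per_token = math.log2(codebook_size)         # 10 bits
print(token_rate * codebooks * bits_per_token)    # 2000.0 bps, i.e. ~2 kbps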

This page summarizes highlights from official resources and presents a concise overview for new users.

Key Features & Technologies

  • Zero-Computation Experts: Smart MoE routing mechanism that activates only 18.6B–31.3B parameters per token (averaging ~27B) from a 560B parameter pool, achieving cost efficiency (see the sketch after this list).
  • Shortcut-connected MoE (ScMoE): Overlaps computation and communication for speed at scale, reducing latency.
  • High-throughput inference: Achieves 100+ tokens/s on H800 GPUs with optimized architecture and efficient routing.
  • DORA training system: Dynamic Orchestration for Asynchronous rollout enables efficient large-scale training across domains.
  • Dual-path reasoning framework: Combines agentic tool use with formal reasoning for enhanced problem-solving capabilities.
  • Enhanced tool-call efficiency: 64.5% token reduction in agentic scenarios, improving cost-effectiveness.
  • Video generation capabilities: Diffusion Transformer (DiT) architecture producing 5-minute coherent videos at 720p/30fps.
  • Multi-modal support: From dialogue and reasoning to video and audio processing.
  • Extended context handling: Up to 128k tokens for complex, multi-document tasks.
  • Training innovations: Hyperparameter transfer, model-growth initialization, variance alignment, and router balancing.
  • Real-world applications: Deployed across Meituan's services including AI coding, meetings, and documentation tools.
  • Open-source ecosystem: Released under MIT License with support for model distillation and secondary development.
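
To make the Zero-Computation Experts idea concrete, here is a minimal PyTorch sketch of a top-k MoE layer in which some router slots are identity "experts" that add no FFN compute. All sizes are illustrative; this is not the production router, which also handles load balancing, ScMoE computation/communication overlap, and more:

import torch
import torch.nn as nn

class ZeroComputeMoE(nn.Module):
    """Toy MoE layer: n_ffn real experts plus n_zero identity experts."""
    def __init__(self, d_model=64, n_ffn=4, n_zero=2, k=2):
        super().__init__()
        self.k, self.n_ffn = k, n_ffn
        self.router = nn.Linear(d_model, n_ffn + n_zero)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_ffn))

    def forward(self, x):                        # x: (tokens, d_model)
        weights = self.router(x).softmax(dim=-1)
        topw, topi = weights.topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            idx, w = topi[:, slot], topw[:, slot:slot + 1]
            for e in range(self.n_ffn):          # real experts: full FFN cost
                mask = idx == e
                if mask.any():
                    out[mask] += w[mask] * self.experts[e](x[mask])
            zmask = idx >= self.n_ffn            # zero-computation experts:
            out[zmask] += w[zmask] * x[zmask]    # identity, ~0 extra FLOPs
        return out

tokens = torch.randn(8, 64)
print(ZeroComputeMoE()(tokens).shape)            # torch.Size([8, 64])

Because the router can send easy tokens to identity slots, the average activated parameter count per token sits well below the total pool, which is how a 560B model averages ~27B active parameters.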

Selected Benchmarks

Representative results reported by the authors (non-exhaustive):

Category                Benchmark             Metric    LongCat-Flash
General Domains         MMLU                  acc       89.71
Instruction Following   IFEval                acc       89.65
Math Reasoning          MATH500               acc       96.40
General Reasoning       DROP                  F1        79.06
Coding                  HumanEval+            pass@1    88.41
Agentic Tool Use        τ²-Bench (telecom)    avg@4     73.68

Values summarized from public reports; please consult the official resources for full details and conditions.

Latest Updates & Timeline

  • October 27, 2025 - LongCat-Video Launch: Introduction of the video generation model based on Diffusion Transformer architecture. Capable of generating 5-minute coherent videos at 720p/30fps, supporting text-to-video, image-to-video, and video continuation tasks. Emphasizes physical motion plausibility and long temporal consistency.
  • September 22, 2025 - LongCat-Flash-Thinking Release: Enhanced reasoning model focusing on agentic and formal reasoning capabilities. Features dual-path reasoning framework and DORA asynchronous training system. Achieves 64.5% token savings in tool-call scenarios (from 19,653 to 6,965 tokens in AIME25 tests).
  • September 1, 2025 - LongCat-Flash-Chat Open Source: Meituan officially released the foundation dialogue model (MoE, 560B params) with strong throughput and competitive accuracy. Includes Zero-Computation Experts maintaining ~27B active params/token. Received prominent media coverage from Sina Finance and other tech platforms.
  • Training Innovations & Scale: Trained on >20T tokens in ~30 days across large GPU clusters, with training stability achieved through variance alignment, hyperparameter transfer, and router balancing. Also demonstrated a viable training path on domestic (Chinese) accelerator cards.
  • Real-World Deployment: Meituan integrates LongCat models across internal tools for AI coding, intelligent meetings, and documentation systems, demonstrating practical business applications.
  • Benchmark Performance: Excellent results on MMLU, CEval, terminal command benchmarks, and agentic tool-call evaluations, showcasing strong general and domain-specific capabilities.

These highlights synthesize recent public reporting. For precise details and evolving numbers, consult official releases and model cards.

Quick Start

LongCat-Flash uses a chat template defined in tokenizer_config.json. Examples:

First Turn

[Round 0] USER:{query} ASSISTANT:

With System Prompt

SYSTEM:{system_prompt} [Round 0] USER:{query} ASSISTANT:

Multi-Turn

SYSTEM:{system_prompt} [Round 0] USER:{q} ASSISTANT:{r} ... [Round N-1] USER:{q} ASSISTANT:{r} [Round N] USER:{q} ASSISTANT:
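
A small Python helper, as a sketch, shows how these strings compose. In practice, prefer tokenizer.apply_chat_template with the shipped tokenizer_config.json rather than hand-building prompts; the helper below simply mirrors the documented format:

def build_prompt(turns, system_prompt=None):
    """turns: list of (user, assistant) pairs; the last assistant is None."""
    prompt = f"SYSTEM:{system_prompt} " if system_prompt else ""
    for i, (query, reply) in enumerate(turns):
        prompt += f"[Round {i}] USER:{query} ASSISTANT:"
        if reply is not None:
            prompt += f"{reply} "
    return prompt

print(build_prompt([("Hello!", "Hi, how can I help?"), ("Tell me about MoE.", None)]))
# [Round 0] USER:Hello! ASSISTANT:Hi, how can I help? [Round 1] USER:Tell me about MoE. ASSISTANT: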

Tool Call Envelope

Tool definitions come first, followed by a ## Messages section containing the conversation:

{tool_description}

## Messages
SYSTEM:{system_prompt} [Round 0] USER:{query} ASSISTANT:

The model then emits tool calls wrapped in <longcat_tool_call> tags:

<longcat_tool_call>{"name": <function-name>, "arguments": <args-dict>}</longcat_tool_call>
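
On the consuming side, a minimal parser (a sketch; the tag name is taken from the format above, while the tool name and arguments below are made up) can recover the structured call from the model's output:

import json
import re

def parse_tool_call(text):
    """Extract the first <longcat_tool_call> payload as a dict, or None."""
    m = re.search(r"<longcat_tool_call>(.*?)</longcat_tool_call>", text, re.S)
    return json.loads(m.group(1)) if m else None

reply = '<longcat_tool_call>{"name": "get_weather", "arguments": {"city": "Beijing"}}</longcat_tool_call>'
call = parse_tool_call(reply)
print(call["name"], call["arguments"])    # get_weather {'city': 'Beijing'}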

Deployment & Applications

LongCat models support various deployment scenarios with specialized optimizations:

Flash-Chat & Flash-Thinking

SGLang and vLLM adaptations enable high-throughput inference for LongCat-Flash models. Deployment guides cover environment setup, tensor parallelism, and inference configurations. Supports both single-user and multi-user scenarios with cost-efficient inference around $0.7 per 1M output tokens on H800 GPUs.
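
As a sketch of the client side, once a vLLM (or SGLang) OpenAI-compatible server is running, queries follow the standard API. The model id, port, and tensor-parallel size below are assumptions; check the official deployment guides for the exact launch flags:

# Example server launch (assumed flags):
#   vllm serve meituan-longcat/LongCat-Flash-Chat --tensor-parallel-size 8
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="meituan-longcat/LongCat-Flash-Chat",     # assumed model id
    messages=[{"role": "user", "content": "Summarize LongCat-Flash in one sentence."}],
)
print(response.choices[0].message.content)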

Video Generation

LongCat-Video provides unified interfaces for text-to-video, image-to-video, and video continuation tasks. Optimized for generating long-form videos (up to 5 minutes) with high temporal consistency and physical motion plausibility.

Real-World Integration

Meituan has deployed LongCat models across multiple internal services: AI-assisted coding tools, intelligent meeting transcription and summarization, automated documentation generation, and customer service automation. The models are tuned for food delivery, travel booking, and other domain-specific use cases central to Meituan's business.