LongCat AI

Next-generation multi-modal AI models by Meituan: Flash-Chat, Flash-Thinking, Video, and Audio-Codec. Open-source MoE LLMs with cutting-edge capabilities.

About the LongCat Models

The LongCat series represents Meituan's progressive evolution in multi-modal AI, with four specialized variants addressing different capabilities:

LongCat-Flash-Chat (September 1, 2025)

The foundation dialogue model with 560B parameters in a Mixture-of-Experts (MoE) architecture. It activates approximately 18.6B–31.3B parameters per token (averaging ~27B) through Zero-Computation Experts, achieving competitive quality with high throughput and low latency. Supports up to 128k context length and demonstrates strong instruction following, reasoning, and coding capabilities—especially in agentic tool-use scenarios. Achieves 100+ tokens/s on H800 GPUs.

LongCat-Flash-Thinking (September 22, 2025)

Enhanced reasoning model focusing on "Agentic Reasoning" and "Formal Reasoning". Features a dual-path reasoning framework combining agentic tool use with formal reasoning capabilities. Built with DORA (Dynamic Orchestration for Asynchronous rollout) for asynchronous large-scale training. Dramatically improved tool-call efficiency with 64.5% token savings in AIME25 tests (from 19,653 tokens down to 6,965).
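
As a quick sanity check, the reported savings follow directly from the two token counts (a one-line Python check; the small gap from 64.5% is presumably rounding in the published figure):

before, after = 19_653, 6_965
print(f"{1 - after / before:.1%}")   # 64.6%, consistent with the reported ~64.5%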

LongCat-Video (October 27, 2025)

Video generation model based on Diffusion Transformer (DiT) architecture. Unified support for text-to-video, image-to-video, and video continuation tasks. Generates coherent 5-minute videos at 720p resolution and 30 fps, with emphasis on long temporal sequences, cross-frame consistency, and physical motion plausibility.
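
To put the long-video claim in perspective, a back-of-the-envelope count (Python; "720p" assumed to mean 1280x720) shows how much content must stay consistent across frames:

frames = 5 * 60 * 30                    # 9,000 frames in a 5-minute, 30 fps clip
pixels = frames * 1280 * 720            # 1280x720 assumed for "720p"
print(frames, f"{pixels / 1e9:.1f}B")   # 9000 frames, ~8.3B pixels to keep coherent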

LongCat-Audio-Codec

Audio processing module providing low-bitrate, real-time streaming audio tokenization and detokenization for speech LLMs, enabling efficient audio encoding and decoding.
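
For intuition on what "low-bitrate tokenization" means, the standard bitrate formula for a token-based codec is token_rate × codebooks × log2(codebook_size). The numbers below are illustrative placeholders, not LongCat-Audio-Codec's actual configuration:

import math

token_rate = 50                                   # audio tokens per second (illustrative)
codebooks = 4                                     # parallel RVQ-style codebooks (illustrative)
codebook_size = 1024                              # entries per codebook (illustrative)
bits_per_token = math.log2(codebook_size)         # 10 bits
print(token_rate * codebooks * bits_per_token)    # 2000.0 bps, i.e. ~2 kbps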

This page summarizes highlights from official resources and presents a concise overview for new users.

Key Features & Technologies

  • Zero-Computation Experts: Smart MoE routing mechanism that activates only 18.6B–31.3B parameters per token (averaging ~27B) from a 560B parameter pool, achieving cost efficiency (see the sketch after this list).
  • Shortcut-connected MoE (ScMoE): Overlaps computation and communication for speed at scale, reducing latency.
  • High-throughput inference: Achieves 100+ tokens/s on H800 GPUs with optimized architecture and efficient routing.
  • DORA training system: Dynamic Orchestration for Asynchronous rollout enables efficient large-scale training across domains.
  • Dual-path reasoning framework: Combines agentic tool use with formal reasoning for enhanced problem-solving capabilities.
  • Enhanced tool-call efficiency: 64.5% token reduction in agentic scenarios, improving cost-effectiveness.
  • Video generation capabilities: Diffusion Transformer (DiT) architecture producing 5-minute coherent videos at 720p/30fps.
  • Multi-modal support: From dialogue and reasoning to video and audio processing.
  • Extended context handling: Up to 128k tokens for complex, multi-document tasks.
  • Training innovations: Hyperparameter transfer, model-growth initialization, variance alignment, and router balancing.
  • Real-world applications: Deployed across Meituan's services including AI coding, meetings, and documentation tools.
  • Open-source ecosystem: Released under MIT License with support for model distillation and secondary development.
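
To make the Zero-Computation Experts idea concrete, here is a minimal PyTorch sketch of a top-k MoE layer in which some router slots are identity "experts" that add no FFN compute. All sizes are illustrative; this is not the production router, which also handles load balancing, ScMoE computation/communication overlap, and more:

import torch
import torch.nn as nn

class ZeroComputeMoE(nn.Module):
    """Toy MoE layer: n_ffn real experts plus n_zero identity experts."""
    def __init__(self, d_model=64, n_ffn=4, n_zero=2, k=2):
        super().__init__()
        self.k, self.n_ffn = k, n_ffn
        self.router = nn.Linear(d_model, n_ffn + n_zero)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_ffn))

    def forward(self, x):                        # x: (tokens, d_model)
        weights = self.router(x).softmax(dim=-1)
        topw, topi = weights.topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            idx, w = topi[:, slot], topw[:, slot:slot + 1]
            for e in range(self.n_ffn):          # real experts: full FFN cost
                mask = idx == e
                if mask.any():
                    out[mask] += w[mask] * self.experts[e](x[mask])
            zmask = idx >= self.n_ffn            # zero-computation experts:
            out[zmask] += w[zmask] * x[zmask]    # identity, ~0 extra FLOPs
        return out

tokens = torch.randn(8, 64)
print(ZeroComputeMoE()(tokens).shape)            # torch.Size([8, 64])

Because the router can send easy tokens to identity slots, the average activated parameter count per token sits well below the total pool, which is how a 560B model averages ~27B active parameters.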

Selected Benchmarks

Representative results reported by the authors (non-exhaustive):

Category                Benchmark             Metric    LongCat-Flash
General Domains         MMLU                  acc       89.71
Instruction Following   IFEval                acc       89.65
Math Reasoning          MATH500               acc       96.40
General Reasoning       DROP                  F1        79.06
Coding                  HumanEval+            pass@1    88.41
Agentic Tool Use        τ²-Bench (telecom)    avg@4     73.68

Values summarized from public reports; please consult the official resources for full details and conditions.

Latest Updates & Timeline

  • October 27, 2025 - LongCat-Video Launch: Introduction of the video generation model based on Diffusion Transformer architecture. Capable of generating 5-minute coherent videos at 720p/30fps, supporting text-to-video, image-to-video, and video continuation tasks. Emphasizes physical motion plausibility and long temporal consistency.
  • September 22, 2025 - LongCat-Flash-Thinking Release: Enhanced reasoning model focusing on agentic and formal reasoning capabilities. Features dual-path reasoning framework and DORA asynchronous training system. Achieves 64.5% token savings in tool-call scenarios (from 19,653 to 6,965 tokens in AIME25 tests).
  • September 1, 2025 - LongCat-Flash-Chat Open Source: Meituan officially released the foundation dialogue model (MoE, 560B params) with strong throughput and competitive accuracy. Includes Zero-Computation Experts maintaining ~27B active params/token. Received prominent media coverage from Sina Finance and other tech platforms.
  • Training Innovations & Scale: Trained on >20T tokens in ~30 days across large GPU clusters, with training stability achieved through variance alignment, hyperparameter transfer, and router balancing. Also demonstrated a viable training path on domestic (Chinese) accelerator cards.
  • Real-World Deployment: Meituan integrates LongCat models across internal tools for AI coding, intelligent meetings, and documentation systems, demonstrating practical business applications.
  • Benchmark Performance: Excellent results on MMLU, CEval, terminal command benchmarks, and agentic tool-call evaluations, showcasing strong general and domain-specific capabilities.

These highlights synthesize recent public reporting. For precise details and evolving numbers, consult official releases and model cards.

Quick Start

LongCat-Flash uses a chat template defined in tokenizer_config.json. Examples:

First Turn

[Round 0] USER:{query} ASSISTANT:

With System Prompt

SYSTEM:{system_prompt} [Round 0] USER:{query} ASSISTANT:

Multi-Turn

SYSTEM:{system_prompt} [Round 0] USER:{q} ASSISTANT:{r} ... [Round N-1] USER:{q} ASSISTANT:{r} [Round N] USER:{q} ASSISTANT:
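
A small Python helper, as a sketch, shows how these strings compose. In practice, prefer tokenizer.apply_chat_template with the shipped tokenizer_config.json rather than hand-building prompts; the helper below simply mirrors the documented format:

def build_prompt(turns, system_prompt=None):
    """turns: list of (user, assistant) pairs; the last assistant is None."""
    prompt = f"SYSTEM:{system_prompt} " if system_prompt else ""
    for i, (query, reply) in enumerate(turns):
        prompt += f"[Round {i}] USER:{query} ASSISTANT:"
        if reply is not None:
            prompt += f"{reply} "
    return prompt

print(build_prompt([("Hello!", "Hi, how can I help?"), ("Tell me about MoE.", None)]))
# [Round 0] USER:Hello! ASSISTANT:Hi, how can I help? [Round 1] USER:Tell me about MoE. ASSISTANT: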

Tool Call Envelope

Tool definitions come first, followed by a ## Messages section containing the conversation:

{tool_description}

## Messages
SYSTEM:{system_prompt} [Round 0] USER:{query} ASSISTANT:

The model then emits tool calls wrapped in <longcat_tool_call> tags:

<longcat_tool_call>{"name": <function-name>, "arguments": <args-dict>}</longcat_tool_call>
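
On the consuming side, a minimal parser (a sketch; the tag name is taken from the format above, while the tool name and arguments below are made up) can recover the structured call from the model's output:

import json
import re

def parse_tool_call(text):
    """Extract the first <longcat_tool_call> payload as a dict, or None."""
    m = re.search(r"<longcat_tool_call>(.*?)</longcat_tool_call>", text, re.S)
    return json.loads(m.group(1)) if m else None

reply = '<longcat_tool_call>{"name": "get_weather", "arguments": {"city": "Beijing"}}</longcat_tool_call>'
call = parse_tool_call(reply)
print(call["name"], call["arguments"])    # get_weather {'city': 'Beijing'}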

Deployment & Applications

LongCat models support various deployment scenarios with specialized optimizations:

Flash-Chat & Flash-Thinking

SGLang and vLLM adaptations enable high-throughput inference for LongCat-Flash models. Deployment guides cover environment setup, tensor parallelism, and inference configurations. Supports both single-user and multi-user scenarios with cost-efficient inference around $0.7 per 1M output tokens on H800 GPUs.
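
As a sketch of the client side, once a vLLM (or SGLang) OpenAI-compatible server is running, queries follow the standard API. The model id, port, and tensor-parallel size below are assumptions; check the official deployment guides for the exact launch flags:

# Example server launch (assumed flags):
#   vllm serve meituan-longcat/LongCat-Flash-Chat --tensor-parallel-size 8
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="meituan-longcat/LongCat-Flash-Chat",     # assumed model id
    messages=[{"role": "user", "content": "Summarize LongCat-Flash in one sentence."}],
)
print(response.choices[0].message.content)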

Video Generation

LongCat-Video provides unified interfaces for text-to-video, image-to-video, and video continuation tasks. Optimized for generating long-form videos (up to 5 minutes) with high temporal consistency and physical motion plausibility.

Real-World Integration

Meituan has deployed LongCat models across multiple internal services: AI-assisted coding tools, intelligent meeting transcription and summarization, automated documentation generation, and customer service automation. The models are tuned for food delivery, travel booking, and other domain-specific use cases central to Meituan's business.