LongCat-Flash
A fast and efficient open-source Mixture-of-Experts LLM by Meituan, optimized for agentic tasks.
About the Model
LongCat-Flash is a 560B-total-parameter language model built on a Mixture-of-Experts (MoE) architecture. It activates approximately 18.6B–31.3B parameters per token (~27B on average) depending on the token's context, enabling competitive quality with high throughput and low latency. The model supports long context lengths (up to 128k tokens) and demonstrates strong instruction following, reasoning, and coding capabilities, especially in agentic tool-use scenarios.
This page summarizes highlights from the official resources and presents a concise overview for new users.
Key Features
- Dynamic MoE routing: Zero-computation experts and a PID-controlled expert bias keep activation near ~27B params/token (see the sketch after this list).
- Shortcut-connected MoE (ScMoE): Overlaps computation and communication for speed at scale.
- High throughput inference: Optimizations enable 100+ tokens per second (hardware dependent).
- Robust scaling & stability: Hyperparameter transfer, model-growth initialization, and stability suite (router balancing, z-loss, tuned optimizers).
- Deterministic training: Exact reproducibility helps detect silent data corruption (SDC).
- Agentic capability pipeline: Multi-stage post-training focused on complex tasks with tools and iterative reasoning.
- Extended context: Up to 128k tokens for information-dense, multi-document tasks.
- Open-source: Released under the MIT License.
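To make the routing idea concrete, here is a minimal, self-contained sketch, not the released implementation: top-k routing over an expert pool that includes zero-computation (identity) experts, with a PI-style controller (the derivative term is omitted) adjusting a per-expert bias so the share of real-expert activations tracks a fixed compute budget. All names and constants below are illustrative assumptions.

```python
# Illustrative sketch of PID-style controlled MoE routing (not the released code).
# Zero-computation experts are identity passthroughs; biasing tokens toward them
# when the compute budget is exceeded reduces activated parameters per token.
import torch

NUM_REAL, NUM_ZERO, TOP_K = 8, 4, 2   # assumed pool sizes and top-k
TARGET_REAL_FRAC = 0.75               # assumed target share of real-expert slots
KP, KI = 0.05, 0.01                   # assumed controller gains (P and I terms)

bias = torch.zeros(NUM_REAL + NUM_ZERO)  # per-expert routing bias
integral = 0.0                           # accumulated error for the I term

def update_bias(real_frac: float) -> None:
    """PI update: favor zero-computation experts when too many real experts fire."""
    global integral
    error = real_frac - TARGET_REAL_FRAC
    integral += error
    delta = KP * error + KI * integral
    bias[NUM_REAL:] += delta   # make zero-computation experts more attractive
    bias[:NUM_REAL] -= delta   # and real experts less so (or vice versa)

def route(logits: torch.Tensor) -> torch.Tensor:
    """Pick top-k experts per token after adding the controller bias."""
    scores = logits + bias
    topk = scores.topk(TOP_K, dim=-1).indices          # (tokens, TOP_K)
    real_frac = (topk < NUM_REAL).float().mean().item()
    update_bias(real_frac)
    return topk

logits = torch.randn(16, NUM_REAL + NUM_ZERO)  # fake router logits for 16 tokens
print(route(logits))
```

Because zero-computation experts pass tokens through unchanged, shifting the bias toward them trims the activated parameter count per token without changing the router's weights, which is what keeps the average near the ~27B budget.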
Selected Benchmarks
Representative results reported by the authors (non-exhaustive):
| Category | Benchmark | Metric | LongCat-Flash |
|---|---|---|---|
| General Domains | MMLU | acc | 89.71 |
| Instruction Following | IFEval | acc | 89.65 |
| Math Reasoning | MATH500 | acc | 96.40 |
| General Reasoning | DROP | F1 | 79.06 |
| Coding | HumanEval+ | pass@1 | 88.41 |
| Agentic Tool Use | τ²-Bench (telecom) | avg@4 | 73.68 |
Values are summarized from public reports; please consult the official resources for full details and evaluation conditions.
Quick Start
LongCat-Flash uses a chat template defined in `tokenizer_config.json`. Examples:
First Turn
[Round 0] USER:{query} ASSISTANT:
With System Prompt
SYSTEM:{system_prompt} [Round 0] USER:{query} ASSISTANT:
Multi-Turn
SYSTEM:{system_prompt} [Round 0] USER:{q} ASSISTANT:{r} ... [Round N-1] USER:{q} ASSISTANT:{r} [Round N] USER:{q} ASSISTANT:
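Rather than assembling these strings by hand, the tokenizer can render them. A minimal sketch using the standard Hugging Face transformers API; the repo id below is an illustrative assumption, so substitute the actual model path:

```python
# Sketch: render the chat template shipped in tokenizer_config.json.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "meituan-longcat/LongCat-Flash-Chat",  # assumed repo id, for illustration
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize MoE routing in one sentence."},
]

# add_generation_prompt=True appends the trailing "ASSISTANT:" marker.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
# Expected shape: SYSTEM:{...} [Round 0] USER:{...} ASSISTANT:
```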
Tool Call Envelope
{tool_description}
## Messages
SYSTEM:{system_prompt} [Round 0] USER:{query} ASSISTANT:
<longcat_tool_call>{"name": <function-name>, "arguments": <args-dict>}</longcat_tool_call>
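On the client side, tool calls can be recovered from the envelope with a small parser. A sketch assuming each <longcat_tool_call> block wraps a single JSON object of the shape shown above:

```python
# Sketch: extract tool calls from a response containing
# <longcat_tool_call>...</longcat_tool_call> envelopes.
import json
import re

TOOL_CALL_RE = re.compile(
    r"<longcat_tool_call>\s*(\{.*?\})\s*</longcat_tool_call>", re.DOTALL
)

def parse_tool_calls(text: str) -> list[dict]:
    """Return [{'name': ..., 'arguments': {...}}, ...] parsed from the response."""
    return [json.loads(m) for m in TOOL_CALL_RE.findall(text)]

response = (
    'Let me check that. <longcat_tool_call>{"name": "get_weather", '
    '"arguments": {"city": "Beijing"}}</longcat_tool_call>'
)
print(parse_tool_calls(response))
# [{'name': 'get_weather', 'arguments': {'city': 'Beijing'}}]
```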
Deployment
The authors provide adaptations for SGLang and vLLM to deploy LongCat-Flash with high throughput. Refer to the Deployment Guide in the official repository for environment setup, tensor parallelism, and inference settings.
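Once a server is up, requests follow the usual OpenAI-compatible pattern. A minimal sketch assuming a vLLM or SGLang server exposing the OpenAI API at localhost:8000; the base URL, API key, and model id are illustrative assumptions:

```python
# Sketch: query an OpenAI-compatible endpoint served by vLLM or SGLang.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="meituan-longcat/LongCat-Flash-Chat",  # assumed model id
    messages=[{"role": "user", "content": "Hello, LongCat!"}],
    max_tokens=256,
)
print(resp.choices[0].message.content)
```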
License & Usage
LongCat-Flash-Chat is released under the MIT License. Evaluate and validate the model before use in sensitive or high-risk scenarios, and ensure compliance with applicable laws and regulations for your use case.