LongCat-2.0

1.6T open-source MoE for agentic coding — native 1M context, trained on 50K domestic compute (Released: June 30, 2026)

Overview

LongCat-2.0 is Meituan LongCat's next-generation trillion-parameter open-source model, purpose-built for agentic coding — code understanding, generation, and execution in real-world agent workflows. With 1.6T total parameters and dynamic activation of 33B–56B (averaging ~48B) per token, it is the industry's first trillion-parameter model to complete full training and inference on a 50,000-card domestic compute cluster.

Pretrained from scratch on 30T+ tokens spanning Chinese, English, multilingual, and code data, LongCat-2.0 natively supports 1 million token context. A preview version has been available on OpenRouter and longcat.ai, ranking among the top three models globally by call volume on OpenRouter.

Model weights and self-hosted deployment guides will be linked here when officially published on Hugging Face / GitHub.

Architecture

LongCat Sparse Attention (LSA) — 1M Context

Traditional models begin to "forget" content beyond ~100K tokens. LSA intelligently selects key information instead of attending to every token, reducing computation from quadratic to linear complexity. This enables precise information retrieval and understanding across 1 million tokens — letting agents see an entire codebase at once.

Zero-Computation Experts + ScMoE

Code tasks vary enormously in complexity — naming a variable vs. deriving a recursive algorithm require vastly different compute. Zero-computation experts enable token-level dynamic activation: simple tokens consume no compute, while complex tokens automatically receive more resources (33B–56B range).

MOPD Multi-Expert Fusion

LongCat-2.0 fuses three expert groups through MOPD (Multi-Teacher On-Policy Distill). Training starts from a LongCat SFT checkpoint, branches into specialized experts, and distills their capabilities into one unified model on domestic accelerator clusters:

  • Agent Experts: Tool use, API parsing, and self-correction
  • Reasoning Experts: Multi-hop reasoning, STEM reasoning, and adaptive computation
  • Interaction Experts: Instruction following, human alignment, and hallucination suppression
LongCat-2.0 MOPD architecture diagram: Multi-Teacher On-Policy Distillation from Agent, Reasoning, and Interaction experts on domestic compute clusters to a unified LongCat-2.0 model
MOPD training pipeline — from LongCat SFT through Agent, Reasoning, and Interaction experts to unified LongCat-2.0. Real-world scenarios integrate agentic execution, reasoning, and interaction; domain experts are fused on domestic accelerator clusters.

At inference time, a gating network dynamically routes each task to the most capable experts, rather than simply merging parameters.

Benchmark Performance

Benchmark LongCat-2.0 Focus
SWE-bench Pro 59.5 Deep software engineering (leads Gemini 3.1 Pro 54.2, GPT-5.5 58.6, Claude Opus 4.6 57.3)
SWE-bench Multilingual 77.3 Multilingual coding (on par with Claude Opus 4.6 77.8)
Terminal-Bench 2.1 70.8 Real terminal command interaction
RWSearch 78.8 Search agent tasks
FORTE 73.2 Productivity scenarios
BrowseComp 79.9 Complex browsing and retrieval

Real-World Use Cases

  • Agent building: End-to-end AI SQL Agent — natural language queries, step planning, and business insights
  • Codebase migration: Analyze legacy code, map to new SDK APIs, refactor with full feature parity and compile-first-pass
  • Full app development: From one-sentence idea to runnable product with architecture, logic, and UI
  • 3D interactive demos: Single-file Three.js experiences from natural language descriptions
  • AI content pipelines: Multi-agent novel factory with million-token setting consistency

Training on Domestic Compute

LongCat's exploration of domestic compute began in 2023. Over three years, the team scaled from thousands to 50,000 cards, solving operator adaptation, communication optimization, and distributed stability challenges.

  • Stability: HCCL exception handling, elastic scaling, auto fault recovery — 70%+ reduction in monthly daily fault rate
  • Correctness: Deterministic operators, bitwise consistency verification, parameter detection
  • Efficiency: Pipeline scheduling, memory optimization, operator-level core control — 1.5× MFU improvement
  • Throughput: Steady-state daily throughput exceeding 1T tokens/day