LongCat-Flash-Thinking-2601 Released: Open-Source SOTA in Tool Calling Capabilities
Today, the Meituan LongCat team officially releases and open-sources LongCat-Flash-Thinking-2601. An upgraded version of the previously released LongCat-Flash-Thinking model, LongCat-Flash-Thinking-2601 achieves open-source SOTA performance on core evaluation benchmarks including Agentic Search, Agentic Tool Use, and TIR (Tool-Integrated Reasoning).
The model demonstrates exceptional generalization in tool calling, outperforming Claude on randomly generated complex tasks that rely on tool calling and significantly reducing the cost of adapting to new tools in real-world scenarios. It is also the first fully open-source model to offer a free online "Re-thinking" mode, which activates 8 parallel reasoning paths simultaneously to ensure thorough deliberation and reliable decision-making.
This feature is now available to try for free at https://longcat.ai (Re-thinking mode is triggered by selecting the deep-thinking option).
🧠 Revolutionary "Re-thinking" Mode
The newly upgraded "Re-thinking" mode teaches the model to "think carefully" before acting. When encountering high-difficulty problems, the model breaks down the thinking process into two steps: parallel thinking and summary synthesis.
- Parallel thinking phase: The model simultaneously and independently explores multiple reasoning paths, similar to how humans consider different solutions when facing difficult problems, ensuring diversity of thought to avoid missing optimal solutions
- Summary synthesis phase: Multiple paths are organized, optimized, and synthesized, with optimized results fed back to form closed-loop iterative reasoning, continuously deepening the thinking process
- Reinforcement learning enhancement: Additional reinforcement learning components specifically designed to refine the model's summary synthesis capabilities, enabling LongCat-Flash-Thinking-2601 to truly "think clearly before acting"
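The two phases above can be sketched as a simple sample-then-synthesize loop. Everything below is illustrative: `explore_path` and `synthesize` are toy stand-ins for the model's reasoning rollouts and its summary model, not LongCat's actual implementation.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def explore_path(problem: str, seed: int) -> dict:
    """Stand-in for one independent reasoning rollout (a real system
    would sample the model with a distinct seed/temperature)."""
    rng = random.Random(seed)
    answer = rng.choice(["42", "42", "17"])  # toy: paths may disagree
    return {"path_id": seed, "answer": answer, "confidence": rng.random()}

def synthesize(paths: list[dict]) -> str:
    """Toy summary step: confidence-weighted vote over path answers.
    The real summary model analyzes and merges full reasoning traces."""
    scores: dict[str, float] = {}
    for p in paths:
        scores[p["answer"]] = scores.get(p["answer"], 0.0) + p["confidence"]
    return max(scores, key=scores.get)

def rethink(problem: str, n_paths: int = 8) -> str:
    # Parallel thinking phase: n independent explorations
    with ThreadPoolExecutor(max_workers=n_paths) as pool:
        paths = list(pool.map(lambda s: explore_path(problem, s), range(n_paths)))
    # Summary synthesis phase: merge paths into one decision
    return synthesize(paths)

print(rethink("hard problem"))
```

A production system would additionally feed the synthesized summary back into the next reasoning round, forming the closed-loop iteration described above.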
📊 Comprehensive Benchmark Performance
Comprehensive and rigorous evaluation shows that LongCat-Flash-Thinking-2601 leads across programming, mathematical reasoning, agentic tool calling, and agentic search dimensions:
- Programming capability: Achieves 82.8 on LCB benchmark and 47.7 on OIBench EN, ranking in the first tier of similar models, demonstrating solid code foundation capabilities
- Mathematical reasoning: Outstanding performance with Re-thinking mode enabled, achieving 100.0 (perfect score) on AIME-25 and 86.8 (current SOTA) on IMO-AnswerBench
- Agentic tool calling: Scores 88.2 on τ²-Bench and 29.3 on VitaBench, both achieving open-source SOTA, demonstrating excellent performance in multi-domain tool calling scenarios, meeting practical application needs
- Agentic search: Achieves 73.1 on BrowseComp (best among all models) and 79.5 on RW Search, demonstrating strong information retrieval and scenario adaptation capabilities, reaching open-source leading levels
🔬 Generalization Testing
To better test the generalization capabilities of agentic models, we propose a novel evaluation method: an automated task-synthesis pipeline that lets users randomly generate complex tasks for arbitrary scenarios from given keywords. Each generated task ships with a corresponding tool set and an executable environment.
Since tool configurations in such environments are highly random, we measure generalization capabilities by evaluating model performance in these environments. Experimental results show that LongCat-Flash-Thinking-2601 maintains leading performance in the vast majority of tasks, confirming its powerful generalization capabilities in agentic scenarios.
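A toy version of such a pipeline might look like the following. `TOOL_LIBRARY` and `synthesize_task` are hypothetical names invented for illustration, and the real pipeline also generates the executable environment behind each task, which this sketch omits.

```python
import random

# Hypothetical tool registry keyed by domain keyword
TOOL_LIBRARY = {
    "travel": ["search_flights", "book_hotel", "get_weather"],
    "finance": ["get_quote", "transfer_funds", "list_transactions"],
}

def synthesize_task(keywords: list[str], n_tools: int = 4, seed: int = 0) -> dict:
    """Sample a random tool set for the given keywords and wrap it
    in a task spec; tool configurations vary with the seed, which is
    what makes the resulting evaluation probe generalization."""
    rng = random.Random(seed)
    pool = [t for k in keywords for t in TOOL_LIBRARY.get(k, [])]
    tools = rng.sample(pool, min(n_tools, len(pool)))
    return {
        "keywords": keywords,
        "tools": tools,
        "goal": f"Complete a task using only: {', '.join(tools)}",
    }

task = synthesize_task(["travel", "finance"])
```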
🏋️ Environment Expansion: Building Large-Scale Training Grounds
Environment expansion is the core foundation for models to acquire universal agent capabilities. To truly master practical task execution, models must break free from pure text training limitations and practice in interactive environments that simulate real scenarios.
To address the pain points of costly real-world scenario replication and slow iteration, the LongCat team built an end-to-end automated environment-generation system, creating large-scale training environments that span 20+ domains and tens of thousands of scenarios. The system generates environments efficiently: from a concise "domain definition" it completes the full chain of environment construction, automatically synthesizing executable environment graphs containing 60+ tools with complex dependencies, along with the supporting database schemas, tool-calling interfaces, and validation logic.
To ensure task solvability and effective training signals, the team innovated a "Solvable Path Priority" environment construction strategy:
- Seed Sampling: Randomly sample a long tool calling chain as an anchor, automatically constructing a complex task that adopts this tool calling chain as one solution
- Controlled Expansion: Using the "golden tool chain" as root, generate a maximum environment subgraph through BFS-style expansion, strictly guaranteeing database logical consistency
- Dynamic Environment Construction: System dynamically decides whether to add new "golden tool chains" based on complexity and available paths
- Minimum Scale Guarantee: Ensures sufficient tool diversity while maintaining database state consistency
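The seed-and-expand steps above can be sketched as a size-capped BFS over a tool dependency graph. The graph, tool names, and size cap below are invented for illustration, and the real system additionally maintains database-state consistency, which this sketch omits.

```python
from collections import deque

# Hypothetical tool dependency graph: tool -> tools it can feed into
DEPENDS = {
    "search_user": ["get_orders", "get_profile"],
    "get_orders": ["refund_order", "track_shipment"],
    "get_profile": ["update_address"],
    "refund_order": [], "track_shipment": [], "update_address": [],
}

def build_environment(golden_chain: list[str], max_tools: int = 6) -> set[str]:
    """Sketch of 'Solvable Path Priority': seed the environment with the
    sampled golden tool chain (so the task stays solvable by construction),
    then BFS-expand neighboring tools up to the size cap."""
    env = set(golden_chain)        # seed sampling: the anchor chain
    queue = deque(golden_chain)    # controlled BFS expansion from the root
    while queue and len(env) < max_tools:
        for nxt in DEPENDS.get(queue.popleft(), []):
            if nxt not in env and len(env) < max_tools:
                env.add(nxt)
                queue.append(nxt)
    return env

env = build_environment(["search_user", "get_orders", "refund_order"])
```

Because the golden chain is inserted before any expansion, every generated environment is guaranteed to contain at least one valid solution path, which is the "effective training signal" property the strategy targets.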
⚙️ Enhanced DORA: Asynchronous Multi-Environment Training
Traditional agents are mostly trained in only a handful of simple simulated environments, like soldiers who practice only on the shooting range and then falter on the real battlefield. To support large-scale multi-environment training, the LongCat team upgraded its asynchronous training system, DORA.
DORA's core breakthrough lies in a fully asynchronous streaming training architecture, revolutionizing traditional synchronous training:
- Multi-version model parallel exploration: Training experiences generated on demand, directly stored in sample queues. Trainers can start training without waiting for all tasks to complete, completely eliminating inter-task waiting time
- Distributed scheduling architecture: "Lightweight Rollout Manager + multiple Rollout Controllers" distributed mode, processing environment interactions through data parallelism, solving single-machine scheduling bottlenecks
- Flexible environment deployment: Extending PyTorch RPC framework, supporting remote function calls based on CPU idle state, enabling flexible deployment of massive environments to any idle machine
- Prefill-Decode (PD) Decoupling: Deploying prefill and decode tasks on different device groups, avoiding interference and ensuring generation efficiency
- KV-cache Swap Mechanism: Chunk-level aggregation transmission and CPU-resident dynamic swapping, completely solving repeated computation problems caused by insufficient device memory
The system achieves 2-4x the efficiency of traditional synchronous training and supports stable training for 1000+ steps, enabling models to learn continuously and improve steadily across tens of thousands of heterogeneous environments. By balancing task allocation across environments and distributing compute according to difficulty and training progress, it maximizes training efficiency and resource utilization.
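The central idea of the streaming architecture, trainers consuming samples the moment any rollout finishes rather than waiting for all of them, can be illustrated with a plain producer-consumer queue. This is a generic sketch, not DORA's actual distributed sample queue or RPC machinery.

```python
import queue
import threading
import time

sample_q: "queue.Queue[dict]" = queue.Queue()

def rollout_worker(env_id: int, n_episodes: int) -> None:
    """Simulated rollout controller: streams each finished trajectory
    into the shared queue immediately (no barrier across tasks)."""
    for ep in range(n_episodes):
        time.sleep(0.001 * (env_id + 1))   # heterogeneous env latency
        sample_q.put({"env": env_id, "episode": ep})

def trainer(total: int) -> int:
    """Trainer starts consuming as soon as any sample is ready,
    instead of waiting for the slowest environment."""
    steps = 0
    for _ in range(total):
        sample_q.get()   # blocks only until the *next* sample arrives
        steps += 1       # a real trainer would run a gradient step here
    return steps

workers = [threading.Thread(target=rollout_worker, args=(i, 3)) for i in range(4)]
for w in workers:
    w.start()
steps = trainer(total=12)
for w in workers:
    w.join()
```

In a synchronous design, the trainer would idle until all four workers finished; here the slow environments never block consumption of samples from the fast ones.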
🛡️ Noise Robustness Training: Real-World Resilience
Real-world environments are inherently imperfect: tools may fail randomly due to network issues or return incomplete results; user instructions may be ambiguous or inconsistent; data transmission may introduce errors. This noise causes models trained only in idealized environments to break down when deployed to real scenarios, with performance declining significantly.
The team systematically decomposed and modeled real-world noise, identifying two core noise sources:
- Tool Noise: Including tool execution failures (e.g., call timeouts, insufficient permissions), incomplete return results (e.g., missing data fields), inconsistent response formats (e.g., sometimes returning JSON, sometimes text)
- Instruction Noise: Covering user expression ambiguity (e.g., unclear task objectives), redundant instruction information (e.g., containing irrelevant interference content), dynamic requirement changes (e.g., adjusting task parameters mid-way)
To enable models to gradually adapt to noise, the team adopted a curriculum learning injection strategy: training initially injects mild perturbations. After models show sufficient stability at current noise levels, gradually increase noise complexity and interference intensity, forming robust decision-making patterns. At the training execution level, noise injection is deeply integrated with multi-environment training across 20+ domains and tens of thousands of environments.
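A minimal sketch of curriculum-scheduled tool-noise injection follows. The linear ramp and the specific failure probabilities are invented for illustration; the real training injects a much richer mix of tool and instruction noise across 20+ domains.

```python
import random

def inject_tool_noise(result: dict, level: float, rng: random.Random) -> dict:
    """Perturb one tool response: fail it outright, drop a field, or pass
    it through unchanged. `level` in [0, 1] is the interference intensity."""
    roll = rng.random()
    if roll < level * 0.3:
        return {"error": "call timeout"}           # tool execution failure
    if roll < level * 0.6 and result:
        noisy = dict(result)
        noisy.pop(rng.choice(list(noisy)), None)   # incomplete return result
        return noisy
    return result

def curriculum_level(step: int, warmup: int = 1000, max_level: float = 1.0) -> float:
    """Curriculum schedule: mild perturbations early in training,
    ramping up as the model stabilizes (linear ramp as an example)."""
    return min(max_level, step / warmup)

rng = random.Random(0)
response = {"status": "ok", "data": [1, 2]}
noisy = inject_tool_noise(response, curriculum_level(step=500), rng)
```

At `step=0` the level is 0 and responses pass through untouched; past the warmup horizon every call faces the full noise distribution.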
Models without robustness training in noisy environments show significant performance degradation, and even Claude fails to adapt to every noise type. Through this systematic anti-interference training, LongCat-Flash-Thinking-2601 gains strong environmental adaptability, completing tasks effectively and efficiently even in complex, non-ideal scenarios.
🧠 Re-thinking Mode: Width + Depth Dual Expansion
On particularly complex tasks, models sometimes get stuck, following one line of thought to the end even if that path may be wrong. Just as humans consider different possibilities when facing a difficult problem, the model needs a way to step back and explore alternatives.
The core of "Re-thinking" mode is "Width + Depth" dual expansion: first, let the model simultaneously generate multiple reasoning paths, exploring different solutions; then use a specialized summary model to analyze, filter, and extract optimal ideas from these paths. Moreover, through reinforcement learning, the model learns to integrate intermediate results, continuously improving the reasoning process.
In actual testing, whether in long-chain reasoning, tool-integrated reasoning, or complete agent tool use scenarios, "Re-thinking" mode is particularly effective. As test-time computation budget increases, its performance advantages become increasingly apparent, significantly outperforming strategies that only expand reasoning depth or width.
🔗 Zigzag Attention: Ultra-Long Context Support
The quadratic computational complexity of traditional full attention limits its support for million-token contexts, while existing sparse-attention solutions often require costly complete retraining.
The LongCat team's proposed Zigzag Attention mechanism combines two attention patterns: MLA (Multi-head Latent Attention) and SSA (Streaming Sparse Attention). It adopts a hierarchical design, alternating between the two variants across layers, which avoids the computation-imbalance issues common in traditional sparse attention and achieves higher hardware utilization.
For each query token, attention is limited to: Local Window (the most recent W tokens for short-term dependencies) and Global Anchors (the first B tokens of the sequence for long-term memory). This design significantly reduces computation and memory complexity while maintaining model perception of short- and long-term contexts.
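The per-query sparsity pattern just described, a local window plus global anchors, can be written as a boolean attention mask, with `window` and `anchors` standing in for W and B. This is a generic sliding-window-plus-anchor mask for illustration, not LongCat's actual attention kernel.

```python
import numpy as np

def sparse_mask(seq_len: int, window: int, anchors: int) -> np.ndarray:
    """Boolean causal mask: each query token attends only to the first
    `anchors` tokens (long-term memory) and the most recent `window`
    tokens (short-term dependencies)."""
    q = np.arange(seq_len)[:, None]   # query positions
    k = np.arange(seq_len)[None, :]   # key positions
    causal = k <= q
    local = (q - k) < window          # inside the sliding local window
    global_anchor = k < anchors       # sequence-prefix anchor tokens
    return causal & (local | global_anchor)

mask = sparse_mask(seq_len=10, window=3, anchors=2)
```

Each row of the mask has at most `window + anchors` true entries, so per-token attention cost is constant in sequence length rather than linear, which is what makes million-token contexts tractable.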
Zigzag attention is introduced in mid-training stages, efficiently converting original full-attention models to sparse variants through structured sparsification processes with extremely low conversion overhead. The optimized model supports up to 1 million token context length, providing feasible solutions for ultra-long sequence processing.
The team simultaneously open-sources the model adapted to this mechanism: LongCat-Flash-Thinking-ZigZag (Hugging Face), fully inheriting LongCat-Flash-Thinking-2601's core capabilities while possessing ultra-long context processing advantages, providing developers with ready-to-use long-sequence solutions.
📊 Benchmark Results Summary
LongCat-Flash-Thinking-2601 demonstrates outstanding performance across multiple benchmark tests: achieving top-tier open-source levels on BrowseComp, τ²-Bench, and VitaBench, and even approaching closed-source top models in some tasks. The model also demonstrates strong generalization capabilities, performing excellently in unseen random tool combinations and tasks, mastering "meta-abilities for problem-solving." On test sets injected with real noise, performance significantly surpasses other models, validating the effectiveness of active noise training.
Through the deep synergy of algorithms and engineering, automated environment construction reduces adaptation costs, the DORA system improves training efficiency by 2-4x, and the Re-thinking mode amplifies complex-task processing capabilities, together forming an efficient, scalable training system.
📦 Resources & Access
To lower the barrier for developers, the Meituan LongCat team simultaneously opens model weights, inference code, and online experience capabilities, supporting full-process needs from quick trials to deep development:
- Open-source platforms:
- Online experience & API:
- Official website: https://longcat.ai
- API platform: https://longcat.chat/platform/usage
LongCat-Flash-Thinking-2601, through environment expansion and noise training, significantly reduces agents' dependence on vertical scenarios, setting a new reference standard for open-source models' generalization capabilities in real-world tasks. We believe that truly universal agents should not be greenhouse bonsai, but trees that can take root in the real world's storms.
The release of LongCat-Flash-Thinking-2601 is a solid step toward this goal. Open source is a seed we plant, and we look forward to working with the entire community to sail toward a vast future in this starry sea called "agents."
We welcome developers to download, deploy, and experience LongCat-Flash-Thinking-2601, and also welcome you to apply for free API call quotas on the LongCat API platform. If you have collaboration ideas or feedback in areas such as agentic development and large model inference optimization, we look forward to communicating with you.