LongCat Models

Comprehensive overview of all LongCat AI model variants and their capabilities.

Model Variants

LongCat-Flash-Lite

Released: 2026

Lightweight MoE model based on N-gram embedding expansion. 68.5B total parameters, activating ~2.9B–4.5B per inference. Optimized for agentic tool use and coding, with up to 256K context via YaRN.

  • 256K context length (YaRN)
  • 500–700 tokens/s at a typical 4K-in/1K-out load (LongCat API)
  • Strong performance on agentic tool-use and coding benchmarks
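The 256K window comes from extending a shorter training context with YaRN-style RoPE rescaling. As a rough illustration of the idea, here is simplified linear position interpolation (not YaRN's full per-frequency NTK-by-parts scheme); the 32K base context and scale factor are assumed figures, not LongCat's published configuration:

```python
def rope_frequencies(dim: int, base: float = 10000.0):
    """Inverse frequencies for rotary position embeddings (RoPE)."""
    return [base ** (-2 * i / dim) for i in range(dim // 2)]

def scaled_angles(position: int, dim: int, scale: float):
    """Simplified context extension: positions are compressed by `scale`
    so a 256K index maps back into the original training range.
    (YaRN additionally rescales per frequency; this is the linear toy.)"""
    return [position / scale * f for f in rope_frequencies(dim)]

# Extending a hypothetical 32K-trained model to 256K => scale factor 8.
angles_short = scaled_angles(32_768, dim=64, scale=1.0)
angles_long = scaled_angles(262_144, dim=64, scale=8.0)
# After interpolation, the extreme position reuses the trained angle range.
assert angles_long == angles_short
```

The point of the compression is that attention never sees rotation angles outside the distribution it was trained on, which is what lets the window grow without retraining from scratch.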

Hugging Face | Tech Report

Learn More →

LongCat-Flash-Chat

Released: September 1, 2025

Foundation dialogue model with 560B parameters in a Mixture-of-Experts (MoE) architecture. Activates approximately 18.6B–31.3B parameters per token (averaging ~27B) through Zero-Computation Experts.

  • Supports up to 128K context length
  • Achieves 100+ tokens/s on H800 GPUs
  • Strong instruction following, reasoning, and coding
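The variable activation (~18.6B–31.3B parameters per token) comes from routing some tokens to "zero-computation" experts that simply pass the hidden state through, so easy tokens cost almost nothing. A toy sketch of that idea, where the router, sizes, and difficulty signal are all illustrative assumptions rather than the actual LongCat design:

```python
import random

HIDDEN = 8          # toy hidden size
N_FFN_EXPERTS = 4   # real experts with weights
N_ZERO_EXPERTS = 2  # identity experts: no FLOPs, no parameters

def ffn_expert(x, seed):
    """Stand-in for a real expert: a fixed per-expert transformation."""
    rng = random.Random(seed)
    w = [rng.uniform(-1, 1) for _ in range(HIDDEN)]
    return [xi * wi for xi, wi in zip(x, w)]

def route(token_difficulty: float) -> int:
    """Toy router: easy tokens go to an identity expert, hard ones to FFNs."""
    if token_difficulty < 0.3:
        return N_FFN_EXPERTS  # index of the first zero-computation expert
    return int(token_difficulty * N_FFN_EXPERTS) % N_FFN_EXPERTS

def moe_layer(x, difficulty):
    e = route(difficulty)
    if e >= N_FFN_EXPERTS:          # zero-computation expert: identity, zero cost
        return x, 0
    return ffn_expert(x, e), HIDDEN  # "activated parameters" for this token

x = [1.0] * HIDDEN
_, cost_easy = moe_layer(x, difficulty=0.1)
_, cost_hard = moe_layer(x, difficulty=0.9)
assert cost_easy == 0 and cost_hard == HIDDEN
```

Averaged over a real token stream, per-token cost then floats between the zero-expert floor and the full-expert ceiling, which is why the activated-parameter count is quoted as a range rather than a single number.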
Learn More →

LongCat-Flash-Prover

Released: Latest

Purpose-built for mathematical formalization and theorem proving. It decomposes formal reasoning into Auto-Formalization, Sketching, and Proving. With tool-integrated reasoning (TIR), it reaches 97.1% on MiniF2F-Test with a 72-attempt budget, setting the open-source SOTA for prover models. Built on Lean4 for fully machine-verifiable proofs.

  • 97.1% on MiniF2F-Test (72-attempt budget)
  • 100% on Auto-Formalization (MiniF2F & ProofNet)
  • 46.7% on MathOlympiad-Bench, 41.5% on PutnamBench
  • Sketching improves accuracy by around 10% under an equal compute budget
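The end product of the Auto-Formalization → Sketching → Proving pipeline is a Lean4 term that the kernel can check mechanically, which is what "machine-verifiable" means here. A minimal standalone example of such a checkable statement (a standard toy theorem, not taken from the model's outputs):

```lean
-- A concrete formalized statement; `decide` asks Lean's kernel to verify it.
theorem two_add_three : 2 + 3 = 3 + 2 := by decide

-- A general statement discharged from the standard library:
theorem add_comm' (a b : Nat) : a + b = b + a := Nat.add_comm a b
```

If either proof term were wrong, `lean` would reject the file at compile time; there is no notion of a proof that "looks right" but fails silently.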

GitHub | Hugging Face | Tech Report

Learn More →

LongCat-Flash-Thinking

Latest: Flash-Thinking-2601 | Original: September 22, 2025

Enhanced reasoning model with open-source SOTA tool-calling capabilities. Features a "Re-thinking Mode" with 8 parallel reasoning paths, a dual-path reasoning framework, and the DORA asynchronous training system.

  • Open-source SOTA on Agentic Tool Use, Agentic Search, and TIR benchmarks
  • Re-thinking Mode: 8 parallel reasoning paths for thorough decision-making
  • 64.5% token savings in tool-call scenarios
  • Outperforms Claude in complex random tool-calling tasks
  • Perfect score (100.0) on AIME-25, SOTA (86.8) on IMO-AnswerBench
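"Re-thinking Mode" samples several reasoning paths in parallel and reconciles them before answering, in the spirit of self-consistency voting. A toy illustration with 8 paths and a majority vote; the stand-in "reasoning path", the toy task, and the vote rule are assumptions, not the documented mechanism:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

N_PATHS = 8  # matches the 8 parallel reasoning paths described above

def reasoning_path(question: int, seed: int) -> int:
    """Stand-in for one sampled chain of thought: mostly right, sometimes off."""
    noise = 1 if seed % 4 == 3 else 0   # 2 of the 8 paths drift to a wrong answer
    return question * 2 + noise          # toy task: compute 2x

def rethink(question: int) -> int:
    """Run N_PATHS paths concurrently and keep the majority answer."""
    with ThreadPoolExecutor(max_workers=N_PATHS) as pool:
        answers = list(pool.map(lambda s: reasoning_path(question, s),
                                range(N_PATHS)))
    return Counter(answers).most_common(1)[0][0]

assert rethink(21) == 42  # 6 of 8 paths agree, so the two wrong ones are outvoted
```

The design trade-off is explicit: N parallel paths multiply sampling cost by N, but a minority of flawed chains can no longer determine the final answer.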

Hugging Face | GitHub | Try Online

Learn More →

LongCat-Video

Released: October 27, 2025

Video generation model based on Diffusion Transformer (DiT) architecture. Unified support for text-to-video, image-to-video, and video continuation tasks.

  • Generates 5-minute coherent videos at 720p/30fps
  • Long temporal sequences and cross-frame consistency
  • Physically plausible motion
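Video continuation works autoregressively: each new chunk is generated conditioned on the tail of the frames produced so far, and the loop runs until the target length is reached. A schematic sketch in which the chunk sizes, conditioning window, and the `generate_chunk` stand-in are all illustrative assumptions:

```python
FPS = 30
TARGET_SECONDS = 5 * 60   # the 5-minute target quoted above
CHUNK_FRAMES = 60         # hypothetical frames generated per step
CONTEXT_FRAMES = 12       # hypothetical conditioning tail

def generate_chunk(context):
    """Stand-in for the DiT denoiser: continue from the conditioning frames."""
    start = context[-1] + 1 if context else 0
    return list(range(start, start + CHUNK_FRAMES))

def continue_video(n_frames):
    video = []
    while len(video) < n_frames:
        video.extend(generate_chunk(video[-CONTEXT_FRAMES:]))
    return video[:n_frames]

frames = continue_video(TARGET_SECONDS * FPS)
assert len(frames) == 9000                # 5 min x 30 fps = 9000 frames
assert frames == list(range(9000))        # seamless, gap-free continuation
```

Cross-frame consistency in this regime hinges entirely on the conditioning tail: if the model forgets what it was shown, the seam between chunks becomes visible.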
Learn More →

LongCat-Video-Avatar

Released: November 2025

SOTA-level avatar video generation model built on the LongCat-Video base. Achieves breakthrough improvements in realism, long-video stability, and identity consistency for virtual human generation.

  • Audio-Text-to-Video (AT2V) and Audio-Text-Image-to-Video (ATI2V)
  • Natural micro-movements during silent segments (blinking, breathing)
  • Cross-Chunk Latent Stitching for stable 5-minute+ video generation
  • Reference Skip Attention mechanism for identity consistency
  • SOTA performance on HDTF, CelebV-HQ, EMTD, and EvalTalker benchmarks
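Long avatar videos are generated chunk by chunk, and "Cross-Chunk Latent Stitching" keeps chunk boundaries stable by blending overlapping latents. A minimal crossfade sketch over 1-D stand-in "latents"; the overlap length and linear blend rule are illustrative assumptions, not the published mechanism:

```python
def stitch_chunks(chunks, overlap: int):
    """Concatenate latent chunks, linearly crossfading the overlapping frames
    so each boundary transitions smoothly (toy stand-in for latent stitching)."""
    out = list(chunks[0])
    for chunk in chunks[1:]:
        for i in range(overlap):
            w = (i + 1) / (overlap + 1)          # ramp from old chunk to new
            out[-overlap + i] = (1 - w) * out[-overlap + i] + w * chunk[i]
        out.extend(chunk[overlap:])
    return out

# Two 6-frame chunks sharing a 2-frame overlap -> 10 stitched frames.
a = [0.0] * 6
b = [3.0] * 6
stitched = stitch_chunks([a, b], overlap=2)
assert len(stitched) == 10
assert stitched[3] == 0.0 and stitched[6] == 3.0   # untouched outside the overlap
assert 0.0 < stitched[4] < stitched[5] < 3.0       # blended at the boundary
```

Without some form of boundary blending, each chunk seam shows up as a visible pop in lighting, pose, or identity, which is exactly what the stitching mechanism is there to suppress over 5-minute-plus runs.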
Learn More →

LongCat-Flash-Omni

Released: November 2025

First open-source real-time all-modality interaction model. Unifies text, image, audio, and video with a single end-to-end ScMoE backbone.

  • Open-source SOTA on Omni-Bench and WorldSense
  • Low-latency, streaming multi-modal IO
  • 128K context with multi-turn dialogue
Learn More →

LongCat-Image

Released: Latest | Parameters: 6B

Open-source AI image generation and editing model. Achieves open-source SOTA on image editing benchmarks (GEdit-Bench, ImgEdit-Bench) and leading performance in Chinese text rendering (ChineseWord: 90.7). Covers all 8,105 standard Chinese characters.

  • Image editing: Open-source SOTA (ImgEdit-Bench 4.50, GEdit-Bench 7.60/7.64)
  • Chinese text rendering: 90.7 on ChineseWord, covering all 8,105 characters
  • Text-to-image: GenEval 0.87, DPG-Bench 86.8
  • Available on LongCat Web and LongCat APP (24 templates, image-to-image)
  • Fully open-source: Hugging Face | GitHub
Learn More →

LongCat-Audio-Codec

Released: November 2025

Audio processing module providing low-bitrate, real-time streaming audio tokenization and detokenization for speech LLMs. 0.43–0.87 kbps at ~100 ms latency. Enables efficient audio encoding and decoding for multi-modal pipelines.

  • Parallel semantic and acoustic token extraction
  • Super-resolution support (16 kHz/24 kHz)
  • Integrates with LongCat-Flash-Omni
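A neural codec's bitrate is simply frame rate × codebooks per frame × bits per codebook index, which is how sub-1 kbps figures like 0.43–0.87 kbps arise. The parameter values below are hypothetical settings chosen to land inside that band; they are not LongCat-Audio-Codec's published configuration:

```python
import math

def codec_bitrate_kbps(frame_rate_hz: float, n_codebooks: int,
                       codebook_size: int) -> float:
    """Bitrate = frames/s * codebooks per frame * bits per codebook index."""
    bits_per_index = math.log2(codebook_size)
    return frame_rate_hz * n_codebooks * bits_per_index / 1000

# Hypothetical settings that land inside the quoted 0.43-0.87 kbps band:
assert codec_bitrate_kbps(25.0, 2, 1024) == 0.5    # 2 codebooks -> 0.5 kbps
assert codec_bitrate_kbps(25.0, 3, 1024) == 0.75   # 3 codebooks -> 0.75 kbps

# Streaming latency is bounded below by one frame of buffering: 40 ms at 25 Hz.
assert 1000 / 25.0 == 40.0
```

Dropping codebooks trades reconstruction fidelity for bitrate along this same formula, which is why such codecs are typically quoted as a bitrate range rather than a single number.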

Hugging Face | GitHub

Learn More →

Model Comparison

| Model | Parameters / Architecture | Key Feature | Use Case |
| --- | --- | --- | --- |
| Flash-Lite | 68.5B (MoE, sparse) | N-gram embedding expansion | Agentic tool use, coding, long-context analysis |
| Flash-Chat | 560B (MoE) | High-throughput dialogue | General conversation, coding |
| Flash-Prover | — | Theorem proving (Lean4) | Math formalization, theorem proving |
| Flash-Thinking | 560B (MoE) | Enhanced reasoning | Tool use, formal reasoning |
| Video | DiT-based | Video generation | Text/image-to-video, continuation |
| Video-Avatar | DiT-based (LongCat-Video base) | Avatar video generation (SOTA) | Audio/text/image-to-video, virtual humans |
| Flash-Omni | ScMoE | All-modality | Multi-modal interaction |
| Image | 6B (MM-DiT + Single-DiT) | Image generation & editing (open-source SOTA) | Text-to-image, image editing, Chinese text rendering |
| Audio-Codec | — | Audio tokenizer & detokenizer | Speech LLMs, streaming audio, real-time dialogue |

Get Started

Choose a model to explore detailed documentation, benchmarks, and deployment guides: