LongCat-Image Released: Open-Source SOTA Image Generation & Editing Model
Meituan LongCat has officially released and open-sourced LongCat-Image, a 6B-parameter AI image generation and editing model. Through high-performance architecture design, systematic training strategies, and data engineering, it achieves performance comparable to that of larger models, giving developers and industry a "high-performance, low-threshold, fully open" solution.
Open-Source SOTA Performance
- Image editing: ImgEdit-Bench 4.50, GEdit-Bench Chinese/English 7.60/7.64 (open-source SOTA, approaching top closed-source models)
- Chinese text rendering: ChineseWord 90.7 (well ahead of all other evaluated models), covering all 8,105 standard Chinese characters
- Text-to-image: GenEval 0.87, DPG-Bench 86.8 (competitive with top open-source and closed-source models)
- Subjective evaluation: Excellent realism in text-to-image by mean opinion score (MOS); significantly outperforms other open-source solutions in image editing in side-by-side (SBS) comparison
Key Technical Highlights
- Unified architecture: MM-DiT + Single-DiT hybrid backbone trained with progressive learning strategies, so that text-to-image and image editing capabilities reinforce each other
- Highly controllable image editing: A multi-task joint learning mechanism that combines instruction editing with text-to-image training, achieving precise instruction following, generalization, and visual consistency
- Comprehensive Chinese text coverage: Curriculum learning strategy covering 8,105 standard Chinese characters; character-level encoding reduces memory burden; supports complex stroke structures and rare characters
- Texture and realism enhancement: Systematic data filtering and an adversarial training framework, with an AIGC content detector used as the reward model to encourage realistic physical textures, lighting, and image quality
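Two of the highlights above, character-level encoding and the detector-as-reward idea, can be illustrated with a minimal Python sketch. This is not the LongCat-Image implementation: the function names `build_char_vocab`, `encode_chars`, and `realism_reward`, the tiny vocabulary, and the `1 - P(AI-generated)` reward formula are all assumptions chosen for illustration. The announcement does not specify how the model actually encodes characters or shapes its reward.

```python
from typing import Callable, Dict, List


def build_char_vocab(chars: str) -> Dict[str, int]:
    """Assign one id per distinct character, in first-seen order (hypothetical vocabulary)."""
    vocab: Dict[str, int] = {}
    for ch in chars:
        if ch not in vocab:
            vocab[ch] = len(vocab)
    return vocab


def encode_chars(text: str, vocab: Dict[str, int], unk_id: int = -1) -> List[int]:
    """Character-level encoding: exactly one id per character of the text to render."""
    return [vocab.get(ch, unk_id) for ch in text]


def realism_reward(image: object, detector: Callable[[object], float]) -> float:
    """Assumed reward shape: 1 - P(image is AI-generated), with P clamped to [0, 1].

    `detector` stands in for an AIGC content detector that returns the
    probability that `image` is machine-generated.
    """
    p_fake = min(1.0, max(0.0, detector(image)))
    return 1.0 - p_fake


if __name__ == "__main__":
    vocab = build_char_vocab("龙猫图像生成")
    # One id per character, regardless of how many bytes each character occupies.
    print(encode_chars("生成龙猫", vocab))
    # A stub detector that is 25% confident the image is AI-generated.
    print(realism_reward(None, lambda img: 0.25))
```

In this sketch, each character maps to a single id, so a fixed set such as the 8,105 standard characters would need only a fixed-size lookup table, and an image that the detector judges less likely to be AI-generated receives a higher reward.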
Applications & Resources
- LongCat app: Text-to-image, image-to-image, and 24 ready-to-use templates that require no prior experience (poster design, portrait refinement, scene transformation)
- LongCat Web: Available at longcat.ai
- Fully open-source: Models from multiple training stages (mid-training and post-training) and the image editing models are available on Hugging Face and GitHub