Lumina-Image 2.0: A Unified and Efficient Image Generative Framework

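Summary

We introduce Lumina-Image 2.0, an advanced text-to-image generation framework that makes significant progress over the previous work, Lumina-Next.
Lumina-Image 2.0 is built on two key principles. (1) Unification: it adopts a unified architecture (Unified Next-DiT) that treats text and image tokens as a joint sequence, enabling natural cross-modal interactions and seamless task expansion.
In addition, because high-quality captioners can provide semantically well-aligned text-image training pairs, we introduce a unified captioning system, Unified Captioner (UniCap), designed specifically for T2I generation tasks.
UniCap excels at generating comprehensive and accurate captions, which accelerates convergence and improves prompt adherence.
(2) Efficiency: to improve the efficiency of the proposed model, we develop multi-stage progressive training strategies and introduce inference acceleration techniques that do not compromise image quality.
Extensive evaluations on academic benchmarks and public text-to-image arenas show that Lumina-Image 2.0 delivers strong performance even with only 2.6B parameters, highlighting its scalability and design efficiency.
Training details, code, and models are released at https://github.com/Alpha-VLLM/Lumina-Image-2.0.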
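
The idea of treating text and image tokens as one joint sequence can be illustrated with a small sketch. The code below is not the Unified Next-DiT implementation; it is a toy single-head self-attention written under several assumptions: the function name joint_self_attention and all variable names are hypothetical, the query, key, and value projections are replaced by the identity, and positional encoding is omitted. It only shows that, once the two token lists are concatenated, every image token can attend to every text token and vice versa.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def joint_self_attention(text_tokens, image_tokens):
    """Single-head self-attention over the concatenation [text; image].

    Assumption: identity projections stand in for learned Q/K/V matrices,
    so this is an illustration of the data flow, not a trainable layer.
    """
    seq = text_tokens + image_tokens  # one joint sequence
    dim = len(seq[0])
    scale = 1.0 / math.sqrt(dim)
    outputs, attention = [], []
    for q in seq:
        # Each token scores every token in the joint sequence, whatever its modality.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in seq]
        weights = softmax(scores)
        attention.append(weights)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, seq)) for d in range(dim)]
        )
    return outputs, attention


rng = random.Random(0)
dim = 4
# Hypothetical toy embeddings: 3 text tokens and 5 image tokens.
text_tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
image_tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(5)]

out, attn = joint_self_attention(text_tokens, image_tokens)
print(len(out), len(out[0]))  # prints: 8 4
```

Because no mask separates the modalities, rows 3 to 7 of attn (the image tokens) carry nonzero weight on columns 0 to 2 (the text tokens), which is the cross-modal interaction the summary refers to.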

Abstract (original)

We introduce Lumina-Image 2.0, an advanced text-to-image generation framework that achieves significant progress compared to previous work, Lumina-Next. Lumina-Image 2.0 is built upon two key principles: (1) Unification – it adopts a unified architecture (Unified Next-DiT) that treats text and image tokens as a joint sequence, enabling natural cross-modal interactions and allowing seamless task expansion. Besides, since high-quality captioners can provide semantically well-aligned text-image training pairs, we introduce a unified captioning system, Unified Captioner (UniCap), specifically designed for T2I generation tasks. UniCap excels at generating comprehensive and accurate captions, accelerating convergence and enhancing prompt adherence. (2) Efficiency – to improve the efficiency of our proposed model, we develop multi-stage progressive training strategies and introduce inference acceleration techniques without compromising image quality. Extensive evaluations on academic benchmarks and public text-to-image arenas show that Lumina-Image 2.0 delivers strong performances even with only 2.6B parameters, highlighting its scalability and design efficiency. We have released our training details, code, and models at https://github.com/Alpha-VLLM/Lumina-Image-2.0.

arXiv information

Authors	Qi Qin, Le Zhuo, Yi Xin, Ruoyi Du, Zhen Li, Bin Fu, Yiting Lu, Jiakang Yuan, Xinyue Li, Dongyang Liu, Xiangyang Zhu, Manyuan Zhang, Will Beddow, Erwann Millon, Victor Perez, Wenhai Wang, Conghui He, Bo Zhang, Xiaohong Liu, Hongsheng Li, Yu Qiao, Chang Xu, Peng Gao
Published	2025-03-27 17:57:07+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
