Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction

Summary

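Ming-Lite-Uni is an open-source multimodal framework featuring a newly designed unified visual generator and a native multimodal autoregressive model tailored for unifying vision and language.
Specifically, the project provides an open-source implementation of the integrated MetaQueries and M2-omni framework while introducing novel multi-scale learnable tokens and a multi-scale representation alignment strategy.
By leveraging a fixed MLLM and a learnable diffusion model, Ming-Lite-Uni enables native multimodal AR models to perform both text-to-image generation and instruction-based image editing, extending their capabilities beyond pure visual understanding.
The experimental results demonstrate Ming-Lite-Uni's strong performance and the fluidity of its interactive process.
All code and model weights are open-sourced to foster further exploration within the community.
Notably, this work coincides with concurrent multimodal AI milestones, such as ChatGPT-4o with native image generation, updated on March 25, 2025.
Ming-Lite-Uni is in the alpha stage and will soon be refined further.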

Summary (original)

We introduce Ming-Lite-Uni, an open-source multimodal framework featuring a newly designed unified visual generator and a native multimodal autoregressive model tailored for unifying vision and language. Specifically, this project provides an open-source implementation of the integrated MetaQueries and M2-omni framework, while introducing novel multi-scale learnable tokens and a multi-scale representation alignment strategy. By leveraging a fixed MLLM and a learnable diffusion model, Ming-Lite-Uni enables native multimodal AR models to perform both text-to-image generation and instruction-based image editing tasks, expanding their capabilities beyond pure visual understanding. Our experimental results demonstrate the strong performance of Ming-Lite-Uni and illustrate the impressively fluid nature of its interactive process. All code and model weights are open-sourced to foster further exploration within the community. Notably, this work aligns with concurrent multimodal AI milestones – such as ChatGPT-4o with native image generation, updated on March 25, 2025 – underscoring the broader significance of unified models like Ming-Lite-Uni on the path toward AGI. Ming-Lite-Uni is in the alpha stage and will soon be further refined.

arXiv information

Authors	Inclusion AI, Biao Gong, Cheng Zou, Dandan Zheng, Hu Yu, Jingdong Chen, Jianxin Sun, Junbo Zhao, Jun Zhou, Kaixiang Ji, Lixiang Ru, Libin Wang, Qingpei Guo, Rui Liu, Weilong Chai, Xinyu Xiao, Ziyuan Huang
Published	2025-05-07 14:48:36+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
