Text-to-Model: Text-Conditioned Neural Network Diffusion for Train-Once-for-All Personalization

Abstract

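Generative artificial intelligence (GenAI) has made significant progress in understanding world knowledge and generating content from human language across various modalities, such as text-to-text large language models, text-to-image Stable Diffusion, and text-to-video Sora.
This paper investigates the capability of GenAI for text-to-model generation, to see whether GenAI can comprehend hyper-level knowledge embedded in the parameters of AI models themselves.
Specifically, we study a practical scenario termed train-once-for-all personalization, which aims to generate personalized models for diverse end users and tasks from text prompts.
Inspired by the recent emergence of neural network diffusion, we present Tina, a text-conditioned neural network diffusion for train-once-for-all personalization.
Tina leverages a diffusion transformer model conditioned on task descriptions embedded using a CLIP model.
Despite the astronomical number of potential personalized tasks (e.g., $1.73\times10^{13}$), Tina demonstrates remarkable in-distribution and out-of-distribution generalization even when trained on small datasets ($\sim 1000$).
We further verify whether and how Tina understands world knowledge by analyzing its capabilities under zero-shot/few-shot image prompts, different numbers of personalized classes, natural-language description prompts, and prediction of unseen entities.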
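
To give a feel for the scale gap between the task space and the training set, here is a minimal sketch. The abstract does not say how the $1.73\times10^{13}$ figure is derived; the code assumes, purely for illustration, that a personalized task is an unordered choice of 10 classes out of 100, which happens to give a count of the same magnitude. The function name and both parameters are hypothetical.

```python
import math


def count_personalized_tasks(num_classes: int, classes_per_task: int) -> int:
    """Number of distinct class subsets of a fixed size (order ignored)."""
    return math.comb(num_classes, classes_per_task)


# Assumption (not stated in the abstract): 100 candidate classes,
# 10 classes selected per personalized task.
n_tasks = count_personalized_tasks(100, 10)

# Fraction of that task space covered by roughly 1000 training tasks.
train_fraction = 1000 / n_tasks

print(f"possible tasks: {n_tasks:.3e}")
print(f"fraction covered by 1000 training tasks: {train_fraction:.1e}")
```

Under this assumption the training set touches only a vanishing fraction of the possible tasks, which is why the generalization claim in the abstract is notable.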

Abstract (original)

Generative artificial intelligence (GenAI) has made significant progress in understanding world knowledge and generating content from human languages across various modalities, like text-to-text large language models, text-to-image stable diffusion, and text-to-video Sora. While in this paper, we investigate the capability of GenAI for text-to-model generation, to see whether GenAI can comprehend hyper-level knowledge embedded within AI itself parameters. Specifically, we study a practical scenario termed train-once-for-all personalization, aiming to generate personalized models for diverse end-users and tasks using text prompts. Inspired by the recent emergence of neural network diffusion, we present Tina, a text-conditioned neural network diffusion for train-once-for-all personalization. Tina leverages a diffusion transformer model conditioned on task descriptions embedded using a CLIP model. Despite the astronomical number of potential personalized tasks (e.g., $1.73\times10^{13}$), by our design, Tina demonstrates remarkable in-distribution and out-of-distribution generalization even trained on small datasets ($\sim 1000$). We further verify whether and how Tina understands world knowledge by analyzing its capabilities under zero-shot/few-shot image prompts, different numbers of personalized classes, prompts of natural language descriptions, and predicting unseen entities.

arXiv information

Authors	Zexi Li,Lingzhi Gao,Chao Wu
Published	2025-03-26 16:33:17+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
