Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation

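Summary

This study addresses a new task in text-to-image (T2I) generation: action customization.
The goal is to learn the action that co-exists across a limited set of exemplar images and to generalize it to unseen humans, or even animals.
Experiments show that existing subject-driven customization methods fail to learn the representative characteristics of actions and struggle to decouple actions from context features, including appearance.
To overcome the preference for low-level features and the entanglement of high-level features, the authors propose Action-Disentangled Identifier (ADI), an inversion-based method that learns action-specific identifiers from the exemplar images.
ADI first expands the semantic conditioning space by introducing layer-wise identifier tokens, which increases representational richness while distributing the inversion across different features.
Then, to block the inversion of action-agnostic features, ADI extracts the gradient invariance from constructed sample triples and masks the updates of irrelevant channels.
To evaluate the task comprehensively, the authors present ActionBench, a benchmark covering a variety of actions, each accompanied by meticulously selected samples.
Both quantitative and qualitative results show that ADI outperforms existing baselines in action-customized T2I generation.
The project page is at https://adi-t2i.github.io/ADI.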
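
The abstract does not give implementation details for the channel-masking step, so the following is only a minimal sketch of one possible reading, not the authors' method. It assumes that "gradient invariance" can be approximated by keeping the channels whose gradients differ least between two samples that share the same action, and zeroing the update on the remaining channels. The function names (`invariance_mask`, `masked_update`), the `keep_ratio` parameter, and the plain gradient-descent step are all hypothetical and chosen for illustration.

```python
def invariance_mask(grad_a, grad_b, keep_ratio=0.5):
    """Build a 0/1 channel mask from two per-channel gradients.

    ASSUMPTION (not from the paper): channels where the two gradients
    differ least are treated as action-relevant and kept; all other
    channels are masked out.
    """
    if len(grad_a) != len(grad_b):
        raise ValueError("gradients must have the same number of channels")
    diffs = [abs(a - b) for a, b in zip(grad_a, grad_b)]
    # Keep at least one channel so the identifier can still be updated.
    n_keep = max(1, int(keep_ratio * len(diffs)))
    order = sorted(range(len(diffs)), key=lambda i: diffs[i])
    keep = set(order[:n_keep])
    return [1.0 if i in keep else 0.0 for i in range(len(diffs))]


def masked_update(token, grad, mask, lr=0.1):
    """Apply one gradient-descent step, skipping masked channels."""
    return [t - lr * g * m for t, g, m in zip(token, grad, mask)]


# Hypothetical gradients of one identifier token for two samples
# that show the same action in different contexts.
grad_anchor = [1.0, 0.5, -2.0, 0.3]
grad_same_action = [1.1, -0.5, -2.0, 3.0]

mask = invariance_mask(grad_anchor, grad_same_action, keep_ratio=0.5)
token = masked_update([0.0, 0.0, 0.0, 0.0], grad_anchor, mask)
```

In this toy example only the first and third channels are updated, because their gradients agree most closely across the two samples. A real implementation would operate on tensors inside the diffusion model's training loop and would also use the third sample of each triple, which this sketch leaves out.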

Abstract (original)

This study focuses on a novel task in text-to-image (T2I) generation, namely action customization. The objective of this task is to learn the co-existing action from limited data and generalize it to unseen humans or even animals. Experimental results show that existing subject-driven customization methods fail to learn the representative characteristics of actions and struggle in decoupling actions from context features, including appearance. To overcome the preference for low-level features and the entanglement of high-level features, we propose an inversion-based method Action-Disentangled Identifier (ADI) to learn action-specific identifiers from the exemplar images. ADI first expands the semantic conditioning space by introducing layer-wise identifier tokens, thereby increasing the representational richness while distributing the inversion across different features. Then, to block the inversion of action-agnostic features, ADI extracts the gradient invariance from the constructed sample triples and masks the updates of irrelevant channels. To comprehensively evaluate the task, we present an ActionBench that includes a variety of actions, each accompanied by meticulously selected samples. Both quantitative and qualitative results show that our ADI outperforms existing baselines in action-customized T2I generation. Our project page is at https://adi-t2i.github.io/ADI.

arXiv information

Authors	Siteng Huang, Biao Gong, Yutong Feng, Xi Chen, Yuqian Fu, Yu Liu, Donglin Wang
Published	2024-05-10 08:01:10+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
