Attention as a Hypernetwork

Summary

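Transformers can, under some circumstances, generalize to novel problem instances whose constituent parts may have been encountered during training but whose compositions have not.
What mechanisms underlie this capacity for compositional generalization?
By reformulating multi-head attention as a hypernetwork, we reveal that a composable, low-dimensional latent code specifies key-query specific operations.
We find empirically that this latent code is predictive of the subtasks the network performs on unseen task compositions, revealing that latent codes acquired during training are reused to solve unseen problem instances.
To further examine the hypothesis that the intrinsic hypernetwork of multi-head attention supports compositional generalization, we ablate whether making the hypernetwork-generated linear value network nonlinear strengthens compositionality.
We find that this modification improves compositional generalization on abstract reasoning tasks.
In particular, we introduce a symbolic version of the Raven's Progressive Matrices human intelligence test, which gives precise control over the problem compositions encountered during training and evaluation.
On this task, we show how scaling model size and data enables compositional generalization in transformers and gives rise to a functionally structured latent space.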
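The hypernetwork reformulation can be checked numerically. The sketch below is a minimal illustration that assumes standard softmax multi-head attention with per-head query, key, value and output matrices. The names `Wq`, `Wk`, `Wv`, `Wo` and all helper functions are hypothetical and are not taken from the paper's code. The sketch computes the layer in two ways. The first is the usual computation. The second treats the attention scores of all H heads for a query-key pair (i, j) as a latent code a_ij in R^H. That code linearly mixes the per-head maps W_O^h W_V^h into a single key-query specific linear value network. Both paths should agree up to floating-point error.

```python
import math
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(A, B):
    """Multiply an n x k matrix A by a k x m matrix B."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def attention_scores(X, Wq, Wk):
    """A[h][i][j]: softmax attention of query i on key j in head h."""
    d_k = len(Wq[0])
    A = []
    for Wq_h, Wk_h in zip(Wq, Wk):
        Q = [matvec(Wq_h, x) for x in X]
        K = [matvec(Wk_h, x) for x in X]
        A.append([softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K])
                  for q in Q])
    return A

def mha_standard(X, Wq, Wk, Wv, Wo):
    """Usual MHA: per-head weighted sum of values, then output projection.
    Concatenating heads and projecting equals summing Wo[h] @ head_h over heads."""
    A = attention_scores(X, Wq, Wk)
    out = []
    for i in range(len(X)):
        y = [0.0] * len(Wo[0])
        for h in range(len(Wq)):
            head = [0.0] * len(Wv[h])
            for j, x_j in enumerate(X):
                head = [s + A[h][i][j] * v for s, v in zip(head, matvec(Wv[h], x_j))]
            y = [s + o for s, o in zip(y, matvec(Wo[h], head))]
        out.append(y)
    return out

def mha_hypernetwork(X, Wq, Wk, Wv, Wo):
    """Same layer read as a hypernetwork: the latent code a_ij = (A[0][i][j], ..., A[H-1][i][j])
    generates the linear value network W(a_ij) = sum_h a_ij[h] * Wo[h] @ Wv[h]."""
    A = attention_scores(X, Wq, Wk)
    per_head = [matmul(O, V) for O, V in zip(Wo, Wv)]  # D x D maps Wo[h] @ Wv[h]
    D = len(X[0])
    out = []
    for i in range(len(X)):
        y = [0.0] * len(per_head[0])
        for j, x_j in enumerate(X):
            code = [A[h][i][j] for h in range(len(per_head))]  # latent code in R^H
            W = [[sum(c * P[r][k] for c, P in zip(code, per_head)) for k in range(D)]
                 for r in range(len(per_head[0]))]
            y = [s + w for s, w in zip(y, matvec(W, x_j))]
        out.append(y)
    return out

rng = random.Random(0)
D, d_k, H, T = 4, 2, 3, 5  # model dim, per-head dim, heads, tokens
X = rand_matrix(T, D, rng)
Wq = [rand_matrix(d_k, D, rng) for _ in range(H)]
Wk = [rand_matrix(d_k, D, rng) for _ in range(H)]
Wv = [rand_matrix(d_k, D, rng) for _ in range(H)]
Wo = [rand_matrix(D, d_k, rng) for _ in range(H)]

Y_standard = mha_standard(X, Wq, Wk, Wv, Wo)
Y_hyper = mha_hypernetwork(X, Wq, Wk, Wv, Wo)
max_diff = max(abs(a - b) for r1, r2 in zip(Y_standard, Y_hyper) for a, b in zip(r1, r2))
print(max_diff < 1e-9)  # True
```

Each head's output projection acts only on that head's slice of the concatenated heads. Concatenating and then projecting is therefore the same as summing W_O^h over heads. This equivalence is what lets the per-pair attention scores be read as a low-dimensional code that configures the value network.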

Summary (original)

Transformers can under some circumstances generalize to novel problem instances whose constituent parts might have been encountered during training, but whose compositions have not. What mechanisms underlie this ability for compositional generalization? By reformulating multi-head attention as a hypernetwork, we reveal that a composable, low-dimensional latent code specifies key-query specific operations. We find empirically that this latent code is predictive of the subtasks the network performs on unseen task compositions, revealing that latent codes acquired during training are reused to solve unseen problem instances. To further examine the hypothesis that the intrinsic hypernetwork of multi-head attention supports compositional generalization, we ablate whether making the hypernetwork-generated linear value network nonlinear strengthens compositionality. We find that this modification improves compositional generalization on abstract reasoning tasks. In particular, we introduce a symbolic version of the Raven’s Progressive Matrices human intelligence test, which gives us precise control over the problem compositions encountered during training and evaluation. We demonstrate on this task how scaling model size and data enables compositional generalization in transformers and gives rise to a functionally structured latent space.
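The ablation in this abstract replaces the hypernetwork-generated linear value network with a nonlinear one. The abstract does not give the exact form, so the sketch below assumes one plausible version. In this version the latent code a mixes the per-head value matrices and the per-head output matrices separately, and a ReLU is applied between them. The result is y = W_O(a) φ(W_V(a) x) instead of the linear y = Σ_h a_h W_O^h W_V^h x. All function and variable names are hypothetical.

```python
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def mix(code, mats):
    """Latent-code-weighted sum of per-head matrices: sum_h code[h] * mats[h]."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(c * M[r][k] for c, M in zip(code, mats)) for k in range(cols)]
            for r in range(rows)]

def relu(v):
    return [max(0.0, x) for x in v]

def identity(v):
    return list(v)

def linear_value_net(code, x, Wv, Wo):
    """Linear value network generated by the code: sum_h a_h * Wo[h] @ Wv[h] @ x."""
    y = [0.0] * len(Wo[0])
    for c, V, O in zip(code, Wv, Wo):
        y = [s + c * o for s, o in zip(y, matvec(O, matvec(V, x)))]
    return y

def nonlinear_value_net(code, x, Wv, Wo, phi=relu):
    """Assumed nonlinear variant: Wo(a) @ phi(Wv(a) @ x), both matrices mixed by the code."""
    hidden = phi(matvec(mix(code, Wv), x))
    return matvec(mix(code, Wo), hidden)

rng = random.Random(1)
D, d_v, H = 4, 3, 2  # model dim, value dim, heads
Wv = [[[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(d_v)] for _ in range(H)]
Wo = [[[rng.gauss(0.0, 1.0) for _ in range(d_v)] for _ in range(D)] for _ in range(H)]
x = [rng.gauss(0.0, 1.0) for _ in range(D)]

code = [0.7, 0.3]  # e.g. attention scores of one query-key pair across the H heads
y_linear = linear_value_net(code, x, Wv, Wo)
y_nonlinear = nonlinear_value_net(code, x, Wv, Wo)

# Sanity check: with a one-hot code and phi = identity, both reduce to Wo[0] @ Wv[0] @ x.
one_hot = [1.0, 0.0]
gap = max(abs(a - b) for a, b in zip(linear_value_net(one_hot, x, Wv, Wo),
                                      nonlinear_value_net(one_hot, x, Wv, Wo, phi=identity)))
print(gap < 1e-12)  # True
```

For a general code, this nonlinear form is not just a reparametrization of the linear one, even with φ = identity. The code enters twice, once through W_V(a) and once through W_O(a), so cross-head terms a_h a_h' W_O^h W_V^h' appear.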

arXiv information

Authors	Simon Schug, Seijin Kobayashi, Yassir Akram, João Sacramento, Razvan Pascanu
Published	2025-02-17 15:55:26+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
