A Theoretical Study of (Hyper) Self-Attention through the Lens of Interactions: Representation, Training, Generalization

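Summary

Self-attention has emerged as a core component of modern neural architectures, but its theoretical foundations remain elusive.
This paper studies self-attention through the lens of interacting entities, ranging from agents in multi-agent reinforcement learning to alleles in genetic sequences, and shows that a single-layer linear self-attention can efficiently represent, learn, and generalize functions that capture pairwise interactions, including in out-of-distribution scenarios.
The analysis reveals that self-attention acts as a mutual interaction learner under minimal assumptions on the diversity of interaction patterns observed during training, and therefore covers a wide variety of real-world domains.
The theoretical insights are further validated through experiments demonstrating that self-attention learns interaction functions and generalizes across both population distributions and out-of-distribution scenarios.
Building on this theory, the authors introduce HyperFeatureAttention, a new neural network module designed to learn couplings of different feature-level interactions between entities.
They also propose HyperAttention, a new module that extends beyond pairwise interactions to capture multi-entity dependencies such as three-way, four-way, or general n-way interactions.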

Abstract (original)

Self-attention has emerged as a core component of modern neural architectures, yet its theoretical underpinnings remain elusive. In this paper, we study self-attention through the lens of interacting entities, ranging from agents in multi-agent reinforcement learning to alleles in genetic sequences, and show that a single-layer linear self-attention can efficiently represent, learn, and generalize functions capturing pairwise interactions, including out-of-distribution scenarios. Our analysis reveals that self-attention acts as a mutual interaction learner under minimal assumptions on the diversity of interaction patterns observed during training, thereby encompassing a wide variety of real-world domains. In addition, we validate our theoretical insights through experiments demonstrating that self-attention learns interaction functions and generalizes across both population distributions and out-of-distribution scenarios. Building on our theories, we introduce HyperFeatureAttention, a novel neural network module designed to learn couplings of different feature-level interactions between entities. Furthermore, we propose HyperAttention, a new module that extends beyond pairwise interactions to capture multi-entity dependencies, such as three-way, four-way, or general n-way interactions.
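
As a rough illustration of what "single-layer linear self-attention" refers to, the sketch below computes attention without a softmax, so each output row is a sum of value vectors weighted by a bilinear pairwise score. It assumes the common query/key/value parametrization; the function names, matrix names, and toy inputs are illustrative and are not taken from the paper, whose exact formulation may differ.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def linear_self_attention(X, W_q, W_k, W_v):
    """Single-layer self-attention without softmax (illustrative sketch).

    X has one row per entity. The score for the pair (i, j) is the
    bilinear form x_i W_q W_k^T x_j, used directly as a weight.
    """
    Q, K, V = matmul(X, W_q), matmul(X, W_k), matmul(X, W_v)
    scores = matmul(Q, transpose(K))  # scores[i][j]: interaction of entity i with entity j
    return matmul(scores, V)          # row i: sum over j of scores[i][j] * V[j]


# Three entities with two features each (arbitrary toy values).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out = linear_self_attention(X, identity, identity, identity)
print(out)  # [[2.0, 1.0], [1.0, 2.0], [3.0, 3.0]]
```

With identity weights the scores reduce to plain dot products between entities, which makes the pairwise structure easy to check by hand: the first entity's output is its own value vector plus that of the third entity, the only other one it overlaps with.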

arXiv information

Authors	Muhammed Ustaomeroglu,Guannan Qu
Published	2025-06-06 15:44:10+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
