Characterizing Massive Activations of Attention Mechanism in Graph Neural Networks

Summary

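Graph neural networks (GNNs) are increasingly popular for effectively modeling data with graph structure.
Attention mechanisms have recently been integrated into GNNs to improve their ability to capture complex patterns.
This paper presents the first comprehensive study revealing a critical, previously unexplored consequence of this integration: the emergence of massive activations (MAs) within attention layers.
We introduce a new method for detecting and analyzing MAs, focusing on edge features in different graph transformer architectures.
Our study evaluates various GNN models on benchmark datasets including ZINC, TOX21, and PROTEINS.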
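Key contributions include (1) establishing a direct link between attention mechanisms and MA generation in GNNs, (2) developing a robust definition of and detection method for MAs based on activation ratio distributions, and (3) introducing the Explicit Bias Term (EBT) as a potential countermeasure.
We also explore the EBT as an adversarial framework for assessing model robustness based on the presence or absence of MAs.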
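Contribution (2) defines MAs through activation ratio distributions, but the summary does not state the exact criterion. The sketch below is a minimal illustration under stated assumptions, not the authors' method: it treats a layer's edge features as a matrix (edges x channels), divides each entry's magnitude by the median magnitude of the layer, and flags entries that are both large in absolute terms and far above the median. The function names `activation_ratios` and `detect_massive_activations` and the thresholds `ratio_threshold=1000` and `abs_threshold=100` are hypothetical choices for illustration.

```python
from statistics import median
from typing import List, Tuple

Matrix = List[List[float]]


def activation_ratios(edge_features: Matrix) -> Matrix:
    """Divide each entry's magnitude by the median magnitude over the whole layer."""
    magnitudes = [abs(x) for row in edge_features for x in row]
    med = median(magnitudes)  # raises StatisticsError on an empty layer
    if med == 0.0:
        # Degenerate (mostly zero) layer: any nonzero entry is unboundedly large
        # relative to the median, so report its ratio as infinity.
        return [[float("inf") if x != 0 else 0.0 for x in row] for row in edge_features]
    return [[abs(x) / med for x in row] for row in edge_features]


def detect_massive_activations(
    edge_features: Matrix,
    ratio_threshold: float = 1000.0,  # hypothetical: "far above the typical magnitude"
    abs_threshold: float = 100.0,     # hypothetical: "also large in absolute terms"
) -> List[Tuple[int, int]]:
    """Return (edge index, channel index) pairs flagged as massive activations."""
    ratios = activation_ratios(edge_features)
    flagged = []
    for e, (row, ratio_row) in enumerate(zip(edge_features, ratios)):
        for c, (x, r) in enumerate(zip(row, ratio_row)):
            if abs(x) > abs_threshold and r > ratio_threshold:
                flagged.append((e, c))
    return flagged


if __name__ == "__main__":
    # Toy edge-feature matrix: 3 edges x 3 channels, one outlier at edge 1, channel 1.
    edge_feats = [[0.1, -0.2, 0.3], [0.2, 250.0, -0.1], [0.05, 0.1, -0.15]]
    print(detect_massive_activations(edge_feats))  # [(1, 1)]
```

Here the median magnitude is 0.15, so the entry 250.0 has a ratio of about 1667 and is flagged, while every other entry stays below a ratio of 3. Collecting the largest ratio per layer over many graphs would give one possible "activation ratio distribution" to compare across architectures; whether this matches the paper's definition is an assumption.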
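Our findings highlight the prevalence and impact of attention-induced MAs across architectures such as GraphTransformer, GraphiT, and SAN.
The study reveals the complex interplay between attention mechanisms, model architecture, dataset characteristics, and MA emergence, offering key insights for developing more robust and reliable graph models.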

Summary (original)

Graph Neural Networks (GNNs) have become increasingly popular for effectively modeling data with graph structures. Recently, attention mechanisms have been integrated into GNNs to improve their ability to capture complex patterns. This paper presents the first comprehensive study revealing a critical, unexplored consequence of this integration: the emergence of Massive Activations (MAs) within attention layers. We introduce a novel method for detecting and analyzing MAs, focusing on edge features in different graph transformer architectures. Our study assesses various GNN models using benchmark datasets, including ZINC, TOX21, and PROTEINS. Key contributions include (1) establishing the direct link between attention mechanisms and MA generation in GNNs, (2) developing a robust definition and detection method for MAs based on activation ratio distributions, and (3) introducing the Explicit Bias Term (EBT) as a potential countermeasure and exploring it as an adversarial framework to assess model robustness based on the presence or absence of MAs. Our findings highlight the prevalence and impact of attention-induced MAs across different architectures, such as GraphTransformer, GraphiT, and SAN. The study reveals the complex interplay between attention mechanisms, model architecture, dataset characteristics, and MA emergence, providing crucial insights for developing more robust and reliable graph models.
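The abstract introduces the Explicit Bias Term (EBT) as a countermeasure to MAs but does not give its form. The sketch below assumes one plausible form, modeled on explicit attention biases discussed for language models: a single extra key/value pair appended to each attention computation, which every query can attend to. The intuition behind such a term is that attention gets a dedicated slot to absorb probability mass, so the model need not create huge activations to do so. All names here (`attend`, `bias_key`, `bias_value`) are hypothetical, and in a real model the bias vectors would be learnable parameters rather than fixed inputs.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(
    queries: Sequence[Vector],
    keys: Sequence[Vector],
    values: Sequence[Vector],
    bias_key: Optional[Vector] = None,
    bias_value: Optional[Vector] = None,
) -> Tuple[List[Vector], List[Vector]]:
    """Scaled dot-product attention with an optional explicit bias slot.

    If bias_key/bias_value are given, they are appended as one extra
    key/value pair that every query can attend to (hypothetical EBT form).
    Returns (outputs, attention weights per query).
    """
    if (bias_key is None) != (bias_value is None):
        raise ValueError("bias_key and bias_value must be given together")
    keys, values = list(keys), list(values)
    if bias_key is not None:
        keys.append(bias_key)
        values.append(bias_value)
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs, all_weights = [], []
    for q in queries:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


if __name__ == "__main__":
    # One node (query) attending over two neighbors (rows of keys/values).
    q = [[0.0, 0.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 0.0], [0.0, 1.0]]
    print(attend(q, k, v)[0])                          # [[0.5, 0.5]]
    print(attend(q, k, v, [0.0, 0.0], [0.0, 0.0])[0])  # each entry is 1/3
```

With a zero `bias_value`, the bias slot takes a share of the attention weight while contributing nothing to the output, which is the "sink" behavior the intuition above relies on. Training the same model with and without such a slot and running an MA detector on both would be one way to set up the with/without-MA comparison the abstract describes; that reading of the "adversarial framework" is an assumption.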

arXiv information

Authors	Lorenzo Bini, Marco Sorbi, Stephane Marchand-Maillet
Published	2024-09-05 12:19:07+00:00
arXiv site	arxiv_id(pdf)

Provider, services used

arxiv.jp, Google
