Are queries and keys always relevant? A case study on Transformer wave functions

Summary

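The dot-product attention mechanism, originally designed for natural language processing tasks, is a cornerstone of modern Transformers.
It captures semantic relationships between pairs of words in a sentence by computing a similarity overlap between queries and keys.
In this work, we examine how well Transformers, and their attention mechanisms in particular, are suited to parametrizing variational wave functions that approximate the ground states of quantum many-body spin Hamiltonians.
Specifically, we run numerical simulations on the two-dimensional $J_1$-$J_2$ Heisenberg model, a common benchmark for quantum many-body systems on a lattice.
Comparing the standard attention mechanism with a simplified version that omits queries and keys and depends only on positions, we obtain competitive results at lower computational cost and with fewer parameters.
Furthermore, by analyzing the attention maps produced by the standard attention mechanism, we show that the attention weights become effectively input-independent by the end of the optimization.
We support the numerical results with analytical calculations, which give physical insight into why queries and keys should, in principle, be omitted from the attention mechanism when studying large systems.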
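
To make the contrast concrete, here is a minimal sketch of the two kinds of attention the summary compares. It is not the authors' implementation: the function names, the toy sizes `n` and `d`, the random weight matrices, and the choice to store the positional weights as a row-normalized `n` x `n` table `alpha` are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def rand_matrix(rows, cols):
    """Random Gaussian matrix, standing in for learned parameters or inputs."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def standard_attention(x, w_q, w_k, w_v):
    """Dot-product attention: the weights depend on the input x via queries and keys."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(q[0]))
    # scores[i][j] = (q_i . k_j) / sqrt(d)
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


def positional_attention(x, alpha, w_v):
    """Simplified attention: alpha[i][j] is a parameter indexed only by the
    positions i and j, so no queries or keys are computed (assumed form)."""
    v = matmul(x, w_v)
    return matmul(alpha, v), alpha


random.seed(0)
n, d = 4, 3  # toy sizes: n lattice sites, d embedding dimensions

w_q, w_k, w_v = rand_matrix(d, d), rand_matrix(d, d), rand_matrix(d, d)
# Hypothetical position-only weight table; the row normalization is an assumption.
alpha = [softmax(row) for row in rand_matrix(n, n)]

x1, x2 = rand_matrix(n, d), rand_matrix(n, d)  # two different embedded inputs

_, w_std_1 = standard_attention(x1, w_q, w_k, w_v)
_, w_std_2 = standard_attention(x2, w_q, w_k, w_v)
_, w_pos_1 = positional_attention(x1, alpha, w_v)
_, w_pos_2 = positional_attention(x2, alpha, w_v)

print("standard weights identical across inputs:  ", w_std_1 == w_std_2)
print("positional weights identical across inputs:", w_pos_1 == w_pos_2)
```

In this sketch the standard weights are recomputed from every input, while the positional table is shared by all inputs, which is the sense in which the simplified version is input-independent.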

Abstract (original)

The dot product attention mechanism, originally designed for natural language processing tasks, is a cornerstone of modern Transformers. It adeptly captures semantic relationships between word pairs in sentences by computing a similarity overlap between queries and keys. In this work, we explore the suitability of Transformers, focusing on their attention mechanisms, in the specific domain of the parametrization of variational wave functions to approximate ground states of quantum many-body spin Hamiltonians. Specifically, we perform numerical simulations on the two-dimensional $J_1$-$J_2$ Heisenberg model, a common benchmark in the field of quantum many-body systems on lattice. By comparing the performance of standard attention mechanisms with a simplified version that excludes queries and keys, relying solely on positions, we achieve competitive results while reducing computational cost and parameter usage. Furthermore, through the analysis of the attention maps generated by standard attention mechanisms, we show that the attention weights become effectively input-independent at the end of the optimization. We support the numerical results with analytical calculations, providing physical insights of why queries and keys should be, in principle, omitted from the attention mechanism when studying large systems.

arXiv information

Authors	Riccardo Rende, Luciano Loris Viteritti
Published	2025-01-13 15:23:47+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
