On the Learning Dynamics of Attention Networks

Summary

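Attention models are usually trained by optimizing one of three standard loss functions, known as soft attention, hard attention, and latent variable marginal likelihood (LVML) attention.
All three paradigms are motivated by the same goal of finding two models: a "focus" model that "selects" the right segment of the input, and a "classification" model that processes the selected segment into the target label.
However, they differ significantly in how the selected segments are aggregated, which leads to distinct training dynamics and final results.
We observe a unique signature in models trained under each of these paradigms, and explain it as a consequence of how the classification model evolves under gradient descent when the focus model is held fixed.
We also analyze these paradigms in a simple setting and derive closed-form expressions for the parameter trajectory under gradient flow.
With the soft attention loss, the focus model improves quickly at initialization and then stalls.
The hard attention loss, in contrast, behaves in the opposite way.
Based on these observations, we propose a simple hybrid approach that combines the advantages of the different loss functions, and demonstrate it on a collection of semi-synthetic and real-world datasets.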

Abstract (original)

Attention models are typically learned by optimizing one of three standard loss functions, variously called soft attention, hard attention, and latent variable marginal likelihood (LVML) attention. All three paradigms are motivated by the same goal of finding two models: a 'focus' model that 'selects' the right segment of the input and a 'classification' model that processes the selected segment into the target label. However, they differ significantly in the way the selected segments are aggregated, resulting in distinct dynamics and final results. We observe a unique signature of models learned using these paradigms and explain this as a consequence of the evolution of the classification model under gradient descent when the focus model is fixed. We also analyze these paradigms in a simple setting and derive closed-form expressions for the parameter trajectory under gradient flow. With the soft attention loss, the focus model improves quickly at initialization and splutters later on. On the other hand, the hard attention loss behaves in the opposite fashion. Based on our observations, we propose a simple hybrid approach that combines the advantages of the different loss functions and demonstrate it on a collection of semi-synthetic and real-world datasets.
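
The abstract says the three losses differ in how the selected segments are aggregated. The sketch below illustrates one common reading of that difference for a single example. It is not taken from the paper's code: the linear classifier `W`, the cross-entropy loss, and all function and variable names are assumptions made for illustration. Soft attention averages the segments before classifying, hard attention averages the per-segment losses, and LVML averages the per-segment likelihoods before taking the log.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def three_attention_losses(segments, focus_scores, W, label):
    """Illustrative losses for one example (all names are hypothetical).

    segments:     (m, d) array, one row per input segment
    focus_scores: (m,) unnormalized scores from a focus model
    W:            (d, k) weights of an assumed linear classifier
    label:        integer class index in [0, k)
    """
    a = softmax(focus_scores)                    # attention weights over segments
    logits = segments @ W                        # (m, k) per-segment logits
    probs = np.apply_along_axis(softmax, 1, logits)
    p_label = probs[:, label]                    # p(label | segment j) for each j

    # Soft attention: aggregate the segments first, then classify the mixture.
    mixed = a @ segments
    soft = -np.log(softmax(mixed @ W)[label])

    # Hard attention: expected per-segment loss under the attention weights.
    hard = -(a @ np.log(p_label))

    # LVML: negative log of the attention-weighted (marginal) likelihood.
    lvml = -np.log(a @ p_label)
    return soft, hard, lvml

rng = np.random.default_rng(0)
segments = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 2))
soft, hard, lvml = three_attention_losses(segments, np.zeros(4), W, label=1)
print(f"soft={soft:.4f} hard={hard:.4f} lvml={lvml:.4f}")
```

Under these assumptions, Jensen's inequality gives LVML loss ≤ hard attention loss for any attention weights, and all three losses coincide when the attention puts all its weight on a single segment.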

arXiv information

Authors	Rahul Vashisht, Harish G. Ramaswamy
Published	2023-07-25 11:40:47+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
