A Transformer with Stack Attention

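Summary

Natural languages are thought to be (mildly) context-sensitive.
Although transformers underpin highly capable large language models, they cannot model many context-free language tasks.
To address this limitation in the modeling power of transformer-based language models, we propose augmenting them with a differentiable, stack-based attention mechanism.
The stack-based attention mechanism can be incorporated into any transformer-based language model and adds a level of interpretability to the model.
We show that adding the stack-based attention mechanism enables the transformer to model some, but not all, deterministic context-free languages.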

Summary (original)

Natural languages are believed to be (mildly) context-sensitive. Despite underpinning remarkably capable large language models, transformers are unable to model many context-free language tasks. In an attempt to address this limitation in the modeling power of transformer-based language models, we propose augmenting them with a differentiable, stack-based attention mechanism. Our stack-based attention mechanism can be incorporated into any transformer-based language model and adds a level of interpretability to the model. We show that the addition of our stack-based attention mechanism enables the transformer to model some, but not all, deterministic context-free languages.
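
To make the phrase "differentiable, stack-based" more concrete, here is a minimal sketch of a generic soft-stack update. It is not the mechanism proposed in the paper, whose details the abstract does not give. It only illustrates one common way to relax a discrete stack so that push, pop, and no-op become weighted averages. The names `soft_stack_update` and `empty_stack`, the fixed depth, and the three action weights are all assumptions made for this illustration.

```python
from typing import List

Vector = List[float]


def empty_stack(depth: int, dim: int) -> List[Vector]:
    """Hypothetical helper: a fixed-depth stack of zero vectors."""
    return [[0.0] * dim for _ in range(depth)]


def soft_stack_update(stack: List[Vector], value: Vector,
                      p_push: float, p_pop: float,
                      p_noop: float) -> List[Vector]:
    """Return the weighted mix of the stacks produced by each action.

    stack[0] is the top. The three weights are assumed to be
    non-negative and to sum to 1 (e.g. the output of a softmax).
    Items pushed past the fixed depth are dropped, and popping
    pulls a zero vector into the bottom cell.
    """
    depth, dim = len(stack), len(value)
    zero = [0.0] * dim
    new_stack = []
    for i in range(depth):
        pushed = value if i == 0 else stack[i - 1]        # cell i after a push
        popped = stack[i + 1] if i + 1 < depth else zero  # cell i after a pop
        kept = stack[i]                                   # cell i after a no-op
        new_stack.append([
            p_push * a + p_pop * b + p_noop * c
            for a, b, c in zip(pushed, popped, kept)
        ])
    return new_stack


# Usage: with one-hot weights the update behaves like an ordinary stack;
# with fractional weights the cells hold blends of the possible outcomes.
s = empty_stack(depth=3, dim=2)
s = soft_stack_update(s, [1.0, 0.0], p_push=1.0, p_pop=0.0, p_noop=0.0)
blended = soft_stack_update(s, [0.0, 1.0], p_push=0.5, p_pop=0.0, p_noop=0.5)
```

Because every cell is a linear combination of its neighbours and the new value, the update is differentiable with respect to both the action weights and the stored vectors, which is what would allow such a structure to be trained by gradient descent alongside a transformer.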

arXiv information

Authors	Jiaoda Li, Jennifer C. White, Mrinmaya Sachan, Ryan Cotterell
Published	2024-05-07 17:47:57+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
