Average-Hard Attention Transformers are Constant-Depth Uniform Threshold Circuits

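Summary

Transformers have emerged as a neural network model widely used for a variety of natural language processing tasks.
Earlier work examined their relationship to constant-depth threshold circuits under two assumptions: average-hard attention, and internal computations whose precision is logarithmic in the input length.
Merrill et al. (2022) prove that average-hard attention transformers recognize languages that fall within the complexity class TC0, the set of languages recognizable by constant-depth, polynomial-size threshold circuits.
Likewise, Merrill and Sabharwal (2023) show that log-precision transformers recognize languages within uniform TC0.
Both transformer models can therefore be simulated by constant-depth threshold circuits, and the latter result is the more robust of the two because it yields a uniform circuit family.
This paper shows that the first result can be extended to yield uniform circuits as well.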

Abstract (original)

Transformers have emerged as a widely used neural network model for various natural language processing tasks. Previous research explored their relationship with constant-depth threshold circuits, making two assumptions: average-hard attention and logarithmic precision for internal computations relative to input length. Merrill et al. (2022) prove that average-hard attention transformers recognize languages that fall within the complexity class TC0, denoting the set of languages that can be recognized by constant-depth polynomial-size threshold circuits. Likewise, Merrill and Sabharwal (2023) show that log-precision transformers recognize languages within the class of uniform TC0. This shows that both transformer models can be simulated by constant-depth threshold circuits, with the latter being more robust due to generating a uniform circuit family. Our paper shows that the first result can be extended to yield uniform circuits as well.
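
As a rough illustration of the two notions the abstract relies on, the Python sketch below shows a toy average-hard attention step (average the values at every position whose score ties for the maximum) and a single threshold gate of the kind TC0 circuits are built from. The function names and the integer-valued inputs are assumptions made for this example; it is not the construction used in the paper or in the cited works.

```python
from fractions import Fraction


def average_hard_attention(scores, values):
    """Toy average-hard attention over one query.

    All positions whose score equals the maximum receive equal weight;
    every other position receives weight zero. Values are assumed to be
    integers so the average can be returned as an exact Fraction.
    """
    top = max(scores)
    selected = [v for s, v in zip(scores, values) if s == top]
    return Fraction(sum(selected), len(selected))


def threshold_gate(bits, k):
    """Threshold gate: output 1 iff at least k of the input bits are 1."""
    return 1 if sum(bits) >= k else 0


def majority(bits):
    """Strict majority, expressed as a threshold gate."""
    return threshold_gate(bits, len(bits) // 2 + 1)
```

For example, with scores [1, 3, 3, 0] and values [10, 20, 40, 70], the two positions tied at score 3 are averaged, so the result is 30. The average is a sum followed by a division by a count, which hints at why counting gates such as threshold gates are a natural fit for simulating this kind of attention.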

arXiv information

Author	Lena Strobl
Published	2023-08-21 18:54:56+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
