On Provable Length and Compositional Generalization

Summary

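Length generalization (the ability to generalize to sequences longer than those seen during training) and compositional generalization (the ability to generalize to token combinations not seen during training) are important forms of out-of-distribution generalization in sequence-to-sequence models.
In this work, we take the first steps toward provable length and compositional generalization for a range of architectures, including deep sets, transformers, state space models, and simple recurrent neural networks.
Depending on the architecture, we prove that length and compositional generalization require different degrees of representation identification, for example a linear or a permutation relation with the ground-truth representation.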
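To make "a linear relation with the ground-truth representation" concrete, here is a toy sketch for the deep-set case. It is an illustration under stated assumptions, not the paper's construction or proof. We assume scalar tokens, a hypothetical ground-truth model y = w_true · Σ_t phi_true(x_t) with phi_true(x) = (x, x²), and a hypothetical learned encoder equal to an invertible linear map A applied to phi_true, with the readout chosen as A^{-T} w_true. Under those assumptions, sum pooling commutes with A, so the learned model matches the ground truth on sequences of any length, including lengths far beyond any training range. The sketch only shows that such a linear relation suffices in this toy setting; it does not demonstrate the necessity results stated in the abstract.

```python
import random

# Toy illustration (assumption, not the paper's construction): scalar tokens and a
# hypothetical ground-truth deep set y = w_true . sum_t phi_true(x_t).

def phi_true(x):
    return [x, x * x]

W_TRUE = [2.0, -1.0]

# Hypothetical "learned" encoder: an invertible linear map A applied to phi_true.
A = [[1.0, 2.0], [0.0, 1.0]]

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def inv_transpose_2x2(m):
    # (M^{-1})^T for a 2x2 matrix via the adjugate formula.
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    inv = [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]
    return [[inv[j][i] for j in range(2)] for i in range(2)]

def phi_learned(x):
    return matvec(A, phi_true(x))

# The readout compensates for A: w_learned . (A z) == w_true . z for every z.
W_LEARNED = matvec(inv_transpose_2x2(A), W_TRUE)

def deep_set(phi, w, seq):
    # Sum-pool the per-token features, then apply a linear readout.
    pooled = [0.0, 0.0]
    for x in seq:
        f = phi(x)
        pooled = [pooled[0] + f[0], pooled[1] + f[1]]
    return sum(wi * pi for wi, pi in zip(w, pooled))

def max_gap(length, trials=100, seed=0):
    # Largest |ground truth - learned| over random sequences of a given length.
    rng = random.Random(seed)
    gap = 0.0
    for _ in range(trials):
        seq = [rng.uniform(-1.0, 1.0) for _ in range(length)]
        y_true = deep_set(phi_true, W_TRUE, seq)
        y_learned = deep_set(phi_learned, W_LEARNED, seq)
        gap = max(gap, abs(y_true - y_learned))
    return gap

if __name__ == "__main__":
    # Short and much longer sequences: the gap stays at floating-point round-off.
    for length in (1, 5, 50, 500):
        print(length, max_gap(length))
```

The choice of a linear readout keeps the argument to one line (linearity of the sum); with a nonlinear readout on top of the pooled features, agreement on short sequences would no longer automatically carry over to longer ones.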

Summary (original)

Length generalization — the ability to generalize to longer sequences than ones seen during training, and compositional generalization — the ability to generalize to token combinations not seen during training, are crucial forms of out-of-distribution generalization in sequence-to-sequence models. In this work, we take the first steps towards provable length and compositional generalization for a range of architectures, including deep sets, transformers, state space models, and simple recurrent neural nets. Depending on the architecture, we prove different degrees of representation identification, e.g., a linear or a permutation relation with ground truth representation, is necessary for length and compositional generalization.

arXiv information

Authors	Kartik Ahuja, Amin Mansouri
Published	2024-02-07 14:16:28+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
