ReCOGS: How Incidental Details of a Logical Form Overshadow an Evaluation of Semantic Interpretation

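Summary

Compositional generalization benchmarks seek to assess whether models can accurately compute the meanings of novel sentences, but they operationalize this in terms of logical form (LF) prediction.
This raises the concern that semantically irrelevant details of the chosen LFs could shape model performance.
We argue that this concern is realized in the COGS benchmark (Kim and Linzen, 2020).
COGS poses generalization splits that appear impossible for present-day models, which could be taken as an indictment of those models.
However, we show that the negative results trace to incidental features of the COGS LFs.
When these LFs are converted to semantically equivalent ones and capabilities unrelated to semantic interpretation are factored out, even baseline models get traction.
A recent variable-free translation of the COGS LFs suggests similar conclusions, but we observe that this format is not semantically equivalent.
It cannot accurately represent some COGS meanings.
These findings inform our proposal for ReCOGS, a modified version of COGS that comes closer to assessing the target semantic capabilities while remaining very challenging.
Overall, our results reaffirm the importance of compositional generalization and of careful benchmark task design.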

Abstract (original)

Compositional generalization benchmarks seek to assess whether models can accurately compute meanings for novel sentences, but operationalize this in terms of logical form (LF) prediction. This raises the concern that semantically irrelevant details of the chosen LFs could shape model performance. We argue that this concern is realized for the COGS benchmark (Kim and Linzen, 2020). COGS poses generalization splits that appear impossible for present-day models, which could be taken as an indictment of those models. However, we show that the negative results trace to incidental features of COGS LFs. Converting these LFs to semantically equivalent ones and factoring out capabilities unrelated to semantic interpretation, we find that even baseline models get traction. A recent variable-free translation of COGS LFs suggests similar conclusions, but we observe this format is not semantically equivalent; it is incapable of accurately representing some COGS meanings. These findings inform our proposal for ReCOGS, a modified version of COGS that comes closer to assessing the target semantic capabilities while remaining very challenging. Overall, our results reaffirm the importance of compositional generalization and careful benchmark task design.

arXiv information

Authors	Zhengxuan Wu, Christopher D. Manning, Christopher Potts
Published	2023-03-24 00:01:24+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
