Testing AI performance on less frequent aspects of language reveals insensitivity to underlying meaning

Summary

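Advances in computational methods and the availability of big data have recently led to breakthroughs in AI applications.
Because successes on bottom-up challenges partially overshadow the shortcomings, the "human-like" performance of Large Language Models has raised the question of how algorithms achieve linguistic performance.
Given the systematic shortcomings in generalization seen across many AI systems, this work asks whether linguistic performance in Large Language Models is actually guided by knowledge of language.
To this end, the work prompts GPT-3 with a grammaticality judgement task and with comprehension questions on less frequent constructions, which are therefore unlikely to form part of the training data of Large Language Models.
These constructions included grammatical "illusions", semantic anomalies, complex nested hierarchies, and self-embeddings.
GPT-3 failed on every prompt but one, often giving answers that show a critical lack of understanding even of high-frequency words used in these less frequent grammatical constructions.
The work sheds light on the boundaries of AI's alleged human-like linguistic competence and argues that, far from being human-like, the next-word prediction abilities of LLMs may face robustness issues when pushed beyond their training data.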

Summary (original)

Advances in computational methods and big data availability have recently translated into breakthroughs in AI applications. With successes in bottom-up challenges partially overshadowing shortcomings, the ‘human-like’ performance of Large Language Models has raised the question of how linguistic performance is achieved by algorithms. Given systematic shortcomings in generalization across many AI systems, in this work we ask whether linguistic performance is indeed guided by language knowledge in Large Language Models. To this end, we prompt GPT-3 with a grammaticality judgement task and comprehension questions on less frequent constructions that are thus unlikely to form part of Large Language Models’ training data. These included grammatical ‘illusions’, semantic anomalies, complex nested hierarchies and self-embeddings. GPT-3 failed for every prompt but one, often offering answers that show a critical lack of understanding even of high-frequency words used in these less frequent grammatical constructions. The present work sheds light on the boundaries of the alleged AI human-like linguistic competence and argues that, far from human-like, the next-word prediction abilities of LLMs may face issues of robustness, when pushed beyond training data.

arXiv information

Authors	Vittoria Dentella, Elliot Murphy, Gary Marcus, Evelina Leivada
Published	2023-02-23 20:18:52+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
