RELIC: Evaluating Compositional Instruction Following via Language Recognition

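Summary

Large language models (LLMs) are increasingly expected to perform tasks based only on a task specification provided in context, without examples of inputs and outputs.
This ability is called instruction following.
We introduce the Recognition of Languages In-Context (RELIC) framework, which evaluates instruction following through language recognition: the task of determining whether a string is generated by a formal grammar.
Unlike many standard evaluations of LLMs' ability to use their context, this task requires composing a large number of instructions (grammar productions) retrieved from the context.
Because the languages are synthetic, the task can be made more complex as LLMs' skills improve, and new instances can be generated automatically, which mitigates data contamination.
We evaluate state-of-the-art LLMs on RELIC and find that their accuracy can be reliably predicted from the complexity of the grammar and of the individual example strings, and that even the most advanced LLMs currently available perform near chance on more complex grammars and samples, in line with theoretical expectations.
We also use RELIC to diagnose how LLMs attempt increasingly difficult reasoning tasks, and find that as the language recognition task grows more complex, models switch to shallow heuristics instead of following complex instructions.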

Abstract (original)

Large language models (LLMs) are increasingly expected to perform tasks based only on a specification of the task provided in context, without examples of inputs and outputs; this ability is referred to as instruction following. We introduce the Recognition of Languages In-Context (RELIC) framework to evaluate instruction following using language recognition: the task of determining if a string is generated by formal grammar. Unlike many standard evaluations of LLMs’ ability to use their context, this task requires composing together a large number of instructions (grammar productions) retrieved from the context. Because the languages are synthetic, the task can be increased in complexity as LLMs’ skills improve, and new instances can be automatically generated, mitigating data contamination. We evaluate state-of-the-art LLMs on RELIC and find that their accuracy can be reliably predicted from the complexity of the grammar and the individual example strings, and that even the most advanced LLMs currently available show near-chance performance on more complex grammars and samples, in line with theoretical expectations. We also use RELIC to diagnose how LLMs attempt to solve increasingly difficult reasoning tasks, finding that as the complexity of the language recognition task increases, models switch to relying on shallow heuristics instead of following complex instructions.
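
To make the language recognition task concrete, here is a minimal sketch of a recognizer that decides whether a string is generated by a grammar. It is an illustration only, not the RELIC implementation: it assumes a context-free grammar in Chomsky normal form, uses the standard CYK algorithm, and the toy grammar (the language of n a's followed by n b's) and the name `recognizes` are hypothetical choices made for this example.

```python
# Hypothetical toy grammar in Chomsky normal form (an assumption for
# illustration, not a grammar from the paper). It generates a^n b^n, n >= 1.
BINARY_RULES = [("S", "A", "B"), ("S", "A", "T"), ("T", "S", "B")]  # head -> left right
TERMINAL_RULES = [("A", "a"), ("B", "b")]                            # head -> token
START = "S"


def recognizes(tokens, binary_rules=BINARY_RULES,
               terminal_rules=TERMINAL_RULES, start=START):
    """Return True if `tokens` can be derived from `start` (CYK algorithm)."""
    n = len(tokens)
    if n == 0:
        return False  # this sketch does not handle the empty string

    # table[i][j] holds the nonterminals that derive tokens[i..j] inclusive.
    table = [[set() for _ in range(n)] for _ in range(n)]

    # Spans of length 1: apply the terminal productions.
    for i, tok in enumerate(tokens):
        for head, terminal in terminal_rules:
            if terminal == tok:
                table[i][i].add(head)

    # Longer spans: compose two shorter spans with a binary production.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split point: [i..k] and [k+1..j]
                for head, left, right in binary_rules:
                    if left in table[i][k] and right in table[k + 1][j]:
                        table[i][j].add(head)

    return start in table[0][n - 1]


if __name__ == "__main__":
    for s in ["ab", "aabb", "aab", "ba"]:
        print(s, recognizes(list(s)))
```

Run as a script, this prints `True` for `ab` and `aabb` and `False` for `aab` and `ba`. Even in this tiny grammar, accepting `aabb` requires chaining three productions, which is the kind of composition of in-context instructions that the abstract describes.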

arXiv information

Authors	Jackson Petty, Michael Y. Hu, Wentao Wang, Shauli Ravfogel, William Merrill, Tal Linzen
Published	2025-06-05 16:17:24+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
