HANS, are you clever? Clever Hans Effect Analysis of Neural Systems

Summary

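Instruction-tuned large language models (It-LLMs) have shown a strong ability to reason about the cognitive states, intentions, and reactions of everyone involved in a situation, which helps humans guide and understand everyday social interactions.
Several multiple-choice question (MCQ) benchmarks have been proposed to build reliable assessments of these abilities.
However, earlier studies have shown that It-LLMs carry an inherent "order bias", which makes proper evaluation difficult.
This paper investigates how resilient It-LLMs are to a series of probing tests, using four MCQ benchmarks.
Introducing adversarial examples reveals a significant performance gap, mainly when the order of the choices is changed; this exposes a selection bias and calls the models' reasoning abilities into question.
Given the correlation between first positions and model choices caused by positional bias, we hypothesize that structural heuristics are present in the It-LLMs' decision-making process, a hypothesis strengthened by including significant examples in few-shot scenarios.
Finally, we use the Chain-of-Thought (CoT) technique to elicit reasoning from the models, which mitigates the bias and yields more robust models.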
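
The order-variation probe described above can be illustrated with a minimal sketch. It is not the authors' code: the function names (`rotate_choices`, `probe_order_bias`), the stand-in model `always_first`, and the toy questions are all assumptions made for illustration. The idea is to present each question once per cyclic rotation of its options, so the correct answer lands in every position equally often, and then to compare how often each position is picked with how accurate the model is when the correct answer sits there.

```python
from collections import Counter


def rotate_choices(choices, gold_index, shift):
    """Cyclically rotate the options; return the new list and the new gold index."""
    n = len(choices)
    rotated = [choices[(i + shift) % n] for i in range(n)]
    new_gold = (gold_index - shift) % n
    return rotated, new_gold


def probe_order_bias(items, answer_fn):
    """Query answer_fn on every rotation of every item.

    items: list of (question, choices, gold_index) tuples.
    answer_fn: hypothetical callable (question, choices) -> picked index.
    Returns (pick_rate, accuracy), both keyed by option position.
    """
    picks = Counter()
    correct_by_pos = Counter()
    total_by_pos = Counter()
    for question, choices, gold in items:
        for shift in range(len(choices)):
            rotated, new_gold = rotate_choices(choices, gold, shift)
            pick = answer_fn(question, rotated)
            picks[pick] += 1
            total_by_pos[new_gold] += 1
            correct_by_pos[new_gold] += int(pick == new_gold)
    n_queries = sum(total_by_pos.values())
    positions = sorted(total_by_pos)
    pick_rate = {pos: picks[pos] / n_queries for pos in positions}
    accuracy = {pos: correct_by_pos[pos] / total_by_pos[pos] for pos in positions}
    return pick_rate, accuracy


def always_first(question, choices):
    """Stand-in for a maximally position-biased model (an assumption, not a real LLM)."""
    return 0


# Toy items, invented for this example only.
items = [
    ("2 + 2 = ?", ["3", "4", "5", "6"], 1),
    ("Capital of Italy?", ["Rome", "Paris", "Madrid", "Berlin"], 0),
]
pick_rate, accuracy = probe_order_bias(items, always_first)
print(pick_rate)
print(accuracy)
```

For an unbiased model, the pick rate would be roughly uniform across positions and accuracy would not depend on where the correct answer sits. A pick rate concentrated on position 0, with accuracy collapsing elsewhere, is the kind of first-position pattern the summary refers to. In practice `answer_fn` would wrap a call to the model under test.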

Summary (original)

Instruction-tuned Large Language Models (It-LLMs) have been exhibiting outstanding abilities to reason around cognitive states, intentions, and reactions of all people involved, letting humans guide and comprehend day-to-day social interactions effectively. In fact, several multiple-choice questions (MCQ) benchmarks have been proposed to construct solid assessments of the models’ abilities. However, earlier works are demonstrating the presence of inherent ‘order bias’ in It-LLMs, posing challenges to the appropriate evaluation. In this paper, we investigate It-LLMs’ resilience abilities towards a series of probing tests using four MCQ benchmarks. Introducing adversarial examples, we show a significant performance gap, mainly when varying the order of the choices, which reveals a selection bias and brings into discussion reasoning abilities. Following a correlation between first positions and model choices due to positional bias, we hypothesized the presence of structural heuristics in the decision-making process of the It-LLMs, strengthened by including significant examples in few-shot scenarios. Finally, by using the Chain-of-Thought (CoT) technique, we elicit the model to reason and mitigate the bias by obtaining more robust models.

arXiv information

Authors	Leonardo Ranaldi, Fabio Massimo Zanzotto
Published	2024-05-02 06:36:26+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
