Focus on Your Question! Interpreting and Mitigating Toxic CoT Problems in Commonsense Reasoning

Summary

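Large language models show strong commonsense reasoning abilities, especially when combined with enhancement methods such as Chain-of-Thought (CoT).
However, we find that these CoT-like methods cause a considerable number of originally correct answers to become wrong, which we define as the Toxic CoT problem.
To interpret and mitigate this problem, we first use attribution tracing and causal tracing methods to probe the internal working mechanism of the LLM during CoT reasoning.
Through comparisons, we demonstrate that the model loses information from the question across the shallow attention layers when generating rationales or answers.
Based on these findings, we design a new method called RIDERS (Residual decodIng and sERial-position Swap), which compensates for the model's information deficit from both the decoding and serial-position perspectives.
Through extensive experiments on multiple commonsense reasoning benchmarks, we verify that this method not only substantially reduces the Toxic CoT problem (a 23.6% decrease) but also improves the model's overall commonsense reasoning performance (a 5.5% increase).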

Summary (original)

Large language models exhibit high-level commonsense reasoning abilities, especially with enhancement methods like Chain-of-Thought (CoT). However, we find these CoT-like methods lead to a considerable number of originally correct answers turning wrong, which we define as the Toxic CoT problem. To interpret and mitigate this problem, we first utilize attribution tracing and causal tracing methods to probe the internal working mechanism of the LLM during CoT reasoning. Through comparisons, we prove that the model exhibits information loss from the question over the shallow attention layers when generating rationales or answers. Based on the probing findings, we design a novel method called RIDERS (Residual decodIng and sERial-position Swap), which compensates for the information deficit in the model from both decoding and serial-position perspectives. Through extensive experiments on multiple commonsense reasoning benchmarks, we validate that this method not only significantly eliminates Toxic CoT problems (decreased by 23.6%), but also effectively improves the model’s overall commonsense reasoning performance (increased by 5.5%).
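
The definition above (an answer that was correct without CoT turns wrong with CoT) can be illustrated with a small counting sketch. This is not the authors' evaluation code. The function names `toxic_cot_cases` and `toxic_rate` are hypothetical, the answer lists are made-up toy data, and the choice of denominator (all CoT errors) is an assumption that may differ from the metric used in the paper.

```python
from typing import List, Sequence


def toxic_cot_cases(gold: Sequence[str],
                    direct: Sequence[str],
                    cot: Sequence[str]) -> List[int]:
    """Return indices of questions answered correctly without CoT
    but incorrectly with CoT (hypothetical helper)."""
    if not (len(gold) == len(direct) == len(cot)):
        raise ValueError("gold, direct and cot must have the same length")
    return [
        i
        for i, (g, d, c) in enumerate(zip(gold, direct, cot))
        if d == g and c != g
    ]


def toxic_rate(gold: Sequence[str],
               direct: Sequence[str],
               cot: Sequence[str]) -> float:
    """Share of CoT errors that are toxic cases.

    Assumption: the denominator is the number of wrong CoT answers;
    the paper may normalise differently.
    """
    toxic = toxic_cot_cases(gold, direct, cot)
    cot_errors = sum(1 for g, c in zip(gold, cot) if c != g)
    if cot_errors == 0:
        return 0.0
    return len(toxic) / cot_errors


if __name__ == "__main__":
    # Toy multiple-choice data, invented for illustration only.
    gold = ["A", "B", "C", "D", "A"]
    direct = ["A", "B", "A", "D", "B"]   # answers without CoT
    cot = ["A", "C", "C", "A", "C"]      # answers with CoT
    print(toxic_cot_cases(gold, direct, cot))
    print(toxic_rate(gold, direct, cot))
```

In the toy data, questions 1 and 3 flip from correct to wrong under CoT, while question 4 is wrong in both settings and therefore is a CoT error but not a toxic case.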

arXiv information

Authors	Jiachun Li, Pengfei Cao, Chenhao Wang, Zhuoran Jin, Yubo Chen, Daojian Zeng, Kang Liu, Jun Zhao
Published	2024-02-28 14:09:02+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
