Crystal: Introspective Reasoners Reinforced with Self-Feedback

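Summary

Extensive prior work shows that knowledge-augmented reasoning, in which the knowledge underpinning the reasoning process is explicitly verbalized and used, can improve both the performance and the interpretability of commonsense reasoning.
However, existing implementations, including chain-of-thought and its variants, fall short in capturing the introspective nature of the knowledge that commonsense reasoning requires, and in accounting for the mutual adaptation between the generation and the utilization of knowledge.
We propose a novel method for developing Crystal, an introspective commonsense reasoner.
To tackle a commonsense problem, Crystal first introspects for knowledge statements related to the given question, and then makes an informed prediction grounded in that introspected knowledge.
The model's knowledge-introspection and knowledge-grounded reasoning modes are tuned via reinforcement learning so that they adapt to each other, with the reward derived from feedback given by the model itself.
Experiments show that Crystal significantly outperforms both standard supervised finetuning and chain-of-thought distilled methods, and makes the commonsense reasoning process more transparent.
Our work ultimately validates the feasibility and potential of reinforcing a neural model with self-feedback.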

Abstract (original)

Extensive work has shown that the performance and interpretability of commonsense reasoning can be improved via knowledge-augmented reasoning methods, where the knowledge that underpins the reasoning process is explicitly verbalized and utilized. However, existing implementations, including ‘chain-of-thought’ and its variants, fall short in capturing the introspective nature of knowledge required in commonsense reasoning, and in accounting for the mutual adaptation between the generation and utilization of knowledge. We propose a novel method to develop an introspective commonsense reasoner, Crystal. To tackle commonsense problems, it first introspects for knowledge statements related to the given question, and subsequently makes an informed prediction that is grounded in the previously introspected knowledge. The knowledge introspection and knowledge-grounded reasoning modes of the model are tuned via reinforcement learning to mutually adapt, where the reward derives from the feedback given by the model itself. Experiments show that Crystal significantly outperforms both the standard supervised finetuning and chain-of-thought distilled methods, and enhances the transparency of the commonsense reasoning process. Our work ultimately validates the feasibility and potential of reinforcing a neural model with self-feedback.
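
The two-stage flow described in the abstract (introspect for knowledge, then predict grounded in it, with a reward computed from the model's own feedback) can be pictured with a minimal Python sketch. Everything here is a hypothetical illustration: the names `predict`, `self_feedback_reward`, `toy_introspect`, and `toy_score` are not from the paper, the toy functions stand in for a real language model, and the reward shown (the gain in the correct answer's probability when the knowledge is supplied) is an assumed stand-in rather than the paper's actual reward definition.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces (assumptions, not the paper's API).
IntrospectFn = Callable[[str], str]  # question -> knowledge statement
ScoreFn = Callable[[str, str, Sequence[str]], List[float]]  # -> choice probabilities


def predict(question: str, choices: Sequence[str],
            introspect: IntrospectFn, score: ScoreFn) -> Tuple[str, str, List[float]]:
    """Stage 1: introspect for knowledge. Stage 2: predict grounded in it."""
    knowledge = introspect(question)
    probs = score(question, knowledge, choices)
    best = max(range(len(choices)), key=probs.__getitem__)
    return knowledge, choices[best], probs


def self_feedback_reward(question: str, choices: Sequence[str], answer_idx: int,
                         knowledge: str, score: ScoreFn) -> float:
    """Assumed reward: how much the knowledge raises the correct answer's probability."""
    with_knowledge = score(question, knowledge, choices)[answer_idx]
    without_knowledge = score(question, "", choices)[answer_idx]
    return with_knowledge - without_knowledge


# Toy stand-ins for the model's two modes, for illustration only.
def toy_introspect(question: str) -> str:
    return "Penguins are birds that cannot fly."


def toy_score(question: str, knowledge: str, choices: Sequence[str]) -> List[float]:
    # Favor choices whose words occur in the knowledge; uniform if none do.
    text = knowledge.lower()
    hits = [1.0 + sum(word in text for word in choice.lower().split())
            for choice in choices]
    total = sum(hits)
    return [h / total for h in hits]


if __name__ == "__main__":
    q, opts = "Which bird cannot fly?", ["penguin", "sparrow"]
    knowledge, answer, probs = predict(q, opts, toy_introspect, toy_score)
    reward = self_feedback_reward(q, opts, 0, knowledge, toy_score)
    print(knowledge, answer, probs, reward)
```

In a reinforcement learning setup of the kind the abstract describes, a reward like this would be used to update the introspection mode, while the reasoning mode is trained to make use of the knowledge it receives; the sketch only shows where the self-generated signal comes from, not the training loop itself.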

arXiv information

Authors	Jiacheng Liu, Ramakanth Pasunuru, Hannaneh Hajishirzi, Yejin Choi, Asli Celikyilmaz
Published	2023-10-18 14:52:03+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
