Rule-Guided Feedback: Enhancing Reasoning by Enforcing Rule Adherence in Large Language Models

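Summary

This paper introduces Rule-Guided Feedback (RGF), a framework designed to improve large language model (LLM) performance through structured rule adherence and strategic information seeking.
RGF implements a teacher-student paradigm in which rule-following is enforced through established guidelines.
The framework uses a teacher model that rigorously evaluates each student output against task-specific rules and, when it detects a deviation, gives constructive guidance rather than the direct answer.
This iterative feedback loop serves two purposes: keeping solutions within the defined constraints, and encouraging the student to proactively seek information to resolve uncertainty.
The authors evaluate RGF on diverse tasks, including Checkmate-in-One puzzles, Sonnet Writing, Penguins-In-a-Table classification, GSM8K, and StrategyQA.
The findings suggest that structured feedback mechanisms can significantly improve LLM performance across a range of domains.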
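The teacher-student loop described in the summary can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the names `Rule`, `Student`, `teacher_review`, `rule_guided_feedback`, the `max_rounds` cap, and the toy rule and student are all hypothetical, and a plain Python function stands in for each LLM call. The information-seeking side of RGF is not modelled here.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical types (not from the paper):
# a Rule inspects an answer and returns a hint if it is violated, else None.
Rule = Callable[[str], Optional[str]]
# a Student maps (task, feedback so far) to a new answer.
Student = Callable[[str, List[str]], str]


def teacher_review(answer: str, rules: List[Rule]) -> List[str]:
    """Return one hint per violated rule; hints guide, they never give the answer."""
    hints = []
    for rule in rules:
        hint = rule(answer)
        if hint is not None:
            hints.append(hint)
    return hints


def rule_guided_feedback(
    task: str, student: Student, rules: List[Rule], max_rounds: int = 3
) -> Tuple[str, int]:
    """Iterate student answer -> teacher review until no rule is violated
    or the (assumed) round limit is reached. Returns (answer, rounds used)."""
    feedback: List[str] = []
    answer = ""
    for round_no in range(1, max_rounds + 1):
        answer = student(task, feedback)
        hints = teacher_review(answer, rules)
        if not hints:
            return answer, round_no
        feedback.extend(hints)
    return answer, max_rounds


# Toy stand-ins for the task-specific rule and the student model.
def must_be_integer(answer: str) -> Optional[str]:
    ok = answer.strip().lstrip("-").isdigit()
    return None if ok else "Reply with digits only."


def toy_student(task: str, feedback: List[str]) -> str:
    # Ignores the task text; answers in words first, in digits once it has feedback.
    return "forty-two" if not feedback else "42"


answer, rounds = rule_guided_feedback("What is 6 * 7?", toy_student, [must_be_integer])
```

In this toy run the first answer breaks the formatting rule, the teacher returns a hint instead of the solution, and the student's second attempt passes, so the loop stops after two rounds.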

Abstract (original)

In this paper, we introduce Rule-Guided Feedback (RGF), a framework designed to enhance Large Language Model (LLM) performance through structured rule adherence and strategic information seeking. RGF implements a teacher-student paradigm where rule-following is forced through established guidelines. Our framework employs a Teacher model that rigorously evaluates each student output against task-specific rules, providing constructive guidance rather than direct answers when detecting deviations. This iterative feedback loop serves two crucial purposes: maintaining solutions within defined constraints and encouraging proactive information seeking to resolve uncertainties. We evaluate RGF on diverse tasks including Checkmate-in-One puzzles, Sonnet Writing, Penguins-In-a-Table classification, GSM8k, and StrategyQA. Our findings suggest that structured feedback mechanisms can significantly enhance LLMs’ performance across various domains.

arXiv information

Authors	Aissatou Diallo, Antonis Bikakis, Luke Dickens, Anthony Hunter, Rob Miller
Published	2025-03-14 12:05:06+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
