Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling

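Summary

RLHF-aligned LMs show unprecedented ability on both benchmarks and long-form text generation, yet they struggle with one foundational task: next-token prediction.
As RLHF models become agent models aimed at interacting with humans, they appear to lose their world modeling ability, that is, the ability to predict what comes next in an arbitrary document, which is the foundational training objective of the Base LMs that RLHF adapts.
Besides demonstrating this trade-off empirically, we propose a potential explanation: to perform coherent long-form generation, RLHF models restrict randomness via implicit blueprints.
In particular, RLHF models concentrate probability on sets of anchor spans that co-occur across multiple generations for the same prompt. These spans serve as textual scaffolding, but they also limit the model's ability to generate documents that do not contain them.
We study this trade-off on the most effective current agent models, those aligned with RLHF, and explore why it may remain a fundamental trade-off between models that act and models that predict, even as alignment techniques improve.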

Summary (original)

RLHF-aligned LMs have shown unprecedented ability on both benchmarks and long-form text generation, yet they struggle with one foundational task: next-token prediction. As RLHF models become agent models aimed at interacting with humans, they seem to lose their world modeling — the ability to predict what comes next in arbitrary documents, which is the foundational training objective of the Base LMs that RLHF adapts. Besides empirically demonstrating this trade-off, we propose a potential explanation: to perform coherent long-form generation, RLHF models restrict randomness via implicit blueprints. In particular, RLHF models concentrate probability on sets of anchor spans that co-occur across multiple generations for the same prompt, serving as textual scaffolding but also limiting a model’s ability to generate documents that do not include these spans. We study this trade-off on the most effective current agent models, those aligned with RLHF, while exploring why this may remain a fundamental trade-off between models that act and those that predict, even as alignment techniques improve.
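
As a rough illustration of the anchor-span idea described above, the sketch below looks for word n-grams that recur across several generations sampled for the same prompt. It is not the authors' procedure: the function names (`ngrams`, `anchor_spans`), the whitespace tokenization, the fixed n-gram length, and the `min_fraction` threshold are all assumptions made for this example, and the sample strings are invented.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the set of distinct n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def anchor_spans(generations, n=3, min_fraction=0.8):
    """Return n-grams that appear in at least min_fraction of the generations.

    Hypothetical stand-in for "anchor spans": each generation contributes
    an n-gram at most once, so counts are per-generation occurrences.
    """
    counts = Counter()
    for text in generations:
        # Assumption: lowercase whitespace tokenization is good enough here.
        counts.update(ngrams(text.lower().split(), n))
    threshold = min_fraction * len(generations)
    return sorted(span for span, c in counts.items() if c >= threshold)


# Invented samples standing in for multiple generations from one prompt.
samples = [
    "here are three tips for better sleep tonight",
    "sure , here are three tips to relax",
    "here are three tips you can try",
]

print(anchor_spans(samples, n=3, min_fraction=1.0))
# → [('are', 'three', 'tips'), ('here', 'are', 'three')]
```

In this toy setting, a model whose samples share many such spans would assign little probability to documents that lack them, which is the limitation the abstract points to.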

arXiv information

Authors	Margaret Li, Weijia Shi, Artidoro Pagnoni, Peter West, Ari Holtzman
Published	2024-07-02 17:22:54+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
