Mitigating Sycophancy in Decoder-Only Transformer Architectures: Synthetic Data Intervention

Summary

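To address the sycophancy problem that reinforcement learning from human feedback introduces in large language models, this study applies a synthetic data intervention (SDI) technique to the decoder-only transformer architecture.
Building on gaps identified in the existing literature, the researcher designed an experimental process that reduces the model's tendency to cater to the user by generating diversified data, and used GPT4o as the experimental tool for verification.
The experiment used 100 true/false questions and compared, on multiple metrics, a model trained with synthetic data intervention against the original model without that training.
The results on accuracy rate and sycophancy rate support the technique: the SDI-trained model shows a significant effect in reducing sycophancy.
The dataset, experimental process, code, and results have been uploaded to GitHub at https://github.com/brucewang123456789/GeniusTrail.git.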

Summary (original)

To address the sycophancy problem caused by reinforcement learning from human feedback in large language models, this research applies synthetic data intervention technology to the decoder-only transformer architecture. Based on the research gaps in the existing literature, the researcher designed an experimental process to reduce the tendency of models to cater by generating diversified data, and used GPT4o as an experimental tool for verification. The experiment used 100 true and false questions, and compared the performance of the model trained with synthetic data intervention and the original untrained model on multiple indicators. The results show that the SDI training model supports the technology in terms of accuracy rate and sycophancy rate and has significant effectiveness in reducing sycophancy phenomena. Notably, the data set, experimental process, code and data results have been uploaded to Github, the link is https://github.com/brucewang123456789/GeniusTrail.git.
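The abstract names two metrics, accuracy rate and sycophancy rate, without defining them. The sketch below shows one plausible way such metrics could be computed over true/false questions. It assumes that each prompt carries a user-asserted answer and that sycophancy rate means the share of misleading prompts on which the model adopts the user's wrong answer. The `Trial` record, both function names, and the toy data are hypothetical and are not taken from the paper or its repository.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One true/false question answered by a model (hypothetical record layout)."""
    truth: bool         # ground-truth answer to the question
    user_claim: bool    # answer the user asserts in the prompt
    model_answer: bool  # answer the model finally gives


def accuracy_rate(trials):
    """Fraction of trials where the model's answer matches the ground truth."""
    return sum(t.model_answer == t.truth for t in trials) / len(trials)


def sycophancy_rate(trials):
    """Fraction of misleading trials (user claim is wrong) where the model
    sides with the user anyway. Assumed definition, not the paper's."""
    misleading = [t for t in trials if t.user_claim != t.truth]
    if not misleading:
        return 0.0
    agreed = sum(t.model_answer == t.user_claim for t in misleading)
    return agreed / len(misleading)


# Toy data for illustration only; these are not results from the paper.
baseline = [
    Trial(truth=True, user_claim=False, model_answer=False),   # caves to the user
    Trial(truth=False, user_claim=True, model_answer=True),    # caves to the user
    Trial(truth=True, user_claim=True, model_answer=True),     # user was right
    Trial(truth=False, user_claim=True, model_answer=False),   # resists the user
]

print("accuracy rate:", accuracy_rate(baseline))
print("sycophancy rate:", sycophancy_rate(baseline))
```

Computing both numbers for the SDI-trained model and for the original model on the same question set would give the kind of side-by-side comparison the abstract describes.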

arXiv information

Author	Libo Wang
Published	2024-11-20 11:52:09+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
