Nano: Nested Human-in-the-Loop Reward Learning for Few-shot Language Model Control

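Summary

Pretrained language models have demonstrated extraordinary capabilities in language generation.
However, real-world tasks often require controlling the distribution of generated text in order to mitigate bias, promote fairness, and achieve personalization.
Existing techniques for controlling the distribution of generated text work only with quantified distributions, which require pre-defined categories, distribution proportions, or an existing corpus that follows the desired distribution.
However, many important distributions, such as personal preferences, are unquantified.
This work tackles the problem of generating text that follows arbitrary distributions (quantified and unquantified) by proposing Nano, a few-shot human-in-the-loop training algorithm that continuously learns from human feedback.
Nano achieves state-of-the-art results on single topic/attribute control and on quantified distribution control compared with previous work.
The work also shows that Nano can learn unquantified distributions, achieve personalization, and capture differences between individuals' personal preferences with high sample efficiency.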

Summary (original)

Pretrained language models have demonstrated extraordinary capabilities in language generation. However, real-world tasks often require controlling the distribution of generated text in order to mitigate bias, promote fairness, and achieve personalization. Existing techniques for controlling the distribution of generated text only work with quantified distributions, which require pre-defined categories, proportions of the distribution, or an existing corpus following the desired distributions. However, many important distributions, such as personal preferences, are unquantified. In this work, we tackle the problem of generating text following arbitrary distributions (quantified and unquantified) by proposing Nano, a few-shot human-in-the-loop training algorithm that continuously learns from human feedback. Nano achieves state-of-the-art results on single topic/attribute as well as quantified distribution control compared to previous works. We also show that Nano is able to learn unquantified distributions, achieves personalization, and captures differences between different individuals’ personal preferences with high sample efficiency.

arXiv information

Authors	Xiang Fan, Yiwei Lyu, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency
Published	2023-07-09 13:40:30+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
