Revision Transformers: Instructing Language Models to Change their Values

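Summary

Current transformer language models (LMs) are large models with billions of parameters.
They perform well on a wide range of tasks, but they are also prone to shortcut learning and bias.
Correcting such faulty model behavior by adjusting parameters is very costly.
This is especially problematic when updating dynamic concepts such as moral values, which vary between cultures and individuals.
This work questions the common practice of storing all information in the model parameters and proposes the Revision Transformer (RiT) to make model updates easier.
RiT combines a large pre-trained LM, which encodes world knowledge inherently but diffusely, with a clearly structured revision engine, so that the model's knowledge can be updated with little effort and with the help of user interaction.
The authors demonstrate RiT on a moral dataset and simulate user feedback, showing strong model-revision performance even with little data.
In this way, users can easily shape a model to match their preferences, paving the way for more transparent AI models.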

Summary (original)

Current transformer language models (LM) are large-scale models with billions of parameters. They have been shown to provide high performances on a variety of tasks but are also prone to shortcut learning and bias. Addressing such incorrect model behavior via parameter adjustments is very costly. This is particularly problematic for updating dynamic concepts, such as moral values, which vary culturally or interpersonally. In this work, we question the current common practice of storing all information in the model parameters and propose the Revision Transformer (RiT) to facilitate easy model updating. The specific combination of a large-scale pre-trained LM that inherently but also diffusely encodes world knowledge with a clear-structured revision engine makes it possible to update the model’s knowledge with little effort and the help of user interaction. We exemplify RiT on a moral dataset and simulate user feedback demonstrating strong performance in model revision even with small data. This way, users can easily design a model regarding their preferences, paving the way for more transparent AI models.

arXiv information

Authors	Felix Friedrich, Wolfgang Stammer, Patrick Schramowski, Kristian Kersting
Published	2023-07-25 13:02:49+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
