Eliciting Textual Descriptions from Representations of Continuous Prompts

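Summary

Continuous prompts, or "soft prompts", are a widely adopted parameter-efficient tuning strategy for large language models, but their opaque nature often makes them a less favored choice.
Previous attempts to interpret continuous prompts relied on projecting individual prompt tokens onto the vocabulary space.
This approach is problematic, however: well-performing prompts can produce arbitrary or contradictory text, and each prompt token is interpreted in isolation.
In this work, we propose a new approach to interpreting continuous prompts that elicits textual descriptions from their representations during model inference.
Using a variant of Patchscopes (Ghandeharioun et al., 2024) called InSPEcT across a range of tasks, we show that our method often produces accurate task descriptions, which become more faithful as task performance improves.
In addition, an elaborated version of InSPEcT reveals biased features in continuous prompts, and the presence of these features correlates with biased model predictions.
As an effective interpretability solution, InSPEcT can be used to debug unwanted properties in continuous prompts and to inform developers of ways to mitigate them.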

Abstract (original)

Continuous prompts, or ‘soft prompts’, are a widely-adopted parameter-efficient tuning strategy for large language models, but are often less favorable due to their opaque nature. Prior attempts to interpret continuous prompts relied on projecting individual prompt tokens onto the vocabulary space. However, this approach is problematic as performant prompts can yield arbitrary or contradictory text, and it interprets prompt tokens individually. In this work, we propose a new approach to interpret continuous prompts that elicits textual descriptions from their representations during model inference. Using a Patchscopes variant (Ghandeharioun et al., 2024) called InSPEcT over various tasks, we show our method often yields accurate task descriptions which become more faithful as task performance increases. Moreover, an elaborated version of InSPEcT reveals biased features in continuous prompts, whose presence correlates with biased model predictions. Providing an effective interpretability solution, InSPEcT can be leveraged to debug unwanted properties in continuous prompts and inform developers on ways to mitigate them.
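The vocabulary-projection baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes that "projecting onto the vocabulary space" means picking, for each prompt vector, the nearest token embedding by cosine similarity, and the function name `project_to_vocab`, the four-word vocabulary, and all embedding values are hypothetical toy data.

```python
import numpy as np

def project_to_vocab(prompt_embeds, vocab_embeds, vocab):
    """Map each soft-prompt vector to its nearest vocabulary token.

    Assumption: nearness is measured by cosine similarity; the source
    does not say which metric prior work used.
    """
    # Normalize rows so that a dot product equals cosine similarity.
    p = prompt_embeds / np.linalg.norm(prompt_embeds, axis=1, keepdims=True)
    v = vocab_embeds / np.linalg.norm(vocab_embeds, axis=1, keepdims=True)
    sims = p @ v.T                  # shape: (num_prompt_tokens, vocab_size)
    nearest = sims.argmax(axis=1)   # one token index per prompt vector
    return [vocab[i] for i in nearest]

# Hypothetical toy vocabulary and embeddings (not taken from any real model).
vocab = ["positive", "negative", "movie", "the"]
vocab_embeds = np.eye(4)
prompt_embeds = np.array([
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.2, 0.1, 0.8],
])

tokens = project_to_vocab(prompt_embeds, vocab_embeds, vocab)
print(tokens)
```

Each prompt vector is decoded independently of the others, which is the limitation the abstract points to: nothing in this procedure reads the prompt as a whole.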

arXiv information

Authors	Dana Ramati, Daniela Gottesman, Mor Geva
Published	2024-10-15 14:46:11+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
