Underspecification in Language Modeling Tasks: A Causality-Informed Study of Gendered Pronoun Resolution

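Summary

Modern language modeling tasks are often underspecified. For a given token prediction, many words may satisfy the user's intent of producing natural language at inference time, but only one word minimizes the task's loss function at training time.
We provide a simple yet plausible causal mechanism describing the role that underspecification plays in generating spurious correlations.
Despite its simplicity, our causal model directly informs the development of two lightweight black-box evaluation methods. Applied to gendered pronoun resolution tasks on a wide range of LLMs, these methods 1) aid in detecting inference-time task underspecification by exploiting 2) previously unreported gender vs. time and gender vs. location spurious correlations. The LLMs span a range of A) sizes: from BERT-base to GPT 3.5, B) pre-training objectives: from masked and autoregressive language modeling to a mixture of these objectives, and C) training stages: from pre-training only to reinforcement learning from human feedback (RLHF).
Code and open-source demos are available at https://github.com/2dot71mily/sib_paper.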

Abstract (original)

Modern language modeling tasks are often underspecified: for a given token prediction, many words may satisfy the user’s intent of producing natural language at inference time, however only one word would minimize the task’s loss function at training time. We provide a simple yet plausible causal mechanism describing the role underspecification plays in the generation of spurious correlations. Despite its simplicity, our causal model directly informs the development of two lightweight black-box evaluation methods, that we apply to gendered pronoun resolution tasks on a wide range of LLMs to 1) aid in the detection of inference-time task underspecification by exploiting 2) previously unreported gender vs. time and gender vs. location spurious correlations on LLMs with a range of A) sizes: from BERT-base to GPT 3.5, B) pre-training objectives: from masked & autoregressive language modeling to a mixture of these objectives, and C) training stages: from pre-training only to reinforcement learning from human feedback (RLHF). Code and open-source demos available at https://github.com/2dot71mily/sib_paper.
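
The abstract does not spell out how the two evaluation methods work, so the following is only a minimal sketch of one way a black-box probe for a gender vs. time spurious correlation could look. It is not the author's implementation. The prompt template, the function names, and the `toy_scorer` stand-in (which returns invented numbers instead of calling a real model) are all assumptions made for illustration. The idea is to vary a date that should be irrelevant to the pronoun, record the model's relative preference for "she" over "he", and fit a least-squares slope: a slope far from zero would suggest the date is influencing pronoun choice.

```python
from typing import Callable, Dict, List

# Hypothetical prompt: the year should carry no information about the pronoun.
TEMPLATE = "In {year}, [MASK] was a teenager."


def female_share(probs: Dict[str, float]) -> float:
    """Share of 'she' among the probability mass on 'she' and 'he'."""
    she, he = probs.get("she", 0.0), probs.get("he", 0.0)
    total = she + he
    return she / total if total > 0 else 0.5


def slope(xs: List[float], ys: List[float]) -> float:
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def gender_vs_time_slope(
    score: Callable[[str], Dict[str, float]], years: List[int]
) -> float:
    """Slope of the female-pronoun share as the (irrelevant) year changes."""
    shares = [female_share(score(TEMPLATE.format(year=y))) for y in years]
    return slope([float(y) for y in years], shares)


def toy_scorer(text: str) -> Dict[str, float]:
    """Stand-in for a real model call; the numbers are invented."""
    year = int(text.split()[1].rstrip(","))
    she = 0.2 + 0.002 * (year - 1900)
    return {"she": she, "he": 1.0 - she}


years = list(range(1900, 2001, 20))
print(gender_vs_time_slope(toy_scorer, years))
```

With the toy scorer the printed slope is close to 0.002 per year, because that trend was built into the invented numbers. To probe an actual model, `toy_scorer` would be replaced by a function that returns the model's probabilities for the candidate pronouns at the masked position.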

arXiv information

Author	Emily McMilin
Published	2023-07-17 17:56:10+00:00

Source and services used

arxiv.jp, Google
