Enhancing LLM Robustness to Perturbed Instructions: An Empirical Study

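Abstract

Large language models (LLMs) are highly vulnerable to input perturbations. Existing methods for improving LLM robustness focus mainly on perturbed data samples, whereas improving resilience to perturbations of task-level instructions remains relatively unexplored. In this study, we focus on character-level and word-level edits of task-specific instructions, which substantially degrade downstream performance. To improve LLM robustness, we apply a variety of techniques, including self-denoising and representation alignment, and test different models (Llama 3, Flan-T5), datasets (CoLA, QNLI, SST-2), and instructions (both task-oriented and role-oriented). We find that, on average, self-denoising, whether performed by a frozen LLM or a fine-tuned model, achieves substantially higher performance gains than alternative strategies, including more complex baselines such as ensembling and supervised methods.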

Abstract (original)

Large Language Models (LLMs) are highly vulnerable to input perturbations, as even a small prompt change may result in a substantially different output. Existing methods to enhance LLM robustness are primarily focused on perturbed data samples, whereas improving resiliency to perturbations of task-level instructions has remained relatively underexplored. In this work, we focus on character- and word-level edits of task-specific instructions, which substantially degrade downstream performance. We experiment with a variety of techniques to enhance the robustness of LLMs, including self-denoising and representation alignment, testing different models (Llama 3 and Flan-T5), datasets (CoLA, QNLI, SST-2) and instructions (both task-oriented and role-oriented). We find that, on average, self-denoising — whether performed by a frozen LLM or a fine-tuned model — achieves substantially higher performance gains than alternative strategies, including more complex baselines such as ensembling and supervised methods.
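
To make the two kinds of instruction edits concrete, here is a minimal Python sketch of one character-level and one word-level perturbation. It is an illustration only, not the authors' perturbation procedure: the function names, the choice of adjacent-letter swaps and word deletion, the `rate` and `seed` parameters, and the sample instruction are all assumptions introduced for this example.

```python
import random


def perturb_chars(instruction: str, rate: float = 0.1, seed: int = 0) -> str:
    """Character-level edit: swap adjacent letters with probability `rate`."""
    rng = random.Random(seed)
    chars = list(instruction)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the swapped pair so each letter moves at most once
        else:
            i += 1
    return "".join(chars)


def perturb_words(instruction: str, rate: float = 0.2, seed: int = 0) -> str:
    """Word-level edit: drop each word with probability `rate`."""
    rng = random.Random(seed)
    words = instruction.split()
    kept = [w for w in words if rng.random() >= rate]
    # Never return an empty instruction; fall back to the first word.
    return " ".join(kept if kept else words[:1])


if __name__ == "__main__":
    # Hypothetical SST-2 style instruction, not taken from the paper.
    clean = "Classify the sentiment of the sentence as positive or negative."
    print(perturb_chars(clean, rate=0.15, seed=1))
    print(perturb_words(clean, rate=0.25, seed=1))
```

In a self-denoising setup as described in the abstract, a perturbed string like these would first be passed to a model that attempts to restore the clean instruction before the task is run; that step is not shown here.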

arXiv information

Authors	Aryan Agrawal, Lisa Alazraki, Shahin Honarvar, Marek Rei
Published	2025-04-03 16:17:56+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
