Sensitivity and Robustness of Large Language Models to Prompt in Japanese

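Abstract

Prompt engineering has become increasingly important in recent years, driven by advances in pre-trained and large language models.
However, a critical problem has been identified in this area: these models are sensitive to prompt templates and lack robustness to them, particularly in less-studied languages such as Japanese.
This paper investigates the problem through a comprehensive evaluation of several representative large language models (LLMs) and a widely used pre-trained model (PLM), T5.
The models are evaluated on a Japanese benchmark dataset in order to assess and analyze how current multilingual models perform in this setting.
Our experimental results reveal striking discrepancies.
A simple change to the sentence structure of the prompt template caused GPT-4's accuracy to drop sharply, from 49.21 to 25.44.
This shows that even the high-performing GPT-4 model has significant stability problems when handling varied Japanese prompt templates, which calls the consistency of its outputs into question.
In light of these findings, we conclude by proposing potential research directions for further improving the development and performance of large language models at their current stage.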
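
As a rough illustration of the kind of comparison the abstract describes, the following Python sketch computes accuracy for each prompt template and the gap between the best and worst template. Everything in it is hypothetical: the template strings, the two toy examples, and `toy_predict`, which stands in for a real model call. It is not the authors' evaluation code. Only the two GPT-4 accuracies used at the end come from the abstract.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (input text, gold label)


def accuracy_per_template(
    templates: Dict[str, str],
    examples: List[Example],
    predict: Callable[[str], str],
) -> Dict[str, float]:
    """Return accuracy (in percent) for each prompt template."""
    scores: Dict[str, float] = {}
    for name, template in templates.items():
        correct = sum(
            predict(template.format(text=text)) == gold
            for text, gold in examples
        )
        scores[name] = 100.0 * correct / len(examples)
    return scores


def sensitivity_gap(scores: Dict[str, float]) -> float:
    """Spread between the best and worst template, in accuracy points."""
    return max(scores.values()) - min(scores.values())


def toy_predict(prompt: str) -> str:
    # Hypothetical stand-in for a model: it only "understands" prompts
    # that begin with "Review:", and otherwise always answers "negative".
    if prompt.startswith("Review:") and "good" in prompt:
        return "positive"
    return "negative"


# Hypothetical templates that differ only in sentence structure.
templates = {
    "text_first": "Review: {text}\nWhat is the sentiment?",
    "question_first": "What is the sentiment of this review? {text}",
}
examples = [("good film", "positive"), ("bad film", "negative")]

scores = accuracy_per_template(templates, examples, toy_predict)
gap = sensitivity_gap(scores)

# The two GPT-4 accuracies reported in the abstract.
reported = {"original_template": 49.21, "modified_template": 25.44}
reported_gap = sensitivity_gap(reported)
relative_drop = reported_gap / reported["original_template"]
```

Applied to the two reported GPT-4 accuracies, the gap is 23.77 points, which is roughly a 48% relative drop from the higher score.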

Abstract (original)

Prompt Engineering has gained significant relevance in recent years, fueled by advancements in pre-trained and large language models. However, a critical issue has been identified within this domain: the lack of sensitivity and robustness of these models towards Prompt Templates, particularly in lesser-studied languages such as Japanese. This paper explores this issue through a comprehensive evaluation of several representative Large Language Models (LLMs) and a widely-utilized pre-trained model (PLM), T5. These models are scrutinized using a benchmark dataset in Japanese, with the aim to assess and analyze the performance of the current multilingual models in this context. Our experimental results reveal startling discrepancies. A simple modification in the sentence structure of the Prompt Template led to a drastic drop in the accuracy of GPT-4 from 49.21 to 25.44. This observation underscores the fact that even the high-performing GPT-4 model encounters significant stability issues when dealing with diverse Japanese prompt templates, rendering the consistency of the model's output results questionable. In light of these findings, we conclude by proposing potential research trajectories to further enhance the development and performance of Large Language Models in their current stage.

arXiv information

Authors	Chengguang Gan, Tatsunori Mori
Published	2023-05-15 15:19:08+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
