LLMs Are Few-Shot In-Context Low-Resource Language Learners

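Summary

In-context learning (ICL) enables large language models (LLMs) to perform diverse tasks in underrepresented languages using only short in-context information, offering an important means of narrowing the gap between high-resource and low-resource languages.
Nevertheless, only a handful of studies have explored ICL for low-resource languages, and most of them focus on relatively high-resource languages such as French and Spanish.
In this work, we extensively study ICL and its cross-lingual variation (X-ICL) on 25 low-resource languages and 7 relatively higher-resource languages.
Our study not only assesses the effectiveness of ICL with LLMs in low-resource languages, but also identifies the shortcomings of in-context label alignment and introduces a more effective alternative, query alignment.
We also provide insights into various facets of ICL for low-resource languages.
Our study concludes that few-shot in-context information is important for improving how well LLMs understand low-resource languages: semantically relevant information helps by closing the language gap in the target language and by aligning semantics between the targeted low-resource language and the high-resource language that the model is proficient in.
Our work highlights the importance of advancing ICL research, particularly for low-resource languages.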

Summary (original)

In-context learning (ICL) empowers large language models (LLMs) to perform diverse tasks in underrepresented languages using only short in-context information, offering a crucial avenue for narrowing the gap between high-resource and low-resource languages. Nonetheless, there is only a handful of works explored ICL for low-resource languages with most of them focusing on relatively high-resource languages, such as French and Spanish. In this work, we extensively study ICL and its cross-lingual variation (X-ICL) on 25 low-resource and 7 relatively higher-resource languages. Our study not only assesses the effectiveness of ICL with LLMs in low-resource languages but also identifies the shortcomings of in-context label alignment, and introduces a more effective alternative: query alignment. Moreover, we provide valuable insights into various facets of ICL for low-resource languages. Our study concludes the significance of few-shot in-context information on enhancing the low-resource understanding quality of LLMs through semantically relevant information by closing the language gap in the target language and aligning the semantics between the targeted low-resource and the high-resource language that the model is proficient in. Our work highlights the importance of advancing ICL research, particularly for low-resource languages.

arXiv information

Authors	Samuel Cahyawijaya,Holy Lovenia,Pascale Fung
Published	2024-03-25 07:55:29+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
