Benchmarking Large Language Model Capabilities for Conditional Generation

Summary (original)

Pre-trained large language models (PLMs) underlie most new developments in natural language processing. They have shifted the field from application-specific model pipelines to a single model that is adapted to a wide range of tasks. Autoregressive PLMs like GPT-3 or PaLM, alongside techniques like few-shot learning, have additionally shifted the output modality to generation instead of classification or regression. Despite their ubiquitous use, the generation quality of language models is rarely evaluated when these models are introduced. Additionally, it is unclear how existing generation tasks, while useful for comparing systems at a high level, relate to the real-world use cases for which people have been adopting them. In this work, we discuss how to adapt existing application-specific generation benchmarks to PLMs and provide an in-depth, empirical study of the limitations and capabilities of PLMs in natural language generation tasks along dimensions such as scale, architecture, and input and output language. Our results show that PLMs differ in their applicability to different data regimes and in their generalization to multiple languages, and they inform which PLMs to use for a given generation task setup. We share best practices to be taken into consideration when benchmarking generation capabilities during the development of upcoming PLMs.

arXiv information

Authors	Joshua Maynez, Priyanka Agrawal, Sebastian Gehrmann
Published	2023-06-29 08:59:40+00:00
arXiv page	arxiv_id (pdf)

Source and services used

arxiv.jp, Google