LM-PUB-QUIZ: A Comprehensive Framework for Zero-Shot Evaluation of Relational Knowledge in Language Models

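Summary

Knowledge probing evaluates how much relational knowledge a language model (LM) has acquired during pre-training.
It offers a cost-effective way to compare LMs of different sizes and training setups, and it helps monitor knowledge gained or lost during continual learning (CL).
In prior work, we presented BEAR (Wiland et al., 2024), an improved knowledge probe that enables the comparison of LMs trained with different pre-training objectives (causal and masked LMs) and addresses the skewed distributions of earlier probes, giving a less biased reading of LM knowledge.
This paper presents LM-PUB-QUIZ, a Python framework and leaderboard built around the BEAR probing mechanism that lets researchers and practitioners apply it in their own work.
It supports standalone evaluation as well as direct integration into the widely used training pipeline of the Hugging Face TRANSFORMERS library.
It also provides a fine-grained analysis of different knowledge types to help users better understand the knowledge in each evaluated LM.
We release LM-PUB-QUIZ publicly as an open-source project.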

Abstract (original)

Knowledge probing evaluates the extent to which a language model (LM) has acquired relational knowledge during its pre-training phase. It provides a cost-effective means of comparing LMs of different sizes and training setups and is useful for monitoring knowledge gained or lost during continual learning (CL). In prior work, we presented an improved knowledge probe called BEAR (Wiland et al., 2024), which enables the comparison of LMs trained with different pre-training objectives (causal and masked LMs) and addresses issues of skewed distributions in previous probes to deliver a more unbiased reading of LM knowledge. With this paper, we present LM-PUB-QUIZ, a Python framework and leaderboard built around the BEAR probing mechanism that enables researchers and practitioners to apply it in their work. It provides options for standalone evaluation and direct integration into the widely-used training pipeline of the Hugging Face TRANSFORMERS library. Further, it provides a fine-grained analysis of different knowledge types to assist users in better understanding the knowledge in each evaluated LM. We publicly release LM-PUB-QUIZ as an open-source project.
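
The abstract does not spell out how the probe scores a model. As we understand it from the BEAR paper, each relation instance is turned into one statement per answer option, and the options are ranked by the (pseudo) log-likelihood the LM assigns to each statement. The Python sketch below illustrates only that ranking idea. It is not the LM-PUB-QUIZ API: `make_toy_scorer`, `rank_options`, `accuracy`, the template, and the co-occurrence table are hypothetical names and data invented for illustration, and the toy scorer stands in for a real LM.

```python
import math
from typing import Callable, Sequence

# Hypothetical stand-in for an LM: maps a sentence to a log-likelihood-like score.
Scorer = Callable[[str], float]


def make_toy_scorer(cooccurrence: dict[tuple[str, str], int]) -> Scorer:
    """Build a toy scorer from word-pair counts (NOT a real language model)."""

    def score(sentence: str) -> float:
        words = sentence.rstrip(".").split()
        total = 0.0
        for i, first in enumerate(words):
            for second in words[i + 1:]:
                # Add-one smoothing keeps unseen pairs at log(1) = 0.
                total += math.log(1 + cooccurrence.get((first, second), 0))
        return total

    return score


def rank_options(template: str, subject: str,
                 options: Sequence[str], score: Scorer) -> list[str]:
    """Fill the template once per answer option and sort by score, best first."""
    statements = {o: template.format(subject=subject, answer=o) for o in options}
    return sorted(options, key=lambda o: score(statements[o]), reverse=True)


def accuracy(instances: Sequence[tuple[str, str]], template: str,
             options: Sequence[str], score: Scorer) -> float:
    """Fraction of instances whose top-ranked option is the gold answer."""
    hits = sum(rank_options(template, subj, options, score)[0] == gold
               for subj, gold in instances)
    return hits / len(instances)


# Made-up illustration data: one relation, three instances, a shared option set.
TEMPLATE = "{subject} is the capital of {answer}."
OPTIONS = ["France", "Germany", "Italy"]
INSTANCES = [("Paris", "France"), ("Berlin", "Germany"), ("Rome", "Italy")]
TOY_COUNTS = {("Paris", "France"): 5, ("Berlin", "Germany"): 4, ("Rome", "Germany"): 1}

if __name__ == "__main__":
    toy = make_toy_scorer(TOY_COUNTS)
    print(rank_options(TEMPLATE, "Paris", OPTIONS, toy))
    print(accuracy(INSTANCES, TEMPLATE, OPTIONS, toy))
```

Because every option is scored as a complete statement, the same ranking procedure can in principle be applied to causal and masked LMs alike; only the scorer changes. Swapping the toy scorer for a function that sums token log-probabilities from an actual model would turn this into a real, if minimal, probe.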

arXiv information

Authors	Max Ploner, Jacek Wiland, Sebastian Pohl, Alan Akbik
Published	2024-08-28 11:44:52+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
