I2D2: Inductive Knowledge Distillation with NeuroLogic and Self-Imitation

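Abstract

The commonsense capabilities of pre-trained language models improve dramatically with scale, leading many to believe that scale is the only winning recipe.
But is it?
Here, we investigate an alternative that a priori seems impossible: can a smaller language model (e.g., GPT-2) beat models that are orders of magnitude larger and better (e.g., GPT-3) when equipped with a novel commonsense distillation algorithm?
The key intellectual challenge is to design a learning algorithm that achieves a competitive level of commonsense acquisition without relying on the benefits of scale.
In particular, we study generative models of commonsense knowledge, focusing on the task of generating generics: statements of commonsense facts about everyday concepts, such as "birds can fly."
We introduce I2D2, a novel commonsense distillation framework that loosely follows the Symbolic Knowledge Distillation of West et al.
It breaks the dependence on an extreme-scale teacher model through two innovations: (1) a novel adaptation of NeuroLogic Decoding that enhances the generation quality of weak, off-the-shelf language models, and (2) self-imitation learning, which iteratively learns from the model's own enhanced commonsense acquisition capabilities.
Empirical results suggest that scale is not the only way, as novel algorithms can be a promising alternative.
Moreover, our study leads to Gen-A-tomic, a new corpus of generics that is the largest and highest quality available to date.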

Abstract (original)

Commonsense capabilities of pre-trained language models dramatically improve with scale, leading many to believe that scale is the only winning recipe. But is it? Here, we investigate an alternative that a priori seems impossible: can smaller language models (e.g., GPT-2) win over models that are orders of magnitude larger and better (e.g., GPT-3), if powered with novel commonsense distillation algorithms? The key intellectual challenge is to design a learning algorithm that achieves a competitive level of commonsense acquisition, without relying on the benefits of scale. In particular, we study generative models of commonsense knowledge, focusing on the task of generating generics, statements of commonsense facts about everyday concepts, e.g., birds can fly. We introduce I2D2, a novel commonsense distillation framework that loosely follows the Symbolic Knowledge Distillation of West et al. but breaks the dependence on the extreme-scale teacher model with two innovations: (1) the novel adaptation of NeuroLogic Decoding to enhance the generation quality of the weak, off-the-shelf language models, and (2) self-imitation learning to iteratively learn from the model’s own enhanced commonsense acquisition capabilities. Empirical results suggest that scale is not the only way, as novel algorithms can be a promising alternative. Moreover, our study leads to a new corpus of generics, Gen-A-tomic, that is the largest and highest quality available to date.
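
The generate, filter, and retrain cycle that the abstract calls self-imitation learning can be illustrated with a toy sketch. Everything below is a hypothetical stand-in and not the authors' implementation: the "model" is just a weighted bag of candidate statements, `critic_accepts` is a hard-coded filter in place of a learned critic, and "fine-tuning" is simulated by raising the weight of accepted statements. The NeuroLogic Decoding step is not modeled at all.

```python
import random
from collections import Counter

# Hypothetical toy "language model": candidate statements with sampling weights.
CANDIDATES = {
    "birds can fly": 1.0,
    "birds can read": 1.0,
    "dogs bark": 1.0,
    "dogs photosynthesize": 1.0,
}

# Hypothetical critic knowledge: statements treated as plausible generics.
PLAUSIBLE = {"birds can fly", "dogs bark"}


def generate(weights, n, rng):
    """Sample n statements in proportion to their current weights."""
    statements = list(weights)
    return rng.choices(statements, weights=[weights[s] for s in statements], k=n)


def critic_accepts(statement):
    """Stand-in for a learned critic that filters low-quality generations."""
    return statement in PLAUSIBLE


def self_imitation(weights, rounds=3, samples=200, boost=0.05, seed=0):
    """Repeat: generate, keep what the critic accepts, upweight the kept items."""
    rng = random.Random(seed)
    weights = dict(weights)  # do not mutate the caller's dictionary
    for _ in range(rounds):
        generated = generate(weights, samples, rng)
        accepted = [s for s in generated if critic_accepts(s)]
        # Simulated "fine-tuning" on the model's own filtered outputs.
        for statement, count in Counter(accepted).items():
            weights[statement] += boost * count
    return weights


def plausible_mass(weights):
    """Fraction of total sampling weight that sits on plausible statements."""
    total = sum(weights.values())
    return sum(w for s, w in weights.items() if critic_accepts(s)) / total


if __name__ == "__main__":
    trained = self_imitation(CANDIDATES)
    print("before:", plausible_mass(CANDIDATES))
    print("after: ", plausible_mass(trained))
```

In this toy setting, the share of sampling weight on plausible statements grows each round because only critic-approved generations feed back into the model, which is the intuition behind learning from the model's own filtered outputs.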

arXiv information

Authors	Chandra Bhagavatula,Jena D. Hwang,Doug Downey,Ronan Le Bras,Ximing Lu,Lianhui Qin,Keisuke Sakaguchi,Swabha Swayamdipta,Peter West,Yejin Choi
Published	2023-05-26 17:14:27+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
