Lexicon-Level Contrastive Visual-Grounding Improves Language Modeling

Summary

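Today's most accurate language models are trained on orders of magnitude more language data than human learners receive, yet they get no supervision from the other sensory modalities that play a crucial role in human learning.
Can more ecologically plausible supervision make LM representations and predictions more accurate (and more human-like)?
This paper describes LexiContrastive Grounding (LCG), a grounded language learning procedure that uses visual supervision to improve textual representations.
LexiContrastive Grounding combines a next-token prediction strategy with a contrastive visual grounding objective, focusing on early-layer representations that encode lexical information.
Across multiple word-learning and sentence-understanding benchmarks, LexiContrastive Grounding not only outperforms standard language-only models in learning efficiency but also improves on vision-and-language learning procedures including CLIP, GIT, Flamingo, and Vokenization.
In addition, LexiContrastive Grounding improves perplexity by around 5% on multiple language modeling tasks.
This work underscores the potential of incorporating visual grounding into language models, aligning them more closely with the multimodal nature of human language acquisition.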
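
The combination of a next-token prediction loss with a contrastive visual grounding loss can be illustrated with a small sketch. This is not the authors' implementation: the symmetric InfoNCE form, the `temperature` and `weight` parameters, and all function names are assumptions made for illustration, and `word_vecs` merely stands in for early-layer token representations paired one-to-one with image embeddings.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def next_token_loss(logits, target):
    """Cross-entropy of the target token id under one row of vocabulary logits."""
    return -log_softmax(logits)[target]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def contrastive_loss(word_vecs, image_vecs, temperature=0.1):
    """Symmetric InfoNCE (assumed form): word_vecs[i] is paired with image_vecs[i]."""
    w = [normalize(v) for v in word_vecs]
    im = [normalize(v) for v in image_vecs]
    n = len(w)
    # Cosine similarity of every word/image pair, scaled by the temperature.
    sims = [[dot(w[i], im[j]) / temperature for j in range(n)] for i in range(n)]
    # Each word should pick out its own image, and each image its own word.
    word_to_image = sum(-log_softmax(sims[i])[i] for i in range(n)) / n
    image_to_word = sum(
        -log_softmax([sims[i][j] for i in range(n)])[j] for j in range(n)
    ) / n
    return 0.5 * (word_to_image + image_to_word)


def combined_loss(logits, target, word_vecs, image_vecs, weight=1.0):
    """Language-modeling loss plus a weighted grounding term (weight is hypothetical)."""
    return next_token_loss(logits, target) + weight * contrastive_loss(
        word_vecs, image_vecs
    )


if __name__ == "__main__":
    words = [[1.0, 0.0], [0.0, 1.0]]
    images = [[1.0, 0.0], [0.0, 1.0]]
    print(combined_loss([0.0, 0.0, 0.0, 0.0], 2, words, images, weight=0.5))
```

In this toy setup the grounding term is small when each word vector points in the same direction as its paired image vector and large when the pairs are mismatched, which is the pressure a contrastive objective places on the lexical representations.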

Summary (original)

Today’s most accurate language models are trained on orders of magnitude more language data than human language learners receive – but with no supervision from other sensory modalities that play a crucial role in human learning. Can we make LMs’ representations and predictions more accurate (and more human-like) with more ecologically plausible supervision? This paper describes LexiContrastive Grounding (LCG), a grounded language learning procedure that leverages visual supervision to improve textual representations. LexiContrastive Grounding combines a next token prediction strategy with a contrastive visual grounding objective, focusing on early-layer representations that encode lexical information. Across multiple word-learning and sentence-understanding benchmarks, LexiContrastive Grounding not only outperforms standard language-only models in learning efficiency, but also improves upon vision-and-language learning procedures including CLIP, GIT, Flamingo, and Vokenization. Moreover, LexiContrastive Grounding improves perplexity by around 5% on multiple language modeling tasks. This work underscores the potential of incorporating visual grounding into language models, aligning more closely with the multimodal nature of human language acquisition.

arXiv information

Authors	Chengxu Zhuang, Evelina Fedorenko, Jacob Andreas
Published	2024-03-21 16:52:01+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
