Re-evaluating the Need for Multimodal Signals in Unsupervised Grammar Induction

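Abstract

Are multimodal inputs necessary for grammar induction?
Recent work has shown that multimodal training inputs can improve grammar induction.
However, those improvements were measured against weak text-only baselines trained on relatively little text.
To determine whether multimodal inputs are needed in regimes with large amounts of textual training data, the authors design a stronger text-only baseline called LC-PCFG.
LC-PCFG is a C-PCFG that incorporates embeddings from text-only large language models (LLMs).
Using a fixed grammar family, LC-PCFG is compared directly with various multimodal grammar induction methods.
Performance is compared on four benchmark datasets.
LC-PCFG achieves up to a 17% relative improvement in Corpus-F1 over state-of-the-art multimodal grammar induction methods.
LC-PCFG is also more computationally efficient, with up to 85% fewer parameters and an 8.8x reduction in training time compared to multimodal approaches.
These results suggest that multimodal inputs may not be necessary for grammar induction, and they underline the importance of strong vision-free baselines when evaluating the benefit of multimodal approaches.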

Abstract (original)

Are multimodal inputs necessary for grammar induction? Recent work has shown that multimodal training inputs can improve grammar induction. However, these improvements are based on comparisons to weak text-only baselines that were trained on relatively little textual data. To determine whether multimodal inputs are needed in regimes with large amounts of textual training data, we design a stronger text-only baseline, which we refer to as LC-PCFG. LC-PCFG is a C-PCFG that incorporates embeddings from text-only large language models (LLMs). We use a fixed grammar family to directly compare LC-PCFG to various multimodal grammar induction methods. We compare performance on four benchmark datasets. LC-PCFG provides an up to 17% relative improvement in Corpus-F1 compared to state-of-the-art multimodal grammar induction methods. LC-PCFG is also more computationally efficient, providing an up to 85% reduction in parameter count and 8.8x reduction in training time compared to multimodal approaches. These results suggest that multimodal inputs may not be necessary for grammar induction, and emphasize the importance of strong vision-free baselines for evaluating the benefit of multimodal approaches.
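
The abstract reports its headline result as a relative improvement in Corpus-F1 but does not define the metric. The sketch below assumes the definition commonly used in grammar induction evaluation: unlabeled constituent spans are pooled over the whole corpus before precision, recall, and F1 are computed. The function names and the toy spans are hypothetical and chosen only for illustration; they are not taken from the paper or its code.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token indices of one constituent


def corpus_f1(predicted: List[Set[Span]], gold: List[Set[Span]]) -> float:
    """Corpus-level F1: pool span counts over all sentences, then compute F1.

    Assumed definition (not stated in the abstract): unlabeled spans,
    counts summed across the corpus rather than averaged per sentence.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same sentences")
    true_pos = sum(len(p & g) for p, g in zip(predicted, gold))
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / n_pred
    recall = true_pos / n_gold
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new: float, baseline: float) -> float:
    """Relative (not absolute) gain of `new` over `baseline`, as a fraction."""
    return (new - baseline) / baseline


# Toy spans, invented purely for illustration (not results from the paper).
gold = [{(0, 2), (0, 4), (2, 4)}, {(0, 3), (1, 3)}]
pred_a = [{(0, 2), (0, 4), (1, 4)}, {(0, 3), (0, 2)}]  # hypothetical baseline
pred_b = [{(0, 2), (0, 4), (2, 4)}, {(0, 3), (0, 2)}]  # hypothetical better parser

f1_a = corpus_f1(pred_a, gold)
f1_b = corpus_f1(pred_b, gold)
gain = relative_improvement(f1_b, f1_a)
print(f"F1 a={f1_a:.3f}  b={f1_b:.3f}  relative gain={gain:.1%}")
```

Under this reading, a "17% relative improvement" means the ratio of the F1 difference to the baseline F1 is 0.17, which is a smaller change than 17 absolute F1 points whenever the baseline is below 100.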

arXiv information

Authors	Boyi Li, Rodolfo Corona, Karttikeya Mangalam, Catherine Chen, Daniel Flaherty, Serge Belongie, Kilian Q. Weinberger, Jitendra Malik, Trevor Darrell, Dan Klein
Published	2024-04-12 14:53:30+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
