Drop your Decoder: Pre-training with Bag-of-Word Prediction for Dense Passage Retrieval

Summary

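Masked auto-encoder pre-training has emerged as a common technique for initializing and strengthening dense retrieval systems.
It typically uses additional Transformer decoder blocks to provide sustainable supervision signals and to compress contextual information into dense representations.
However, the underlying reason why such pre-training is effective remains unclear.
Using additional Transformer-based decoders also incurs a large computational cost.
This study aims to shed light on this question by showing that masked auto-encoder (MAE) pre-training with enhanced decoding substantially improves the term coverage of input tokens in dense representations compared with vanilla BERT checkpoints.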
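The summary does not define term coverage. The following is a minimal sketch of one way to measure it, assuming the dense vector has already been projected onto the vocabulary (for example through a masked-language-model head) and that coverage means the share of distinct input tokens ranked among the top-k vocabulary logits. The function name `term_coverage` and the choice of k are hypothetical, not taken from the paper.

```python
from typing import Sequence


def term_coverage(vocab_logits: Sequence[float], input_ids: Sequence[int], k: int) -> float:
    """Fraction of distinct input token ids that rank among the top-k vocabulary logits.

    Hypothetical metric: vocab_logits is assumed to be the dense representation
    already projected onto the vocabulary (e.g. through an MLM head).
    """
    if k <= 0 or not input_ids:
        return 0.0
    # Vocabulary indices sorted by descending logit (ties broken by lower index).
    ranked = sorted(range(len(vocab_logits)), key=lambda i: (-vocab_logits[i], i))
    top_k = set(ranked[:k])
    unique_ids = set(input_ids)
    # Count how many distinct input tokens the dense vector "covers".
    covered = unique_ids & top_k
    return len(covered) / len(unique_ids)


if __name__ == "__main__":
    # Toy 8-token vocabulary; the passage contains token ids 1, 3 and 6.
    logits = [0.1, 2.5, 0.3, 1.9, -0.4, 0.0, 0.2, 1.1]
    print(term_coverage(logits, [1, 3, 6, 3], k=3))
```

Under this definition, a higher value means more of the passage's own tokens are recoverable from the single dense vector.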
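Based on this observation, we propose a modification to the traditional MAE that replaces the masked auto-encoder's decoder entirely with a simplified Bag-of-Word prediction task.
This change allows lexical signals to be compressed efficiently into dense representations through unsupervised pre-training.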
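Notably, the proposed method achieves state-of-the-art retrieval performance on several large-scale retrieval benchmarks without requiring any additional parameters, and delivers a 67% training speed-up compared with standard masked auto-encoder pre-training with enhanced decoding.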
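The following is a minimal sketch of a decoder-free Bag-of-Word prediction objective under stated assumptions. The dense vector is projected to vocabulary logits by reusing a tied word-embedding matrix, so no new parameters are introduced, and the loss is the mean negative log-probability of each distinct input token. The names `project_to_vocab` and `bow_prediction_loss` are hypothetical, and the exact loss used in the paper may differ.

```python
import math
from typing import Sequence


def project_to_vocab(h: Sequence[float], embeddings: Sequence[Sequence[float]]) -> list[float]:
    # Assumption: logits are dot products between the dense vector and each
    # row of a word-embedding matrix reused from the encoder (no new weights).
    return [sum(a * b for a, b in zip(h, row)) for row in embeddings]


def log_softmax(logits: Sequence[float]) -> list[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def bow_prediction_loss(vocab_logits: Sequence[float], input_ids: Sequence[int]) -> float:
    # Mean negative log-probability of every distinct token in the passage;
    # the target is an unordered bag of input tokens, so no decoder is used.
    log_probs = log_softmax(vocab_logits)
    targets = sorted(set(input_ids))
    return -sum(log_probs[t] for t in targets) / len(targets)


if __name__ == "__main__":
    # Toy example: 2-dim dense vector, 3-token vocabulary, passage = tokens [0, 0, 1].
    h = [1.0, 0.0]
    embeddings = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    logits = project_to_vocab(h, embeddings)
    print(bow_prediction_loss(logits, [0, 0, 1]))
```

In this sketch, repeated tokens count once and word order is ignored, so the only learned component is the encoder that produces the dense vector.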

Summary (original)

Masked auto-encoder pre-training has emerged as a prevalent technique for initializing and enhancing dense retrieval systems. It generally utilizes additional Transformer decoder blocks to provide sustainable supervision signals and compress contextual information into dense representations. However, the underlying reasons for the effectiveness of such a pre-training technique remain unclear. The usage of additional Transformer-based decoders also incurs significant computational costs. In this study, we aim to shed light on this issue by revealing that masked auto-encoder (MAE) pre-training with enhanced decoding significantly improves the term coverage of input tokens in dense representations, compared to vanilla BERT checkpoints. Building upon this observation, we propose a modification to the traditional MAE by replacing the decoder of a masked auto-encoder with a completely simplified Bag-of-Word prediction task. This modification enables the efficient compression of lexical signals into dense representations through unsupervised pre-training. Remarkably, our proposed method achieves state-of-the-art retrieval performance on several large-scale retrieval benchmarks without requiring any additional parameters, which provides a 67% training speed-up compared to standard masked auto-encoder pre-training with enhanced decoding.

arXiv information

Authors	Guangyuan Ma, Xing Wu, Zijia Lin, Songlin Hu
Published	2024-04-22 10:44:14+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
