Shallow Cross-Encoders for Low-Latency Retrieval

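Summary

Transformer-based Cross-Encoders achieve state-of-the-art effectiveness in text retrieval.
However, Cross-Encoders based on large transformer models (such as BERT or T5) are computationally expensive and can score only a small number of documents within a reasonably small latency window.
At the same time, keeping search latency low is important for user satisfaction and energy usage.
This paper shows that weaker shallow transformer models (i.e., transformers with a limited number of layers) actually perform better than full-scale models when constrained to these practical low-latency settings, because they can estimate the relevance of more documents within the same time budget.
It further shows that shallow transformers may benefit from the generalized Binary Cross-Entropy (gBCE) training scheme, which has recently demonstrated success in recommendation tasks.
Experiments with the TREC Deep Learning passage ranking query sets demonstrate significant improvements in shallow and full-scale models in low-latency scenarios.
For example, when the latency limit is 25 ms per query, MonoBERT-Large (a cross-encoder based on a full-scale BERT model) achieves an NDCG@10 of only 0.431 on TREC DL 2019, while TinyBERT-gBCE (a cross-encoder based on TinyBERT trained with gBCE) reaches an NDCG@10 of 0.652, a +51% gain over MonoBERT-Large.
The paper also shows that shallow Cross-Encoders remain effective without a GPU (e.g., with CPU inference, NDCG@10 decreases by only 3% compared to GPU inference at 50 ms latency), which makes Cross-Encoders practical to run even without specialized hardware acceleration.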
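
The latency argument above can be illustrated with a little arithmetic. The following is a minimal sketch, not the authors' code: the helper names and the per-document scoring times in the usage note are hypothetical, and only the NDCG@10 values (0.431 and 0.652) and the 25 ms budget come from the abstract.

```python
def docs_within_budget(budget_ms, per_doc_ms, fixed_overhead_ms=0.0):
    """Number of documents a re-ranker can score inside a latency budget.

    per_doc_ms and fixed_overhead_ms are illustrative inputs (assumptions);
    the abstract does not report per-document scoring times.
    """
    if per_doc_ms <= 0:
        raise ValueError("per_doc_ms must be positive")
    usable_ms = budget_ms - fixed_overhead_ms
    # Floor division, clamped at zero when the overhead exceeds the budget.
    return max(0, int(usable_ms // per_doc_ms))


def relative_gain_percent(new_score, baseline_score):
    """Relative improvement of new_score over baseline_score, in percent."""
    return 100.0 * (new_score - baseline_score) / baseline_score


# Relative gain of TinyBERT-gBCE over MonoBERT-Large at 25 ms (values from the abstract).
gain = relative_gain_percent(0.652, 0.431)
print(f"relative gain: {gain:.1f}%")
```

With a hypothetical cost of 5 ms per document, `docs_within_budget(25, 5.0)` allows 5 documents, while a model costing 0.5 ms per document could score 50 in the same window. That is the trade-off the abstract describes: a weaker model that sees more candidates can outrank a stronger model that sees few.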

Abstract (original)

Transformer-based Cross-Encoders achieve state-of-the-art effectiveness in text retrieval. However, Cross-Encoders based on large transformer models (such as BERT or T5) are computationally expensive and allow for scoring only a small number of documents within a reasonably small latency window. However, keeping search latencies low is important for user satisfaction and energy usage. In this paper, we show that weaker shallow transformer models (i.e., transformers with a limited number of layers) actually perform better than full-scale models when constrained to these practical low-latency settings since they can estimate the relevance of more documents in the same time budget. We further show that shallow transformers may benefit from the generalized Binary Cross-Entropy (gBCE) training scheme, which has recently demonstrated success for recommendation tasks. Our experiments with TREC Deep Learning passage ranking query sets demonstrate significant improvements in shallow and full-scale models in low-latency scenarios. For example, when the latency limit is 25ms per query, MonoBERT-Large (a cross-encoder based on a full-scale BERT model) is only able to achieve NDCG@10 of 0.431 on TREC DL 2019, while TinyBERT-gBCE (a cross-encoder based on TinyBERT trained with gBCE) reaches NDCG@10 of 0.652, a +51% gain over MonoBERT-Large. We also show that shallow Cross-Encoders are effective even when used without a GPU (e.g., with CPU inference, NDCG@10 decreases only by 3% compared to GPU inference with 50ms latency), which makes Cross-Encoders practical to run even without specialized hardware acceleration.
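
For readers unfamiliar with the metric quoted in the abstract, here is a minimal sketch of NDCG@10. It assumes linear gains (the graded relevance label is used directly) and a log2 rank discount. The function names are illustrative, and official evaluation tools may differ in details such as how the ideal ranking is built.

```python
import math


def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the top-k graded relevance labels."""
    # Rank positions start at 1, so the discount is log2(position + 1).
    return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_gains, all_judged_gains=None, k=10):
    """NDCG@k for one query.

    ranked_gains: relevance labels of the returned documents, in ranked order.
    all_judged_gains: labels of every judged document for the query; when
    omitted, the ideal ranking is built from ranked_gains only (an assumption
    that overestimates NDCG if relevant documents were never retrieved).
    """
    pool = ranked_gains if all_judged_gains is None else all_judged_gains
    ideal_dcg = dcg_at_k(sorted(pool, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(ranked_gains, k) / ideal_dcg


# A ranking that places the only relevant document second instead of first.
print(ndcg_at_k([0, 1]))
```

A per-collection figure such as the 0.652 quoted above is the mean of this per-query value over all queries in the set.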

arXiv information

Authors	Aleksandr V. Petrov, Sean MacAvaney, Craig Macdonald
Published	2024-03-29 15:07:21+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
