Pooling And Attention: What Are Effective Designs For LLM-Based Embedding Models?

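Abstract

The substantial progress of large language models (LLMs) on generative tasks has led to a growing body of work on LLM-based embedding models.
These models use a variety of pooling and attention strategies and reach state-of-the-art performance on public embedding benchmarks, yet it remains unclear what constitutes an effective design for an LLM-based embedding model.
One reason is that these models are often trained on different datasets, using different LLM base models or training settings.
In addition, evaluations on public embedding benchmarks often do not report statistical significance, which makes it hard to determine which design choices truly contribute to final performance.
This complicates the process for practitioners looking for the best training recipe for an LLM-based embedding model.
This study runs a large-scale experiment: it trains a series of LLM-based embedding models on the same training data and the same base model, varying only the pooling and attention strategies.
The results show that there is no one-size-fits-all solution. Bidirectional attention and an additional trainable pooling layer perform better on text similarity and information retrieval tasks, but on clustering and classification tasks they do not significantly outperform simpler designs such as EOS-last token pooling and default causal attention.
The study also proposes a new pooling strategy, Multi-Layers Trainable Pooling, which uses a cross-attention network to transform the outputs of all hidden layers rather than only the last layer.
Compared with existing pooling methods, this method is statistically superior on text similarity and retrieval tasks.
Overall, the paper sheds light on effective training strategies for LLM-based embedding models.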

Abstract (original)

The significant advancements of Large Language Models (LLMs) in generative tasks have led to a growing body of work exploring LLM-based embedding models. While these models, employing different pooling and attention strategies, have achieved state-of-the-art performance on public embedding benchmarks, questions still arise about what constitutes an effective design for LLM-based embedding models. However, these models are often trained on different datasets, using different LLM base models or training settings. Moreover, evaluations on public embedding benchmarks often fail to report statistical significance, making it difficult to determine which designs truly contribute to final performance. This complicates the process for practitioners seeking optimal training recipes for LLM-based embedding models. In this study, we conduct a large-scale experiment by training a series of LLM-based embedding models using the same training data and base model but differing in their pooling and attention strategies. The results show that there is no one-size-fits-all solution: while bidirectional attention and an additional trainable pooling layer outperform in text similarity and information retrieval tasks, they do not significantly surpass simpler designs like EOS-last token pooling and default causal attention in clustering and classification tasks. Furthermore, we propose a new pooling strategy, Multi-Layers Trainable Pooling, which transforms the outputs of all hidden layers, rather than just the last layer, using a cross-attention network. This method proves to be statistically superior in text similarity and retrieval tasks compared to existing pooling methods. Overall, this paper sheds light on effective training strategies for LLM-based embedding models.
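To make the pooling terminology concrete, here is a minimal sketch of EOS-last token pooling, the simpler design named in the abstract. It is not the authors' implementation. It assumes right-padded inputs with an EOS token appended, so that the last non-padding position is the EOS token. The toy hidden states and the function names are hypothetical, and mean pooling is included only as a familiar baseline for contrast; the abstract does not mention it.

```python
from typing import List, Sequence

Vector = List[float]


def last_token_pool(hidden: Sequence[Sequence[float]], mask: Sequence[int]) -> Vector:
    """Return the hidden state at the last non-padding position.

    Assumption: the sequence is right-padded and ends with an EOS token,
    so this position holds the EOS token's hidden state.
    """
    last = max(i for i, m in enumerate(mask) if m == 1)
    return list(hidden[last])


def mean_pool(hidden: Sequence[Sequence[float]], mask: Sequence[int]) -> Vector:
    """Average the hidden states of all non-padding tokens (baseline for contrast)."""
    kept = [h for h, m in zip(hidden, mask) if m == 1]
    dim = len(kept[0])
    return [sum(h[d] for h in kept) / len(kept) for d in range(dim)]


# Toy last-layer output: 4 positions, hidden size 2; the final position is padding.
hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [0.0, 0.0]]
mask = [1, 1, 1, 0]

eos_embedding = last_token_pool(hidden, mask)
mean_embedding = mean_pool(hidden, mask)
```

Under causal attention only the final token has attended to the whole input, which is why last-token pooling is the natural pairing for the default causal setup described in the abstract.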

arXiv information

Authors	Yixuan Tang, Yi Yang
Published	2024-09-04 14:01:48+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
