Pairwise Judgment Formulation for Semantic Embedding Model in Web Search

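Abstract

The Semantic Embedding Model (SEM), a neural network-based Siamese architecture, is gaining momentum in information retrieval and natural language processing.
To train SEM in a supervised fashion for Web search, the search engine query log is typically used to automatically formulate pairwise judgments as training data.
Despite the growing application of semantic embeddings in the search engine industry, little work has been done on formulating effective pairwise judgments for training SEM.
This paper presents the first in-depth investigation of a wide range of strategies for generating pairwise judgments for SEM.
An interesting (perhaps surprising) finding is that the conventional pairwise judgment formulation strategy widely used in pairwise Learning-to-Rank (LTR) is not necessarily effective for training SEM.
Through a large-scale empirical study based on query logs and click-through activities from a major commercial search engine, the paper demonstrates effective strategies for SEM and highlights the advantages of a hybrid heuristic (i.e., Clicked > Non-Clicked) over the atomic heuristics (e.g., Clicked > Skipped) used in LTR.
It concludes with best practices for training SEM and offers insights for future research.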

Abstract (original)

Semantic Embedding Model (SEM), a neural network-based Siamese architecture, is gaining momentum in information retrieval and natural language processing. In order to train SEM in a supervised fashion for Web search, the search engine query log is typically utilized to automatically formulate pairwise judgments as training data. Despite the growing application of semantic embeddings in the search engine industry, little work has been done on formulating effective pairwise judgments for training SEM. In this paper, we make the first in-depth investigation of a wide range of strategies for generating pairwise judgments for SEM. An interesting (perhaps surprising) discovery reveals that the conventional pairwise judgment formulation strategy widely used in the field of pairwise Learning-to-Rank (LTR) is not necessarily effective for training SEM. Through a large-scale empirical study based on query logs and click-through activities from a major commercial search engine, we demonstrate the effective strategies for SEM and highlight the advantages of a hybrid heuristic (i.e., Clicked > Non-Clicked) in comparison to the atomic heuristics (e.g., Clicked > Skipped) in LTR. We conclude with best practices for training SEM and offer promising insights for future research.
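
To make the two heuristics named in the abstract concrete, the following is a minimal sketch of how pairwise judgments might be derived from a single logged result list. The abstract does not define the heuristics, so the readings used here are assumptions: "Clicked > Skipped" is taken to mean a clicked document is preferred over unclicked documents ranked above it, and "Clicked > Non-Clicked" is taken to mean a clicked document is preferred over every unclicked document in the list. The function names and the input layout (a ranked list of document IDs plus a set of clicked IDs) are hypothetical and not taken from the paper.

```python
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # (preferred_doc, less_preferred_doc)


def clicked_over_skipped(ranked_docs: List[str], clicked: Set[str]) -> List[Pair]:
    """Assumed atomic heuristic: a clicked document is preferred over
    unclicked documents that were ranked above it (i.e. skipped)."""
    pairs: List[Pair] = []
    for position, doc in enumerate(ranked_docs):
        if doc in clicked:
            for other in ranked_docs[:position]:
                if other not in clicked:
                    pairs.append((doc, other))
    return pairs


def clicked_over_non_clicked(ranked_docs: List[str], clicked: Set[str]) -> List[Pair]:
    """Assumed hybrid heuristic: a clicked document is preferred over
    every unclicked document in the list, regardless of position."""
    pairs: List[Pair] = []
    for doc in ranked_docs:
        if doc in clicked:
            for other in ranked_docs:
                if other not in clicked:
                    pairs.append((doc, other))
    return pairs


if __name__ == "__main__":
    # Hypothetical log entry: four results shown, only the third was clicked.
    ranked = ["d1", "d2", "d3", "d4"]
    clicks = {"d3"}
    print(clicked_over_skipped(ranked, clicks))
    print(clicked_over_non_clicked(ranked, clicks))
```

Under these assumed definitions, the hybrid heuristic yields a superset of the pairs produced by the atomic one, since it also pairs the clicked document with unclicked documents ranked below it.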

arXiv information

Authors	Mengze Hong, Wailing Ng, Zichang Guo, Chen Jason Zhang
Published	2024-11-21 16:43:22+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
