Dense Retrieval Adaptation using Target Domain Description

Abstract

In information retrieval (IR), domain adaptation is the process of adapting a retrieval model to a new domain whose data distribution is different from the source domain. Existing methods in this area focus on unsupervised domain adaptation where they have access to the target document collection or supervised (often few-shot) domain adaptation where they additionally have access to (limited) labeled data in the target domain. There also exists research on improving zero-shot performance of retrieval models with no adaptation. This paper introduces a new category of domain adaptation in IR that is as-yet unexplored. Here, similar to the zero-shot setting, we assume the retrieval model does not have access to the target document collection. In contrast, it does have access to a brief textual description that explains the target domain. We define a taxonomy of domain attributes in retrieval tasks to understand different properties of a source domain that can be adapted to a target domain. We introduce a novel automatic data construction pipeline that produces a synthetic document collection, query set, and pseudo relevance labels, given a textual domain description. Extensive experiments on five diverse target domains show that adapting dense retrieval models using the constructed synthetic data leads to effective retrieval performance on the target domain.
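The abstract names three outputs of the data construction pipeline: a synthetic document collection, a query set, and pseudo relevance labels, all derived from a textual domain description. The sketch below only illustrates how those three artifacts might fit together; it is not the authors' implementation. Every name in it (`SyntheticDataset`, `build_synthetic_dataset`, the two toy generators, and the example description string) is hypothetical. It also assumes that a query generated from a document is labeled relevant to that document, which is a common convention in synthetic query generation and is not confirmed by the abstract.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class SyntheticDataset:
    """Container for the three artifacts named in the abstract."""
    documents: Dict[str, str] = field(default_factory=dict)  # doc_id -> text
    queries: Dict[str, str] = field(default_factory=dict)    # query_id -> text
    # (query_id, doc_id, label) triples acting as pseudo relevance labels
    qrels: List[Tuple[str, str, int]] = field(default_factory=list)


def build_synthetic_dataset(
    domain_description: str,
    generate_document: Callable[[str, int], str],
    generate_query: Callable[[str], str],
    num_documents: int = 3,
) -> SyntheticDataset:
    """Build a toy dataset from a domain description alone.

    Assumption (not from the paper): each query is generated from one
    document, and that pair is labeled relevant (label 1).
    """
    dataset = SyntheticDataset()
    for i in range(num_documents):
        doc_id, query_id = f"d{i}", f"q{i}"
        doc_text = generate_document(domain_description, i)
        dataset.documents[doc_id] = doc_text
        dataset.queries[query_id] = generate_query(doc_text)
        dataset.qrels.append((query_id, doc_id, 1))
    return dataset


# Toy stand-ins for real text generators (hypothetical; a real system
# would plug in a generative model at these two points).
def toy_document_generator(description: str, index: int) -> str:
    return f"Synthetic passage {index} about: {description}"


def toy_query_generator(document: str) -> str:
    topic = document.split("about: ", 1)[-1]
    return f"what is known about {topic.lower()}"


# Hypothetical domain description, used only for illustration.
dataset = build_synthetic_dataset(
    "Scientific articles on infectious diseases",
    toy_document_generator,
    toy_query_generator,
)
```

The resulting `dataset.qrels` triples could then serve as positive training pairs when fine-tuning a dense retriever, with negatives sampled from the other synthetic documents.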

arXiv information

Authors	Helia Hashemi, Yong Zhuang, Sachith Sri Ram Kothur, Srivas Prasad, Edgar Meij, W. Bruce Croft
Published	2023-07-06 02:59:47+00:00
