Towards Universal Dense Blocking for Entity Resolution

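Summary

Blocking is a critical step in entity resolution, and the emergence of neural representation models has made dense blocking a promising way to exploit deep semantics during blocking.
However, earlier state-of-the-art self-supervised dense blocking approaches require domain-specific training on the target domain, which limits their benefits and slows their adaptation to new domains.
To address this, we propose UniBlocker, a dense blocker pre-trained with self-supervised contrastive learning on a domain-independent, easily obtainable tabular corpus.
Because the pre-training is domain-independent, UniBlocker can be adapted to a variety of downstream blocking scenarios without domain-specific fine-tuning.
To evaluate the universality of our entity blocker, we also construct a new benchmark covering a wide range of blocking tasks from multiple domains and scenarios.
Our experiments show that UniBlocker, without any domain-specific learning, significantly outperforms previous self-supervised and unsupervised dense blocking methods, and is comparable and complementary to state-of-the-art sparse blocking methods.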

Abstract (original)

Blocking is a critical step in entity resolution, and the emergence of neural network-based representation models has led to the development of dense blocking as a promising approach for exploring deep semantics in blocking. However, previous advanced self-supervised dense blocking approaches require domain-specific training on the target domain, which limits the benefits and rapid adaptation of these methods. To address this issue, we propose UniBlocker, a dense blocker that is pre-trained on a domain-independent, easily-obtainable tabular corpus using self-supervised contrastive learning. By conducting domain-independent pre-training, UniBlocker can be adapted to various downstream blocking scenarios without requiring domain-specific fine-tuning. To evaluate the universality of our entity blocker, we also construct a new benchmark covering a wide range of blocking tasks from multiple domains and scenarios. Our experiments show that the proposed UniBlocker, without any domain-specific learning, significantly outperforms previous self- and unsupervised dense blocking methods and is comparable and complementary to the state-of-the-art sparse blocking methods.
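
For readers new to the term, the retrieval step behind dense blocking can be sketched in a few lines. This is a minimal illustration under loud assumptions, not UniBlocker itself: the `embed` function is a hypothetical placeholder that hashes character trigrams into a fixed-size vector, whereas a real dense blocker would use a learned neural encoder (in the paper's case, one pre-trained with contrastive learning). The names `embed`, `block`, `DIM`, and `k` are all invented for this example.

```python
import math
import zlib

DIM = 256  # assumed vector size for this toy example


def embed(record: str, n: int = 3) -> list[float]:
    """Placeholder encoder: hash character n-grams into a unit-length vector.

    A real dense blocker would call a trained neural model here instead.
    """
    vec = [0.0] * DIM
    text = f" {record.lower()} "
    for i in range(len(text) - n + 1):
        gram = text[i:i + n].encode("utf-8")
        vec[zlib.crc32(gram) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero
    return [x / norm for x in vec]


def block(left: list[str], right: list[str], k: int = 2) -> list[tuple[int, int]]:
    """Return candidate pairs (left index, right index) via top-k cosine similarity."""
    right_vecs = [embed(r) for r in right]
    pairs = []
    for i, record in enumerate(left):
        query = embed(record)
        # Vectors are unit length, so the dot product equals cosine similarity.
        scores = [sum(a * b for a, b in zip(query, vec)) for vec in right_vecs]
        nearest = sorted(range(len(right)), key=lambda j: scores[j], reverse=True)[:k]
        pairs.extend((i, j) for j in nearest)
    return pairs


if __name__ == "__main__":
    table_a = ["Apple iPhone 13 128GB black", "Sony WH-1000XM4 headphones"]
    table_b = [
        "sony wh1000xm4 wireless headphones",
        "iphone 13 apple 128 gb (black)",
        "Dell XPS 13 laptop",
    ]
    print(block(table_a, table_b, k=1))
```

Only the pairs returned by `block` would be passed on to the more expensive matching stage. The exhaustive scoring loop is quadratic and is used here for clarity; at realistic table sizes an approximate nearest-neighbor index would normally take its place.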

arXiv information

Authors	Tianshu Wang, Hongyu Lin, Xianpei Han, Xiaoyang Chen, Boxi Cao, Le Sun
Published	2024-04-25 06:37:51+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
