Matchmaker: Self-Improving Large Language Model Programs for Schema Matching

Summary

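Schema matching, the task of finding correspondences between attributes across heterogeneous data sources with different tables and hierarchies, is critical for creating interoperable, machine learning (ML)-ready data.
Addressing this fundamental data-centric problem has broad implications, particularly in domains such as healthcare, finance, and e-commerce, and it could also benefit ML models more generally by increasing the data available for training.
However, schema matching is a challenging ML task because of the structural/hierarchical and semantic heterogeneity between schemas.
Previous ML approaches to automating schema matching have either required large amounts of labeled training data, which is often unrealistic, or suffered from poor zero-shot performance.
To this end, we propose Matchmaker, a compositional language model program for schema matching that consists of candidate generation, refinement, and confidence scoring.
Matchmaker also self-improves in a zero-shot manner, without labeled demonstrations, through a novel optimization approach that constructs synthetic in-context demonstrations to guide the language model's reasoning process.
On real-world medical schema matching benchmarks, we demonstrate that Matchmaker outperforms previous ML-based approaches, highlighting its potential to accelerate data integration and the interoperability of ML-ready data.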
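
The three stages named in the summary (candidate generation, refinement, confidence scoring) can be pictured with a minimal sketch. This is not the authors' implementation: the abstract says each stage is driven by a language model, whereas the sketch below swaps in plain string similarity from Python's standard library so that it runs without any model. All function names, the cutoff values, and the example attribute names are hypothetical, and the self-improvement step with synthetic in-context demonstrations is not covered.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in scorer (assumption): case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def generate_candidates(source_attr, target_attrs, k=3):
    """Stage 1: keep the k target attributes most similar to the source attribute."""
    ranked = sorted(target_attrs, key=lambda t: similarity(source_attr, t), reverse=True)
    return ranked[:k]


def refine(source_attr, candidates, min_score=0.3):
    """Stage 2: drop candidates whose similarity falls below a hypothetical cutoff."""
    return [c for c in candidates if similarity(source_attr, c) >= min_score]


def score(source_attr, candidates):
    """Stage 3: attach a confidence to each surviving candidate, best first."""
    scored = [(c, round(similarity(source_attr, c), 2)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def match(source_attr, target_attrs):
    """Run the three stages in order for a single source attribute."""
    candidates = generate_candidates(source_attr, target_attrs)
    return score(source_attr, refine(source_attr, candidates))


if __name__ == "__main__":
    # Hypothetical attribute names, chosen only for illustration.
    target = ["person.birth_datetime", "person.gender_concept_id", "visit.start_date"]
    for candidate, confidence in match("patient_birth_date", target):
        print(candidate, confidence)
```

Keeping the stages as separate functions mirrors the compositional structure the summary describes: any one stage could be replaced by a language model call without touching the others.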

Summary (original)

Schema matching — the task of finding matches between attributes across disparate data sources with different tables and hierarchies — is critical for creating interoperable machine learning (ML)-ready data. Addressing this fundamental data-centric problem has wide implications, especially in domains like healthcare, finance and e-commerce — but also has the potential to benefit ML models more generally, by increasing the data available for ML model training. However, schema matching is a challenging ML task due to structural/hierarchical and semantic heterogeneity between different schemas. Previous ML approaches to automate schema matching have either required significant labeled data for model training, which is often unrealistic or suffer from poor zero-shot performance. To this end, we propose Matchmaker – a compositional language model program for schema matching, comprised of candidate generation, refinement and confidence scoring. Matchmaker also self-improves in a zero-shot manner without the need for labeled demonstrations via a novel optimization approach, which constructs synthetic in-context demonstrations to guide the language model’s reasoning process. Empirically, we demonstrate on real-world medical schema matching benchmarks that Matchmaker outperforms previous ML-based approaches, highlighting its potential to accelerate data integration and interoperability of ML-ready data.

arXiv information

Authors	Nabeel Seedat, Mihaela van der Schaar
Published	2024-10-31 16:34:03+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
