EHR-SeqSQL : A Sequential Text-to-SQL Dataset For Interactively Exploring Electronic Health Records

Abstract

In this paper, we introduce EHR-SeqSQL, a novel sequential text-to-SQL dataset for Electronic Health Record (EHR) databases. EHR-SeqSQL is designed to address critical yet underexplored aspects in text-to-SQL parsing: interactivity, compositionality, and efficiency. To the best of our knowledge, EHR-SeqSQL is not only the largest but also the first medical text-to-SQL dataset benchmark to include sequential and contextual questions. We provide a data split and the new test set designed to assess compositional generalization ability. Our experiments demonstrate the superiority of a multi-turn approach over a single-turn approach in learning compositionality. Additionally, our dataset integrates specially crafted tokens into SQL queries to improve execution efficiency. With EHR-SeqSQL, we aim to bridge the gap between practical needs and academic research in the text-to-SQL domain. EHR-SeqSQL is available at https://github.com/seonhee99/EHR-SeqSQL.

arXiv information

Authors	Jaehee Ryu, Seonhee Cho, Gyubok Lee, Edward Choi
Published	2024-07-30 10:09:13+00:00

Source and services used

arxiv.jp, Google
