A System and Benchmark for LLM-based Q&A on Heterogeneous Data

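Summary

In many industrial settings, users want to ask questions whose answers may be found in structured data sources such as spreadsheets, databases, APIs, or combinations of these.
Often, the user does not know how to identify or access the right data source.
The problem is compounded when multiple (and potentially siloed) data sources must be assembled to derive the answer.
Recently, various Text-to-SQL applications that leverage large language models (LLMs) have addressed some of these problems by letting users ask questions in natural language.
However, these applications remain impractical in realistic industrial settings because they cannot cope with the data source heterogeneity typical of such environments.
This paper addresses heterogeneity by introducing the siwarex platform, which enables seamless natural language access to both databases and APIs.
To demonstrate the effectiveness of siwarex, the authors extend the popular Spider dataset and benchmark by replacing some of its tables with data retrieval APIs.
They find that siwarex copes well with data source heterogeneity.
The modified Spider benchmark will soon be made available to the research community.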

Summary (original)

In many industrial settings, users wish to ask questions whose answers may be found in structured data sources such as spreadsheets, databases, APIs, or combinations thereof. Often, the user doesn’t know how to identify or access the right data source. This problem is compounded even further if multiple (and potentially siloed) data sources must be assembled to derive the answer. Recently, various Text-to-SQL applications that leverage Large Language Models (LLMs) have addressed some of these problems by enabling users to ask questions in natural language. However, these applications remain impractical in realistic industrial settings because they fail to cope with the data source heterogeneity that typifies such environments. In this paper, we address heterogeneity by introducing the siwarex platform, which enables seamless natural language access to both databases and APIs. To demonstrate the effectiveness of siwarex, we extend the popular Spider dataset and benchmark by replacing some of its tables with data retrieval APIs. We find that siwarex does a good job of coping with data source heterogeneity. Our modified Spider benchmark will soon be available to the research community.

arXiv information

Authors	Achille Fokoue, Srideepika Jayaraman, Elham Khabiri, Jeffrey O. Kephart, Yingjie Li, Dhruv Shah, Youssef Drissi, Fenno F. Heath III, Anu Bhamidipaty, Fateh A. Tipu, Robert J. Baseman
Published	2024-09-09 15:44:39+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
