TARGA: Targeted Synthetic Data Generation for Practical Reasoning over Structured Data

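Abstract

Semantic parsing, which converts natural language questions into logical forms, plays an important role in reasoning within structured environments.
However, existing methods face two significant challenges: reliance on large manually annotated datasets and limited ability to generalize to unseen examples.
To address these problems, we propose Targeted Synthetic Data Generation (TARGA), a practical framework that dynamically generates highly relevant synthetic data without manual annotation.
Starting from the entities and relations relevant to a given question, it explores potentially relevant queries through layer-wise expansion and cross-layer combination.
It then generates natural language questions corresponding to these constructed queries, which together serve as synthetic demonstrations for in-context learning.
Experiments on multiple knowledge base question answering (KBQA) datasets show that TARGA, using only a 7B-parameter model, substantially outperforms existing non-fine-tuned methods that rely on closed-source models, achieving notable F1 improvements on GrailQA (+7.7) and KBQA-Agent (+12.2).
TARGA also exhibits superior sample efficiency, robustness, and generalization under non-I.I.D. settings.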

Abstract (original)

Semantic parsing, which converts natural language questions into logical forms, plays a crucial role in reasoning within structured environments. However, existing methods encounter two significant challenges: reliance on extensive manually annotated datasets and limited generalization capability to unseen examples. To tackle these issues, we propose Targeted Synthetic Data Generation (TARGA), a practical framework that dynamically generates high-relevance synthetic data without manual annotation. Starting from the pertinent entities and relations of a given question, we probe for potentially relevant queries through layer-wise expansion and cross-layer combination. Then we generate corresponding natural language questions for these constructed queries to jointly serve as the synthetic demonstrations for in-context learning. Experiments on multiple knowledge base question answering (KBQA) datasets demonstrate that TARGA, using only a 7B-parameter model, substantially outperforms existing non-fine-tuned methods that utilize closed-source models, achieving notable improvements in F1 scores on GrailQA (+7.7) and KBQA-Agent (+12.2). Furthermore, TARGA also exhibits superior sample efficiency, robustness, and generalization capabilities under non-I.I.D. settings.
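The abstract describes the pipeline only at a high level. The Python sketch below illustrates one possible reading of it on a hypothetical toy knowledge base; it is not the authors' implementation. Queries are modeled as relation paths grown one hop per layer from each question entity (layer-wise expansion), pairs of queries whose answer sets overlap are conjoined (an assumed interpretation of cross-layer combination), and each query is paired with a question to form in-context demonstrations. The triples, the `RELATIONS` list, the `Query` class, and the helpers (`expand_layers`, `combine_queries`, `describe`, `build_prompt`) are all hypothetical, and because the abstract does not say how questions are produced, a fixed template stands in for that step.

```python
from dataclasses import dataclass
from itertools import combinations

# Hypothetical toy knowledge base of (subject, relation, object) triples.
TRIPLES = [
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Interstellar", "directed_by", "Christopher Nolan"),
    ("Dunkirk", "directed_by", "Christopher Nolan"),
    ("Inception", "genre", "Science fiction"),
    ("Interstellar", "genre", "Science fiction"),
    ("Dunkirk", "genre", "War"),
    ("Christopher Nolan", "born_in", "London"),
]

# Hypothetical relations assumed to be retrieved as pertinent to the question;
# a "^-1" suffix means the relation is traversed from object to subject.
RELATIONS = ["directed_by^-1", "genre", "genre^-1", "born_in"]


@dataclass(frozen=True)
class Query:
    """Path query: start at an entity and follow relations in order."""
    start: str
    path: tuple[str, ...]


def inverse(rel: str) -> str:
    return rel[:-3] if rel.endswith("^-1") else rel + "^-1"


def step(nodes: set[str], rel: str) -> set[str]:
    """Follow one relation (or its inverse) from every node in `nodes`."""
    if rel.endswith("^-1"):
        return {s for s, r, o in TRIPLES if r == rel[:-3] and o in nodes}
    return {o for s, r, o in TRIPLES if r == rel and s in nodes}


def execute(query: Query) -> set[str]:
    nodes = {query.start}
    for rel in query.path:
        nodes = step(nodes, rel)
    return nodes


def expand_layers(entity: str, relations: list[str], max_hops: int = 2) -> list[list[Query]]:
    """Layer-wise expansion: grow each layer-k query by one relation to form layer k+1."""
    layers = [[Query(entity, ())]]
    for _ in range(max_hops):
        next_layer = []
        for q in layers[-1]:
            for rel in relations:
                if q.path and rel == inverse(q.path[-1]):
                    continue  # skip trivial round trips such as genre^-1 then genre
                cand = Query(q.start, q.path + (rel,))
                if execute(cand):  # keep only queries with a non-empty answer
                    next_layer.append(cand)
        layers.append(next_layer)
    return layers[1:]  # drop the empty-path seed layer


def combine_queries(queries: list[Query]) -> list[tuple[Query, Query, set[str]]]:
    """Assumed combination step: conjoin two queries whose answer sets overlap."""
    combos = []
    for a, b in combinations(queries, 2):
        shared = execute(a) & execute(b)
        if shared:
            combos.append((a, b, shared))
    return combos


def to_text(q: Query) -> str:
    return " -> ".join((q.start,) + q.path)


def describe(q: Query) -> str:
    # Stand-in template for the (unspecified) question-generation step.
    return f"reached from {q.start} via {', then '.join(q.path)}"


def build_prompt(demos: list[tuple[str, str]], question: str) -> str:
    """Format (question, query) pairs as in-context demonstrations, then the target question."""
    blocks = [f"Question: {q}\nQuery: {lf}" for q, lf in demos]
    blocks.append(f"Question: {question}\nQuery:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    entities = ["Christopher Nolan", "Science fiction"]
    queries = [q for e in entities for layer in expand_layers(e, RELATIONS) for q in layer]
    demos = [(f"What is {describe(q)}?", to_text(q)) for q in queries]
    demos += [
        (f"What is both {describe(a)} and {describe(b)}?", f"AND({to_text(a)}; {to_text(b)})")
        for a, b, _ in combine_queries(queries)
    ]
    print(build_prompt(demos, "Which science fiction films did Christopher Nolan direct?"))
```

Run as a script, this prints five demonstrations (four path queries and one conjunction of Nolan's films with science-fiction titles) followed by the target question for a model to complete. Filtering on non-empty execution results is what keeps every synthetic query grounded in the knowledge base in this sketch.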

arXiv information

Authors	Xiang Huang, Jiayu Shen, Shanshan Huang, Sitao Cheng, Xiaxia Wang, Yuzhong Qu
Published	2024-12-27 09:16:39+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
