Graph-based Retrieval Augmented Generation for Dynamic Few-shot Text Classification

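Summary

Text classification is a fundamental task in natural language processing and is pivotal to applications such as query optimization, data integration, and schema matching.
Neural network-based models such as CNN and BERT perform remarkably well on text classification, but their effectiveness depends heavily on abundant labeled training data.
This dependency makes them less effective in dynamic few-shot text classification, where labeled data is scarce and the target labels change frequently with application needs.
Recently, large language models (LLMs) have shown promise thanks to their extensive pretraining and contextual understanding.
Current approaches give the LLM the text input, the candidate labels, and additional side information (e.g., descriptions) for predicting the text's label.
However, their effectiveness is hindered by the increased input size and by the noise introduced when processing the side information.
To address these limitations, the authors propose GORAG, a graph-based online retrieval-augmented generation framework for dynamic few-shot text classification.
Rather than treating each input independently, GORAG constructs and maintains an adaptive information graph by extracting side information across all target texts.
It uses a weighted edge mechanism to prioritize the importance and reliability of the extracted information, and it dynamically retrieves relevant context using a minimum-cost spanning tree tailored to each text input.
Empirical evaluations show that GORAG outperforms existing approaches by providing more comprehensive and accurate contextual information.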

Summary (original)

Text classification is a fundamental task in natural language processing, pivotal to various applications such as query optimization, data integration, and schema matching. While neural network-based models, such as CNN and BERT, have demonstrated remarkable performance in text classification, their effectiveness heavily relies on abundant labeled training data. This dependency makes these models less effective in dynamic few-shot text classification, where labeled data is scarce, and target labels frequently evolve based on application needs. Recently, large language models (LLMs) have shown promise due to their extensive pretraining and contextual understanding. Current approaches provide LLMs with text inputs, candidate labels, and additional side information (e.g., descriptions) to predict text labels. However, their effectiveness is hindered by the increased input size and the noise introduced through side information processing. To address these limitations, we propose a graph-based online retrieval-augmented generation framework, namely GORAG, for dynamic few-shot text classification. GORAG constructs and maintains an adaptive information graph by extracting side information across all target texts, rather than treating each input independently. It employs a weighted edge mechanism to prioritize the importance and reliability of extracted information and dynamically retrieves relevant context using a minimum-cost spanning tree tailored for each text input. Empirical evaluations demonstrate that GORAG outperforms existing approaches by providing more comprehensive and accurate contextual information.
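The retrieval idea in the abstract (a weighted information graph, queried with a minimum-cost tree built per input) can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the `InfoGraph` class, the `retrieve_context` function, the convention that a lower edge weight means a more reliable link, and the use of a shortest-path heuristic to approximate a minimum-cost tree connecting the keywords found in the input.

```python
import heapq
from collections import defaultdict


class InfoGraph:
    """Hypothetical information graph: keyword and label nodes joined by weighted edges.

    Assumption: a LOWER weight marks a more important or reliable link.
    """

    def __init__(self):
        self.adj = defaultdict(dict)  # node -> {neighbor: weight}
        self.labels = set()           # nodes that are class labels

    def add_edge(self, u, v, weight):
        # If the same edge is extracted twice, keep the cheaper weight.
        best = min(weight, self.adj[u].get(v, float("inf")))
        self.adj[u][v] = best
        self.adj[v][u] = best

    def add_label_example(self, label, keywords, weight=1.0):
        # Online update: a new label can be added at any time.
        self.labels.add(label)
        for kw in keywords:
            self.add_edge(kw, label, weight)


def _path_to_nearest(graph, tree_nodes, targets):
    """Multi-source Dijkstra from the current tree to the closest target node."""
    dist = {n: 0.0 for n in tree_nodes}
    prev = {}
    heap = [(0.0, n) for n in tree_nodes]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if u in targets:
            path = [u]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return path  # runs from the target back to a tree node
        for v, w in graph.adj[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return None  # no target is reachable


def retrieve_context(graph, query_keywords):
    """Grow a low-cost tree over the query keywords; return (tree nodes, candidate labels)."""
    terminals = [k for k in query_keywords if k in graph.adj]
    if not terminals:
        return set(), []
    tree_nodes = {terminals[0]}
    remaining = set(terminals[1:]) - tree_nodes
    while remaining:
        path = _path_to_nearest(graph, tree_nodes, remaining)
        if path is None:  # leftover keywords are disconnected from the tree
            break
        tree_nodes.update(path)
        remaining -= tree_nodes
    if not (tree_nodes & graph.labels):
        # No label inside the tree yet: extend it to the nearest label node.
        path = _path_to_nearest(graph, tree_nodes, graph.labels)
        if path:
            tree_nodes.update(path)
    return tree_nodes, sorted(tree_nodes & graph.labels)


g = InfoGraph()
g.add_label_example("sports", ["goal", "match"])
g.add_label_example("finance", ["stock", "market"])
g.add_edge("match", "market", 5.0)  # a weak, less reliable link

_, candidates = retrieve_context(g, ["goal", "match"])
print(candidates)  # ['sports']
```

In a full pipeline, the candidate labels and the tree's nodes would be passed to the LLM as a compact context in place of every label description. That step is omitted here because the abstract does not specify it.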

arXiv information

Authors	Yubo Wang,Haoyang Li,Fei Teng,Lei Chen
Publication date	2025-01-06 08:43:31+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
