On Task-personalized Multimodal Few-shot Learning for Visually-rich Document Entity Retrieval

Summary

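Visually-rich document entity retrieval (VDER), which extracts key information (such as dates and addresses) from document images like invoices and receipts, has become an important topic in industrial NLP applications.
New document types keep emerging at a steady pace, each with its own entity types, and this creates a distinctive challenge: many documents contain unseen entity types that occur only a handful of times.
Addressing this challenge requires models that can learn entities from only a few examples.
However, prior work on few-shot VDER has mainly addressed the problem at the document level with a predefined global entity space, and does not account for the entity-level few-shot scenario, in which the target entity types are locally personalized by each task and entity occurrences vary significantly across documents.
To address this unexplored scenario, the paper studies a novel entity-level few-shot VDER task.
The challenges lie in the uniqueness of each task's label space and the increased complexity of out-of-distribution (OOD) content.
To tackle this new task, the authors present a task-aware, meta-learning-based framework whose central focus is effective task personalization that distinguishes between the in-task and out-of-task distributions.
Specifically, the framework adopts a hierarchical decoder (HC) and contrastive learning (ContrastProtoNet) to achieve this goal.
In addition, the authors introduce a new dataset, FewVEX, to support future research on entity-level few-shot VDER.
Experimental results show that the proposed approaches significantly improve the robustness of popular meta-learning baselines.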

Summary (original)

Visually-rich document entity retrieval (VDER), which extracts key information (e.g. date, address) from document images like invoices and receipts, has become an important topic in industrial NLP applications. The emergence of new document types at a constant pace, each with its unique entity types, presents a unique challenge: many documents contain unseen entity types that occur only a couple of times. Addressing this challenge requires models to have the ability of learning entities in a few-shot manner. However, prior works for Few-shot VDER mainly address the problem at the document level with a predefined global entity space, which doesn’t account for the entity-level few-shot scenario: target entity types are locally personalized by each task and entity occurrences vary significantly among documents. To address this unexplored scenario, this paper studies a novel entity-level few-shot VDER task. The challenges lie in the uniqueness of the label space for each task and the increased complexity of out-of-distribution (OOD) contents. To tackle this novel task, we present a task-aware meta-learning based framework, with a central focus on achieving effective task personalization that distinguishes between in-task and out-of-task distribution. Specifically, we adopt a hierarchical decoder (HC) and employ contrastive learning (ContrastProtoNet) to achieve this goal. Furthermore, we introduce a new dataset, FewVEX, to boost future research in the field of entity-level few-shot VDER. Experimental results demonstrate our approaches significantly improve the robustness of popular meta-learning baselines.
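
As a rough illustration of the entity-level few-shot setting the abstract describes, the sketch below shows a generic prototype-based classifier: each task brings its own small label space, each entity type is summarized by the mean of its few support embeddings, and a query that lies far from every prototype is treated as out-of-task. This is not the paper's hierarchical decoder or ContrastProtoNet. The function names, the toy 2-D embeddings, the `"O"` out-of-task label, and the `ood_radius` threshold are all assumptions made for this example.

```python
import math
from collections import defaultdict


def prototypes(support):
    """Average the support embeddings of each entity type into one prototype."""
    grouped = defaultdict(list)
    for label, vec in support:
        grouped[label].append(vec)
    return {
        label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def classify(query, protos, ood_radius):
    """Return the nearest prototype's label, or "O" if none lies within ood_radius."""
    best_label, best_dist = "O", float("inf")
    for label, proto in protos.items():
        dist = math.dist(query, proto)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= ood_radius else "O"


# Toy 2-way 2-shot task with made-up 2-D token embeddings (hypothetical data).
support = [
    ("date", [0.0, 0.0]), ("date", [0.0, 2.0]),
    ("address", [10.0, 10.0]), ("address", [12.0, 10.0]),
]
protos = prototypes(support)
queries = ([0.5, 1.0], [11.0, 9.0], [50.0, 50.0])
labels = [classify(q, protos, ood_radius=3.0) for q in queries]
print(labels)
```

The label set here (`date`, `address`) belongs to this one task only; another task would build its own prototypes from its own support examples, which is one simple way to read "locally personalized" label spaces. A fixed distance threshold is only a stand-in for whatever mechanism a real system would use to separate in-task from out-of-task content.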

arXiv information

Authors	Jiayi Chen, Hanjun Dai, Bo Dai, Aidong Zhang, Wei Wei
Published	2023-11-01 17:51:43+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
