A Question-Answering Approach to Key Value Pair Extraction from Form-like Document Images

Summary

Title: A Question-Answering Approach to Key-Value Pair Extraction from Form-like Document Images

Summary:

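– The paper proposes KVPFormer, a new question-answering (QA) based approach to key-value pair (KVP) extraction.
– KVPFormer first uses a Transformer encoder to identify the key entities among all entities in an image. It then treats these key entities as "questions" and feeds them into a Transformer decoder, which predicts the corresponding "answers" (i.e., value entities) in parallel. In this way it robustly extracts key-value relationships between entities.
– To achieve higher answer prediction accuracy, the paper proposes a coarse-to-fine answer prediction approach: the coarse stage extracts multiple answer candidates for each identified question, and the fine stage selects the most likely one among these candidates.
– In addition, to better model the spatial interactions between entities, a spatial compatibility attention bias is introduced into the self-attention/cross-attention mechanism.
– With these new techniques, the proposed method achieves state-of-the-art results on the FUNSD and XFUND datasets, outperforming the previous best-performing method by 7.2% and 13.2% in F1 score, respectively.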
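
The coarse-to-fine answer prediction described above can be illustrated with a small sketch. This is not the authors' implementation: the function name `coarse_to_fine_select`, the parameter `k`, and the toy score tables are all hypothetical, and the learned scoring networks of KVPFormer are replaced here by fixed numbers. The sketch only shows the control flow: shortlist a few candidates per question, then re-score the shortlist alone.

```python
from typing import Callable, List, Sequence


def coarse_to_fine_select(
    coarse_scores: Sequence[Sequence[float]],
    fine_score: Callable[[int, int], float],
    k: int = 2,
) -> List[int]:
    """Return, for each question (row), the index of the chosen answer entity.

    coarse_scores[q][e] is a hypothetical coarse-stage score for entity e
    being the answer to question q. fine_score(q, e) is a hypothetical
    fine-stage scorer that is applied only to the shortlisted candidates.
    """
    answers = []
    for q, row in enumerate(coarse_scores):
        # Coarse stage: keep the k highest-scoring entities as candidates.
        candidates = sorted(range(len(row)), key=lambda e: row[e], reverse=True)[:k]
        # Fine stage: re-score only the candidates and keep the best one.
        answers.append(max(candidates, key=lambda e: fine_score(q, e)))
    return answers


# Toy example with 2 questions and 4 entities (all numbers are made up).
coarse = [
    [0.1, 0.7, 0.6, 0.0],
    [0.5, 0.2, 0.1, 0.9],
]
fine = [
    [0.0, 0.3, 0.8, 0.9],
    [0.4, 0.0, 0.0, 0.2],
]
selected = coarse_to_fine_select(coarse, lambda q, e: fine[q][e], k=2)
print(selected)  # [2, 0]
```

In the toy data, entity 3 has the highest fine score for the first question but is never considered, because it did not survive the coarse stage. That is the trade-off of the two-stage design: the fine stage faces a much smaller choice, at the cost of depending on the recall of the coarse stage.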

Abstract (original)

In this paper, we present a new question-answering (QA) based key-value pair extraction approach, called KVPFormer, to robustly extract key-value relationships between entities from form-like document images. Specifically, KVPFormer first identifies key entities from all entities in an image with a Transformer encoder, then takes these key entities as questions and feeds them into a Transformer decoder to predict their corresponding answers (i.e., value entities) in parallel. To achieve higher answer prediction accuracy, we further propose a coarse-to-fine answer prediction approach, which first extracts multiple answer candidates for each identified question in the coarse stage and then selects the most likely one among these candidates in the fine stage. In this way, the learning difficulty of answer prediction can be effectively reduced so that the prediction accuracy can be improved. Moreover, we introduce a spatial compatibility attention bias into the self-attention/cross-attention mechanism for KVPFormer to better model the spatial interactions between entities. With these new techniques, our proposed KVPFormer achieves state-of-the-art results on FUNSD and XFUND datasets, outperforming the previous best-performing method by 7.2% and 13.2% in F1 score, respectively.
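
As a rough illustration of what an attention bias that depends on spatial layout might look like, the sketch below adds a per-pair term to ordinary scaled dot-product attention logits before the softmax. Everything here is an assumption made for illustration: the abstract does not specify how the spatial compatibility bias is computed, so `spatial_bias` uses a hand-written stand-in (negative distance between box centers), and the names `Box`, `center`, `spatial_bias`, `biased_attention_weights`, and the `scale` parameter are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), hypothetical layout


def center(box: Box) -> Tuple[float, float]:
    """Center point of a bounding box."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def spatial_bias(a: Box, b: Box, scale: float = 0.01) -> float:
    """Hypothetical stand-in bias: nearer boxes get a larger (less negative) value."""
    (ax, ay), (bx, by) = center(a), center(b)
    return -scale * math.hypot(ax - bx, ay - by)


def biased_attention_weights(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    query_box: Box,
    key_boxes: Sequence[Box],
) -> List[float]:
    """Softmax over (scaled dot product + spatial bias) for one query."""
    d = len(query)
    logits = []
    for key, key_box in zip(keys, key_boxes):
        dot = sum(qi * ki for qi, ki in zip(query, key))
        logits.append(dot / math.sqrt(d) + spatial_bias(query_box, key_box))
    # Numerically stable softmax.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Two keys with identical content features; only their positions differ.
weights = biased_attention_weights(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    query_box=(0.0, 0.0, 10.0, 10.0),
    key_boxes=[(20.0, 0.0, 30.0, 10.0), (200.0, 0.0, 210.0, 10.0)],
)
print(weights)
```

Because the two keys have the same content vector, any difference between the two weights comes from the bias term alone, and the nearer box receives the larger weight. A learned bias would replace the fixed distance function with a trainable one.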

arXiv information

Authors	Kai Hu,Zhuoyuan Wu,Zhuoyao Zhong,Weihong Lin,Lei Sun,Qiang Huo
Published	2023-04-17 02:55:31+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, OpenAI
