LongFin: A Multimodal Document Understanding Model for Long Financial Domain Documents

Abstract

Document AI is a growing research field that focuses on the comprehension and extraction of information from scanned and digital documents to make everyday business operations more efficient. Numerous downstream tasks and datasets have been introduced to facilitate the training of AI models capable of parsing and extracting information from various document types such as receipts and scanned forms. Despite these advancements, both existing datasets and models fail to address critical challenges that arise in industrial contexts. Existing datasets primarily comprise short documents consisting of a single page, while existing models are constrained by a limited maximum length, often set at 512 tokens. Consequently, the practical application of these methods in financial services, where documents can span multiple pages, is severely impeded. To overcome these challenges, we introduce LongFin, a multimodal document AI model capable of encoding up to 4K tokens. We also propose the LongForms dataset, a comprehensive financial dataset that encapsulates several industrial challenges in financial documents. Through an extensive evaluation, we demonstrate the effectiveness of the LongFin model on the LongForms dataset, surpassing the performance of existing public models while maintaining comparable results on existing single-page benchmarks.

arXiv information

Authors	Ahmed Masry, Amir Hajian
Published	2024-01-26 18:23:45+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
