Zero-Shot Prompting and Few-Shot Fine-Tuning: Revisiting Document Image Classification Using Large Language Models

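Summary

Classifying scanned documents is a difficult problem that requires analyzing images, layout, and text to understand a document.
Nevertheless, on certain benchmark datasets, notably RVL-CDIP, the state of the art is closing in on near-perfect performance when hundreds of thousands of training samples are available.
The advent of large language models (LLMs), which are excellent few-shot learners, raises the question of how far document classification can be addressed with only a few training samples, or with none at all.
This paper investigates that question in the context of zero-shot prompting and few-shot model fine-tuning, with the aim of reducing the need for human-annotated training samples as much as possible.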
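
To make the idea of zero-shot prompting concrete, here is a minimal sketch of how a document could be turned into a classification prompt without any training samples. The label list is the 16 RVL-CDIP categories; everything else (the use of OCR text as input, the prompt wording, and the helper names `build_zero_shot_prompt` and `parse_label`) is an assumption made for illustration and is not taken from the paper.

```python
from typing import Optional, Sequence

# The 16 document categories of the RVL-CDIP benchmark.
RVL_CDIP_LABELS = [
    "letter", "form", "email", "handwritten", "advertisement",
    "scientific report", "scientific publication", "specification",
    "file folder", "news article", "budget", "invoice",
    "presentation", "questionnaire", "resume", "memo",
]


def build_zero_shot_prompt(
    ocr_text: str,
    labels: Sequence[str] = RVL_CDIP_LABELS,
    max_chars: int = 2000,
) -> str:
    """Build a prompt that asks an LLM to pick one label (hypothetical wording)."""
    # Truncate the OCR text so the prompt stays within a modest length.
    snippet = ocr_text.strip()[:max_chars]
    label_list = "\n".join(f"- {label}" for label in labels)
    return (
        "Classify the scanned document below into exactly one of these categories:\n"
        f"{label_list}\n\n"
        f"Document text (from OCR):\n{snippet}\n\n"
        "Answer with the category name only."
    )


def parse_label(
    response: str, labels: Sequence[str] = RVL_CDIP_LABELS
) -> Optional[str]:
    """Map a free-text model answer back to a known label, or None if no label is found."""
    answer = response.strip().lower()
    # Check longer labels first so multi-word labels are matched as a whole.
    for label in sorted(labels, key=len, reverse=True):
        if label in answer:
            return label
    return None
```

The prompt string would be sent to whichever LLM is being evaluated, and `parse_label` applied to its reply. The substring matching is deliberately simple; a stricter parser would be needed if the model answers in full sentences, since short labels such as "form" can appear inside unrelated words.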

Abstract (original)

Classifying scanned documents is a challenging problem that involves image, layout, and text analysis for document understanding. Nevertheless, for certain benchmark datasets, notably RVL-CDIP, the state of the art is closing in to near-perfect performance when considering hundreds of thousands of training samples. With the advent of large language models (LLMs), which are excellent few-shot learners, the question arises to what extent the document classification problem can be addressed with only a few training samples, or even none at all. In this paper, we investigate this question in the context of zero-shot prompting and few-shot model fine-tuning, with the aim of reducing the need for human-annotated training samples as much as possible.

arXiv information

Authors	Anna Scius-Bertrand,Michael Jungo,Lars Vögtlin,Jean-Marc Spat,Andreas Fischer
Published	2024-12-18 13:53:16+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
