Mixed Text Recognition with Efficient Parameter Fine-Tuning and Transformer

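Abstract

With the rapid development of OCR technology, mixed-scene text recognition has become an important technical challenge.
Deep learning models achieve strong results in specific scenarios, but their generality and stability still need improvement, and their high demand for computing resources limits flexibility.
To address these problems, this paper proposes DLoRA-TrOCR, a parameter-efficient hybrid text spotting method based on a pre-trained OCR Transformer.
By embedding a weight-decomposed DoRA module in the image encoder and a LoRA module in the text decoder, the method can be fine-tuned efficiently on a variety of downstream tasks.
The method trains no more than 0.7% of the parameters, yet it speeds up training and significantly improves the recognition accuracy and cross-dataset generalization of the OCR system in mixed-text scenes.
Experiments show that the proposed DLoRA-TrOCR outperforms other parameter-efficient fine-tuning methods on complex scenes that mix handwritten, printed, and street text, achieving a CER of 4.02 on the IAM dataset, an F1 score of 94.29 on the SROIE dataset, and a WAR of 86.70 on the STR Benchmark, reaching state-of-the-art performance.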
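The two adapter types named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the commonly used formulations, where LoRA adds a low-rank product B·A to a frozen weight W0, and DoRA rescales each column of W0 + B·A to a learned magnitude m. The helper names, the tiny dimensions, and the rank are all hypothetical and chosen only for illustration.

```python
import math
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def column_norms(W):
    """Euclidean norm of each column of W."""
    return [math.sqrt(sum(v * v for v in col)) for col in zip(*W)]


def lora_weight(W0, B, A, scale=1.0):
    """LoRA: frozen W0 plus a scaled low-rank update B @ A."""
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(rw, rd)] for rw, rd in zip(W0, delta)]


def dora_weight(W0, B, A, m, scale=1.0):
    """DoRA (assumed form): direction from W0 + B @ A, per-column magnitude m."""
    V = lora_weight(W0, B, A, scale)
    norms = column_norms(V)
    return [[m[j] * v / norms[j] for j, v in enumerate(row)] for row in V]


def init_adapter(d_out, d_in, r, rng):
    """B starts at zero so the adapted weight equals W0 before training."""
    B = [[0.0] * r for _ in range(d_out)]
    A = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(r)]
    return B, A


# Hypothetical toy sizes, not taken from the paper.
rng = random.Random(0)
d_out, d_in, r = 6, 4, 2
W0 = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_out)]
B, A = init_adapter(d_out, d_in, r, rng)
m = column_norms(W0)  # DoRA magnitude initialised from the frozen weight

W_lora = lora_weight(W0, B, A)
W_dora = dora_weight(W0, B, A, m)

# Trainable parameters per adapted matrix under these assumptions.
lora_params = r * (d_in + d_out)
dora_params = lora_params + d_in  # extra magnitude vector, one entry per column
```

Because B is initialised to zero, both adapted weights coincide with W0 at the start of fine-tuning, and only B, A (and m for DoRA) would be updated. The trainable count grows with r·(d_in + d_out) rather than d_in·d_out, which is why the trainable share can stay small.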

Abstract (original)

With the rapid development of OCR technology, mixed-scene text recognition has become a key technical challenge. Although deep learning models have achieved significant results in specific scenarios, their generality and stability still need improvement, and the high demand for computing resources affects flexibility. To address these issues, this paper proposes DLoRA-TrOCR, a parameter-efficient hybrid text spotting method based on a pre-trained OCR Transformer. By embedding a weight-decomposed DoRA module in the image encoder and a LoRA module in the text decoder, this method can be efficiently fine-tuned on various downstream tasks. Our method requires no more than 0.7% trainable parameters, not only accelerating the training efficiency but also significantly improving the recognition accuracy and cross-dataset generalization performance of the OCR system in mixed text scenes. Experiments show that our proposed DLoRA-TrOCR outperforms other parameter-efficient fine-tuning methods in recognizing complex scenes with mixed handwritten, printed, and street text, achieving a CER of 4.02 on the IAM dataset, an F1 score of 94.29 on the SROIE dataset, and a WAR of 86.70 on the STR Benchmark, reaching state-of-the-art performance.
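Two of the reported metrics can be sketched as follows. This is a minimal illustration under assumed definitions, not the paper's evaluation code: CER is taken as total character edit distance divided by total reference length, and WAR as the share of predictions that match the reference exactly, both in percent. Any text normalisation (case, punctuation) used in the actual evaluation is not stated in the abstract, so none is applied here. The function names and sample strings are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(refs, hyps):
    """Character error rate in percent, pooled over all samples."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return 100.0 * total_edits / total_chars


def word_accuracy(refs, hyps):
    """Percentage of predictions that equal their reference exactly."""
    correct = sum(r == h for r, h in zip(refs, hyps))
    return 100.0 * correct / len(refs)


# Made-up examples for illustration only.
refs = ["hello world", "text"]
hyps = ["hallo world", "text"]
sample_cer = cer(refs, hyps)            # 1 edit over 15 reference characters
sample_war = word_accuracy(refs, hyps)  # 1 of 2 predictions matches exactly
```

Lower is better for CER, while higher is better for WAR and F1, so the three reported numbers are not directly comparable with one another.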

arXiv information

Authors	Da Chang, Yu Li
Published	2025-05-09 09:14:25+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
