A Permuted Autoregressive Approach to Word-Level Recognition for Urdu Digital Text

Summary

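This research paper introduces a new word-level optical character recognition (OCR) model designed specifically for digital Urdu text. It uses transformer-based architectures and attention mechanisms to address the distinct challenges of Urdu script recognition, such as diverse text styles, fonts, and variations.
The model adopts the permuted autoregressive sequence (PARSeq) architecture, which improves performance by enabling context-aware inference and iterative refinement through training on multiple token permutations.
This method lets the model handle the character reordering and overlapping characters commonly found in Urdu script.
Trained on a dataset of approximately 160,000 Urdu text images, the model captures the complexity of Urdu script with a high level of accuracy, achieving a CER of 0.178.
Although challenges remain in handling certain text variations, the model shows strong accuracy and effectiveness in practical applications.
Future work will focus on refining the model through advanced data augmentation techniques and the integration of context-aware language models to further strengthen its performance and robustness in Urdu text recognition.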
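The core PARSeq idea summarized above is to train one decoder on several orderings (permutations) of the same target sequence, which in practice means changing only the decoder's attention mask. The sketch below builds such a mask in plain Python: a query position may attend only to positions predicted earlier in the chosen order. It is a simplified illustration (no start/end tokens, no batching), and the helper names `permutation_mask` and `sample_permutations`, as well as the sampling scheme, are assumptions rather than the authors' code.

```python
import random


def permutation_mask(perm: list[int]) -> list[list[bool]]:
    """Attention mask for one factorization order (simplified sketch).

    `perm` lists output positions in the order they are predicted.
    mask[i][j] is True when the query at position i may attend to the
    token at position j, i.e. when j is predicted before i in `perm`.
    """
    n = len(perm)
    rank = [0] * n  # rank[pos] = step at which `pos` is predicted
    for step, pos in enumerate(perm):
        rank[pos] = step
    return [[rank[j] < rank[i] for j in range(n)] for i in range(n)]


def sample_permutations(n: int, k: int, rng: random.Random) -> list[list[int]]:
    """Hypothetical sampler: left-to-right and right-to-left orders first,
    then k - 2 random shuffles (duplicates are possible and harmless here)."""
    ltr = list(range(n))
    perms = [ltr, ltr[::-1]]
    while len(perms) < k:
        p = ltr[:]
        rng.shuffle(p)
        perms.append(p)
    return perms[:k]
```

With the identity order `[0, 1, 2]` the mask is strictly lower-triangular, which is ordinary left-to-right autoregressive decoding. With `[2, 0, 1]`, position 2 is predicted first and sees nothing, while position 1 is predicted last and sees positions 0 and 2; training on many such masks is what lets one model decode in context-aware, non-sequential orders.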

Summary (original)

This research paper introduces a novel word-level Optical Character Recognition (OCR) model specifically designed for digital Urdu text, leveraging transformer-based architectures and attention mechanisms to address the distinct challenges of Urdu script recognition, including its diverse text styles, fonts, and variations. The model employs a permuted autoregressive sequence (PARSeq) architecture, which enhances its performance by enabling context-aware inference and iterative refinement through the training of multiple token permutations. This method allows the model to adeptly manage character reordering and overlapping characters, commonly encountered in Urdu script. Trained on a dataset comprising approximately 160,000 Urdu text images, the model demonstrates a high level of accuracy in capturing the intricacies of Urdu script, achieving a CER of 0.178. Despite ongoing challenges in handling certain text variations, the model exhibits superior accuracy and effectiveness in practical applications. Future work will focus on refining the model through advanced data augmentation techniques and the integration of context-aware language models to further enhance its performance and robustness in Urdu text recognition.
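The reported CER (character error rate) is the number of character-level edits (substitutions S, deletions D, insertions I) needed to turn the predictions into the ground truth, divided by the number of ground-truth characters N, i.e. CER = (S + D + I) / N. The abstract does not say how the 0.178 figure was aggregated, so the sketch below assumes a corpus-level ratio over NFC-normalized Unicode strings; the function names `levenshtein` and `cer` are illustrative, not taken from the paper.

```python
import unicodedata


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions and insertions
    needed to turn `hyp` into `ref` (dynamic-programming edit distance)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: ref char missing from hyp
                curr[j - 1] + 1,         # insertion: extra char in hyp
                prev[j - 1] + (r != h),  # substitution (or match at cost 0)
            )
        prev = curr
    return prev[-1]


def cer(refs: list[str], hyps: list[str]) -> float:
    """Corpus-level CER: total edit distance / total reference length.
    NFC normalization is an assumption; the paper does not specify it."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    edits = 0
    total = 0
    for ref, hyp in zip(refs, hyps):
        ref = unicodedata.normalize("NFC", ref)
        hyp = unicodedata.normalize("NFC", hyp)
        edits += levenshtein(ref, hyp)
        total += len(ref)
    return edits / total if total else 0.0
```

For example, `cer(["کتاب"], ["کتب"])` returns 0.25: the reference has four characters and the prediction drops one (ا). Because Urdu text can carry combining diacritics, the choice of normalization and of what counts as a "character" can noticeably shift CER values.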

arXiv information

Authors	Ahmed Mustafa, Muhammad Tahir Rafique, Muhammad Ijlal Baig, Hasan Sajid, Muhammad Jawad Khan, Karam Dad Kallu
Published	2024-08-30 15:29:08+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
