STEP — Towards Structured Scene-Text Spotting

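Summary

We introduce the structured scene-text spotting task, in which a scene-text OCR system must spot text in the wild according to a query regular expression.
In contrast to generic scene-text OCR, structured scene-text spotting seeks to dynamically condition both scene-text detection and recognition on a user-specified regular expression.
To tackle this task, we propose the Structured TExt sPotter (STEP), a model that uses the provided text structure to guide the OCR process.
STEP can handle regular expressions that contain spaces and is not limited to detection at word-level granularity.
Our approach enables accurate zero-shot structured text spotting across a wide variety of real-world reading scenarios and is trained only on publicly available data.
To demonstrate the effectiveness of our approach, we introduce a new, challenging test dataset containing several types of out-of-vocabulary structured text that reflect important reading applications, such as prices, dates, serial numbers, and license plates.
We show that STEP can provide specialised OCR performance on demand in all tested scenarios.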

Summary (original)

We introduce the structured scene-text spotting task, which requires a scene-text OCR system to spot text in the wild according to a query regular expression. Contrary to generic scene text OCR, structured scene-text spotting seeks to dynamically condition both scene text detection and recognition on user-provided regular expressions. To tackle this task, we propose the Structured TExt sPotter (STEP), a model that exploits the provided text structure to guide the OCR process. STEP is able to deal with regular expressions that contain spaces and it is not bound to detection at the word-level granularity. Our approach enables accurate zero-shot structured text spotting in a wide variety of real-world reading scenarios and is solely trained on publicly available data. To demonstrate the effectiveness of our approach, we introduce a new challenging test dataset that contains several types of out-of-vocabulary structured text, reflecting important reading applications of fields such as prices, dates, serial numbers, license plates etc. We demonstrate that STEP can provide specialised OCR performance on demand in all tested scenarios.
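To make the task definition concrete, here is a minimal sketch of what "spotting text according to a query regular expression" asks for. It is not the STEP model: it is a naive post-hoc baseline that filters the output of a generic OCR system, whereas the abstract describes conditioning detection and recognition themselves on the query. The `Detection` class, the `spot` function, and the sample detections are all hypothetical and exist only for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """One hypothetical OCR result: a transcription and its bounding box."""
    text: str
    box: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def spot(detections: List[Detection], pattern: str) -> List[Detection]:
    """Keep only detections whose full transcription matches the query regex.

    This is post-hoc filtering of generic OCR output, used here only to
    illustrate the task's input and output; it is not how STEP works.
    """
    query = re.compile(pattern)
    return [d for d in detections if query.fullmatch(d.text)]


# Made-up detections standing in for the output of a generic OCR system.
detections = [
    Detection("SALE", (10, 10, 80, 40)),
    Detection("$4.99", (90, 10, 160, 40)),
    Detection("12/05/2023", (10, 60, 150, 90)),
    Detection("AB 123 CD", (10, 110, 170, 140)),
]

# Query for prices, then for a plate-like pattern that contains spaces.
prices = spot(detections, r"\$\d+\.\d{2}")
plates = spot(detections, r"[A-Z]{2} \d{3} [A-Z]{2}")

print([d.text for d in prices])
print([d.text for d in plates])
```

The plate query only matches here because the made-up detection already spans several words. A word-level OCR system would return `AB`, `123`, and `CD` as separate detections and this filter would miss the plate, which is the limitation the abstract refers to when it says STEP is not bound to word-level granularity.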

arXiv information

Authors	Sergi Garcia-Bordils,Dimosthenis Karatzas,Marçal Rusiñol
Published	2023-09-05 16:11:54+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
