Scene Text Recognition with Image-Text Matching-guided Dictionary

Summary

Title: Scene Text Recognition with Image-Text Matching-guided Dictionary

Summary:

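– Using a dictionary can efficiently correct deviations between the visual prediction and the ground truth in scene text recognition methods.
– However, because the dictionary is independent of the visual features, it may wrongly alter visual predictions that were already accurate.
– This paper proposes a new dictionary language model that leverages the Scene Image-Text Matching (SITM) network. The new model avoids the following drawbacks of the explicit dictionary language model:
1. Independence from the visual features
2. Noisy choice among candidates
– The SITM network achieves this by using Image-Text Contrastive (ITC) learning to match an image with its corresponding text among the candidates at inference time. ITC is widely used in vision-language learning to pull positive image-text pairs closer together in feature space.
– Inspired by ITC, the SITM network combines the visual features with the text features of all candidates and identifies the candidate with the minimum distance in feature space.
– On six mainstream benchmarks, the proposed lexicon method achieves better results (93.8% accuracy) than the ordinary method (92.1% accuracy). In addition, integrating the method with ABINet establishes new state-of-the-art results on several benchmarks.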
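
The inference-time matching step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the image and each dictionary candidate have already been encoded into feature vectors of the same dimension, and it uses cosine distance as the distance measure, which the summary does not specify. The function names (`cosine_distance`, `select_candidate`) and the toy vectors are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_distance(a, b):
    """Return 1 - cosine similarity (assumed distance; not given in the source)."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return 1.0 - sum(x * y for x, y in zip(a_n, b_n))


def select_candidate(image_feature, candidate_features):
    """Pick the candidate text whose feature is closest to the image feature.

    candidate_features maps each candidate string to its text feature vector.
    """
    if not candidate_features:
        raise ValueError("no candidates to choose from")
    return min(
        candidate_features,
        key=lambda text: cosine_distance(image_feature, candidate_features[text]),
    )


# Toy features standing in for real encoder outputs (hypothetical values).
image_feature = [0.9, 0.1, 0.2]
candidates = {
    "house": [1.0, 0.0, 0.1],
    "horse": [0.1, 1.0, 0.0],
    "mouse": [0.0, 0.2, 1.0],
}
best = select_candidate(image_feature, candidates)
print(best)
```

Because the choice depends on the image feature rather than only on string similarity to the visual prediction, a candidate is selected only when it also matches what the image shows, which is the dependence on visual features that the summary says an explicit dictionary lacks.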

Abstract (original)

Employing a dictionary can efficiently rectify the deviation between the visual prediction and the ground truth in scene text recognition methods. However, the independence of the dictionary on the visual features may lead to incorrect rectification of accurate visual predictions. In this paper, we propose a new dictionary language model leveraging the Scene Image-Text Matching (SITM) network, which avoids the drawbacks of the explicit dictionary language model: 1) the independence of the visual features; 2) noisy choice in candidates etc. The SITM network accomplishes this by using Image-Text Contrastive (ITC) Learning to match an image with its corresponding text among candidates in the inference stage. ITC is widely used in vision-language learning to pull the positive image-text pair closer in feature space. Inspired by ITC, the SITM network combines the visual features and the text features of all candidates to identify the candidate with the minimum distance in the feature space. Our lexicon method achieves better results (93.8% accuracy) than the ordinary method results (92.1% accuracy) on six mainstream benchmarks. Additionally, we integrate our method with ABINet and establish new state-of-the-art results on several benchmarks.
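
The idea of pulling positive image-text pairs closer in feature space is commonly implemented as a symmetric contrastive loss over a batch. The sketch below shows that common formulation in plain Python. It is an assumption that SITM trains with this exact form; the abstract does not give the loss, and the temperature value of 0.07 and the function name `itc_loss` are illustrative choices, not taken from the paper.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec]


def cross_entropy_row(logits, target_index):
    """Softmax cross-entropy for one row, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


def itc_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric image-text contrastive loss.

    Pair k (image_feats[k], text_feats[k]) is the positive pair; every other
    combination in the batch acts as a negative.
    """
    if len(image_feats) != len(text_feats):
        raise ValueError("need one text feature per image feature")
    imgs = [normalize(v) for v in image_feats]
    txts = [normalize(v) for v in text_feats]
    n = len(imgs)
    # Temperature-scaled cosine similarity between every image and every text.
    sim = [[dot(i, t) / temperature for t in txts] for i in imgs]
    # Image-to-text direction: each row should peak on its own text.
    i2t = sum(cross_entropy_row(sim[k], k) for k in range(n)) / n
    # Text-to-image direction: each column should peak on its own image.
    t2i = sum(
        cross_entropy_row([sim[r][k] for r in range(n)], k) for k in range(n)
    ) / n
    return 0.5 * (i2t + t2i)


# Matched pairs give a small loss; swapping the texts gives a large one.
aligned = itc_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = itc_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

Minimizing this loss raises the similarity of each positive pair relative to the negatives, so at inference the correct text should be the nearest candidate to the image feature.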

arXiv information

Authors	Jiajun Wei, Hongjian Zhan, Xiao Tu, Yue Lu, Umapada Pal
Published	2023-05-08 07:47:49+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, OpenAI
