TPFNet: A Novel Text In-painting Transformer for Text Removal

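Abstract

Removing text from images is useful for tasks such as image editing and privacy preservation.
This paper presents TPFNet, a novel one-stage (end-to-end) network for removing text from images.
The network has two parts: feature synthesis and image generation.
Because noise can be removed more effectively from low-resolution images, part 1 operates on low-resolution images.
The output of part 1 is a low-resolution text-free image.
Part 2 uses the features learned in part 1 to predict a high-resolution text-free image.
Part 1 uses a pyramidal vision transformer (PVT) as the encoder.
It also uses a novel multi-headed decoder that generates a high-pass filtered image and a segmentation map in addition to the text-free image.
The segmentation branch helps locate the text precisely, and the high-pass branch helps the network learn the image structure.
To locate the text precisely, TPFNet employs an adversarial loss that is conditioned on the segmentation map rather than on the input image.
On the Oxford, SCUT, and SCUT-EnsText datasets, the network outperforms recently proposed networks on nearly all metrics.
For example, on the SCUT-EnsText dataset, TPFNet achieves a PSNR (higher is better) of 39.0 and a text-detection precision (lower is better) of 21.1, whereas the best previous technique has a PSNR of 32.3 and a precision of 53.2.
The source code is available at https://github.com/CandleLabAI/TPFNet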
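
To make the PSNR figures quoted above easier to interpret, here is a minimal sketch of the standard PSNR definition in plain Python. It is not taken from the TPFNet repository: the function names `psnr` and `mse_ratio`, the peak value of 255, and the toy pixel values are all assumptions for illustration, and the abstract does not say how the authors computed the metric (for example, which peak value or color handling they used).

```python
import math

def psnr(reference, output, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences.

    max_value=255.0 assumes 8-bit images; this is an assumption, not a detail from the paper.
    """
    if not reference or len(reference) != len(output):
        raise ValueError("images must be non-empty and have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - o) ** 2 for r, o in zip(reference, output)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

def mse_ratio(psnr_a_db, psnr_b_db):
    """Factor by which the MSE at psnr_a_db is smaller than at psnr_b_db (same peak value)."""
    return 10.0 ** ((psnr_a_db - psnr_b_db) / 10.0)

# Toy flattened grayscale images (hypothetical values, not data from the paper).
reference = [52, 55, 61, 59, 79, 61, 76, 61]
output = [54, 55, 60, 59, 77, 63, 76, 60]

print(f"toy PSNR: {psnr(reference, output):.2f} dB")
print(f"MSE ratio for 39.0 dB vs 32.3 dB: {mse_ratio(39.0, 32.3):.2f}")
```

If both scores use the same peak value, the 6.7 dB gap between 39.0 and 32.3 corresponds to a mean squared error roughly 4.7 times smaller.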

Abstract (original)

Text erasure from an image is helpful for various tasks such as image editing and privacy preservation. In this paper, we present TPFNet, a novel one-stage (end-to-end) network for text removal from images. Our network has two parts: feature synthesis and image generation. Since noise can be more effectively removed from low-resolution images, part 1 operates on low-resolution images. The output of part 1 is a low-resolution text-free image. Part 2 uses the features learned in part 1 to predict a high-resolution text-free image. In part 1, we use ‘pyramidal vision transformer’ (PVT) as the encoder. Further, we use a novel multi-headed decoder that generates a high-pass filtered image and a segmentation map, in addition to a text-free image. The segmentation branch helps locate the text precisely, and the high-pass branch helps in learning the image structure. To precisely locate the text, TPFNet employs an adversarial loss that is conditional on the segmentation map rather than the input image. On the Oxford, SCUT, and SCUT-EnsText datasets, our network outperforms recently proposed networks on nearly all the metrics. For example, on the SCUT-EnsText dataset, TPFNet has a PSNR (higher is better) of 39.0 and text-detection precision (lower is better) of 21.1, compared to the best previous technique, which has a PSNR of 32.3 and precision of 53.2. The source code can be obtained from https://github.com/CandleLabAI/TPFNet
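
The abstract mentions a decoder head that outputs a high-pass filtered image to help learn image structure, but it does not say which filter is used. As a rough illustration of what a high-pass image is, the sketch below subtracts a 3x3 box blur (a low-pass filter) from the input. The box blur, the border clamping, and the names `box_blur` and `high_pass` are assumptions chosen for simplicity, not the authors' method.

```python
def box_blur(image, radius=1):
    """Low-pass filter: mean over a (2*radius+1)^2 window, clamping indices at the borders."""
    height, width = len(image), len(image[0])
    window_area = (2 * radius + 1) ** 2
    blurred = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), height - 1)  # clamp row index
                    xx = min(max(x + dx, 0), width - 1)   # clamp column index
                    total += image[yy][xx]
            blurred[y][x] = total / window_area
    return blurred

def high_pass(image, radius=1):
    """High-pass image: the input minus its low-pass (blurred) version."""
    low = box_blur(image, radius)
    return [[p - l for p, l in zip(row, low_row)] for row, low_row in zip(image, low)]

# Toy grayscale images as nested lists (hypothetical values).
flat = [[10.0] * 4 for _ in range(4)]                      # no structure
edge = [[0.0, 0.0, 100.0, 100.0] for _ in range(4)]        # one vertical edge

flat_hp = high_pass(flat)
edge_hp = high_pass(edge)
print("flat row:", flat_hp[0])
print("edge row:", [round(v, 2) for v in edge_hp[0]])
```

In this toy case the flat image produces an all-zero response, while the edge image responds only in the two columns next to the intensity step, which is the kind of structural cue (edges, stroke boundaries) a high-pass target emphasizes.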

arXiv information

Authors	Onkar Susladkar, Dhruv Makwana, Gayatri Deshmukh, Sparsh Mittal, Sai Chandra Teja R, Rekha Singhal
Published	2022-10-27 14:14:55+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
