Dynamic Relation Transformer for Contextual Text Block Detection

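Summary

Contextual Text Block Detection (CTBD) is the task of identifying coherent text blocks in complex natural scenes.
Previous approaches have treated CTBD either as a visual relation extraction problem in computer vision or as a sequence modeling problem from a natural language processing perspective.
The authors introduce a new framework that formulates CTBD as a graph generation problem.
The approach consists of two steps: identifying individual text units as graph nodes, and identifying the sequential reading-order relationships among those units as graph edges.
Nodes are detected with DQ-DETR, and edges are generated by a new mechanism, the Dynamic Relation Transformer (DRFormer).
DRFormer incorporates a dual interactive transformer decoder that manages a dynamic graph structure refinement process.
Through this iterative process, the model progressively improves the fidelity of the graph, which in turn leads to more accurate detection of contextual text blocks.
Experiments on the SCUT-CTW-Context and ReCTS-Context datasets show that the method achieves state-of-the-art results, supporting the effectiveness of the graph generation framework for CTBD.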

Summary (original)

Contextual Text Block Detection (CTBD) is the task of identifying coherent text blocks within the complexity of natural scenes. Previous methodologies have treated CTBD as either a visual relation extraction challenge within computer vision or as a sequence modeling problem from the perspective of natural language processing. We introduce a new framework that frames CTBD as a graph generation problem. This methodology consists of two essential procedures: identifying individual text units as graph nodes and discerning the sequential reading order relationships among these units as graph edges. Leveraging the cutting-edge capabilities of DQ-DETR for node detection, our framework innovates further by integrating a novel mechanism, a Dynamic Relation Transformer (DRFormer), dedicated to edge generation. DRFormer incorporates a dual interactive transformer decoder that deftly manages a dynamic graph structure refinement process. Through this iterative process, the model systematically enhances the graph’s fidelity, ultimately resulting in improved precision in detecting contextual text blocks. Comprehensive experimental evaluations conducted on both SCUT-CTW-Context and ReCTS-Context datasets substantiate that our method achieves state-of-the-art results, underscoring the effectiveness and potential of our graph generation framework in advancing the field of CTBD.
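To make the graph formulation concrete, the following is a minimal sketch of how ordered text blocks could be read off once nodes and reading-order edges are available. It is not the paper's implementation: DRFormer itself is not reproduced here, and the function name `blocks_from_edges`, the `successor` dictionary, and the assumption that each text unit has at most one successor are all hypothetical choices made for illustration.

```python
from typing import Dict, List


def blocks_from_edges(num_units: int, successor: Dict[int, int]) -> List[List[int]]:
    """Group text units into ordered blocks by following reading-order edges.

    Assumption (hypothetical): `successor[i] = j` means unit j is read
    immediately after unit i, and each unit has at most one successor.
    """
    has_predecessor = set(successor.values())
    visited = set()
    blocks: List[List[int]] = []

    def walk(start: int) -> List[int]:
        # Follow successor edges until the chain ends or hits a visited unit.
        chain = []
        node = start
        while node is not None and node not in visited:
            visited.add(node)
            chain.append(node)
            node = successor.get(node)
        return chain

    # Each block starts at a unit that no edge points to.
    for head in range(num_units):
        if head not in has_predecessor:
            blocks.append(walk(head))

    # Units not reached from any head can only be reached via a cycle;
    # emit them as their own blocks rather than dropping them.
    for start in range(num_units):
        if start not in visited:
            blocks.append(walk(start))

    return blocks


if __name__ == "__main__":
    # Six text units: 0 -> 1 -> 2 form one block, 3 -> 4 another, 5 stands alone.
    print(blocks_from_edges(6, {0: 1, 1: 2, 3: 4}))
```

In this sketch a text unit with no incoming or outgoing edge becomes a single-unit block, which matches the idea that every detected node belongs to exactly one contextual text block.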

arXiv information

Authors	Jiawei Wang, Shunchi Zhang, Kai Hu, Chixiang Ma, Zhuoyao Zhong, Lei Sun, Qiang Huo
Published	2024-01-17 14:17:59+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
