SceneGATE: Scene-Graph based co-Attention networks for TExt visual question answering

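Summary

Most TextVQA approaches focus on integrating objects, scene text, and question words through a simple transformer encoder.
However, this fails to capture the semantic relations between different modalities.
This paper proposes a Scene Graph based co-Attention Network (SceneGATE) for TextVQA, which reveals the semantic relations among objects, optical character recognition (OCR) tokens, and question words.
This is achieved through a TextVQA-based scene graph that discovers the underlying semantics of an image.
We created a guided-attention module that captures the intra-modal interplay of language and vision as guidance for inter-modal interactions.
To explicitly teach the relations between the two modalities, we proposed and integrated two attention modules: a scene graph-based semantic relation-aware attention and a positional relation-aware attention.
We conducted extensive experiments on two benchmark datasets, Text-VQA and ST-VQA.
The results show that SceneGATE outperforms existing methods, owing to the scene graph and its attention modules.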
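The summary does not spell out how the guided-attention module is computed. As a rough illustration only, here is a minimal NumPy sketch of a generic guided-attention block in the style of the self-attention / guided-attention units of MCAN, assuming single-head scaled dot-product attention, residual connections, and no learned projections. The function `guided_attention_block` and the feature sizes are hypothetical and are not taken from the SceneGATE paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, bias=None):
    # Single-head scaled dot-product attention: softmax(q k^T / sqrt(d) + bias) v.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if bias is not None:
        scores = scores + bias
    return softmax(scores) @ v

def guided_attention_block(lang, vis):
    """Hypothetical MCAN-style block (not SceneGATE's exact module):
    intra-modal self-attention on each modality, then the visual
    features attend to (are guided by) the language features."""
    lang = lang + attention(lang, lang, lang)  # intra-modal: question words
    vis = vis + attention(vis, vis, vis)       # intra-modal: objects + OCR tokens
    vis = vis + attention(vis, lang, lang)     # inter-modal: language guides vision
    return lang, vis

rng = np.random.default_rng(0)
question = rng.normal(size=(6, 32))  # 6 question-word features (illustrative)
visual = rng.normal(size=(10, 32))   # e.g. 5 object + 5 OCR-token features
q_out, v_out = guided_attention_block(question, visual)
print(q_out.shape, v_out.shape)  # (6, 32) (10, 32)
```

Running intra-modal attention first means each visual feature has already been contextualised by the other regions before it queries the question, which is one plausible reading of "intra-modal interplay as guidance for inter-modal interactions".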

Summary (original)

Most TextVQA approaches focus on integrating objects, scene texts, and question words with a simple transformer encoder. However, this fails to capture the semantic relations between different modalities. The paper proposes a Scene Graph based co-Attention Network (SceneGATE) for TextVQA, which reveals the semantic relations among the objects, Optical Character Recognition (OCR) tokens and the question words. It is achieved by a TextVQA-based scene graph that discovers the underlying semantics of an image. We created a guided-attention module to capture the intra-modal interplay between the language and the vision as a guidance for inter-modal interactions. To explicitly teach the relations between the two modalities, we proposed and integrated two attention modules, namely a scene graph-based semantic relation-aware attention and a positional relation-aware attention. We conducted extensive experiments on two benchmark datasets, Text-VQA and ST-VQA. It is shown that our SceneGATE method outperformed existing ones because of the scene graph and its attention modules.
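The abstract names a scene graph-based semantic relation-aware attention and a positional relation-aware attention without giving formulas. One common way to make attention relation-aware is to restrict or bias the attention scores with a relation matrix. The sketch below, assuming single-head self-attention over object/OCR node features, masks out non-neighbours in a toy scene graph (semantic variant) and subtracts a penalty proportional to the distance between bounding-box centres (positional variant). The function `relation_aware_attention`, its parameters `adjacency`, `boxes`, and `gamma`, and the toy data are all hypothetical; SceneGATE's actual modules may be formulated differently.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def relation_aware_attention(x, adjacency=None, boxes=None, gamma=1.0):
    """Hypothetical single-head self-attention over object/OCR nodes.

    adjacency: (n, n) 0/1 matrix, e.g. scene-graph edges; non-neighbours
               are masked out (semantic-relation-aware variant).
    boxes:     (n, 4) boxes as (x1, y1, x2, y2) in [0, 1]; a penalty
               proportional to centre distance is subtracted (positional variant).
    """
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)
    if adjacency is not None:
        # Always allow self-loops so every row has at least one valid entry.
        allowed = adjacency.astype(bool) | np.eye(n, dtype=bool)
        scores = np.where(allowed, scores, -1e9)
    if boxes is not None:
        centres = (boxes[:, :2] + boxes[:, 2:]) / 2.0
        dist = np.linalg.norm(centres[:, None, :] - centres[None, :, :], axis=-1)
        scores = scores - gamma * dist  # nearer regions receive larger weights
    weights = softmax(scores)
    return weights @ x, weights

rng = np.random.default_rng(0)
nodes = rng.normal(size=(4, 16))  # 4 illustrative object/OCR node features
# Toy scene graph: node 0 (an object) is linked to node 1 (an OCR token on it).
adj = np.zeros((4, 4))
adj[0, 1] = adj[1, 0] = 1
boxes = np.array([[0.10, 0.10, 0.30, 0.30],
                  [0.15, 0.15, 0.25, 0.25],
                  [0.70, 0.70, 0.90, 0.90],
                  [0.50, 0.10, 0.60, 0.20]])
out_sem, w_sem = relation_aware_attention(nodes, adjacency=adj)
out_pos, w_pos = relation_aware_attention(nodes, boxes=boxes)
print(np.round(w_sem[0], 3))  # only the entries for nodes 0 and 1 are non-zero
```

Masking with the adjacency matrix makes each node's update depend only on its scene-graph neighbours, while the distance penalty keeps every node reachable but favours nearby regions; the two variants can be combined by passing both arguments.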

arXiv information

Authors	Feiqi Cao, Siwen Luo, Felipe Nunez, Zean Wen, Josiah Poon, Caren Han
Published	2023-08-07 08:32:54+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
