On Analyzing the Role of Image for Visual-enhanced Relation Extraction

Summary

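Multimodal relation extraction is an essential task for knowledge graph construction.
This paper presents an in-depth empirical analysis showing that inaccurate information in the visual scene graph leads to poor modal alignment weights, which in turn degrades performance.
Moreover, visual shuffle experiments indicate that current approaches may not take full advantage of visual information.
Based on these observations, the authors further propose a strong baseline for multimodal relation extraction with implicit fine-grained multimodal alignment based on Transformer.
Experimental results show that the proposed method performs better.
Code is available at https://github.com/zjunlp/DeepKE/tree/main/example/re/multimodal.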
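
The visual shuffle experiment mentioned in the summary can be illustrated with a minimal sketch. The abstract does not describe the exact protocol, so everything below is an assumption: the `Example` tuple layout and the `shuffle_images` helper are hypothetical names for illustration, not part of the DeepKE code. The idea is to re-pair each sentence with an image drawn from another example and then compare model scores on the original and shuffled sets.

```python
import random
from typing import List, Tuple

# Hypothetical example layout: (sentence, image_id, relation_label).
Example = Tuple[str, str, str]


def shuffle_images(examples: List[Example], seed: int = 0) -> List[Example]:
    """Return a copy of `examples` with image ids permuted across examples.

    Sentences and labels stay in place; only the image pairing changes.
    A random permutation may leave some examples with their original image.
    """
    rng = random.Random(seed)
    image_ids = [image_id for _, image_id, _ in examples]
    rng.shuffle(image_ids)
    return [
        (sentence, image_id, label)
        for (sentence, _, label), image_id in zip(examples, image_ids)
    ]


if __name__ == "__main__":
    data = [
        ("sentence one", "img_a.jpg", "relation_x"),
        ("sentence two", "img_b.jpg", "relation_y"),
        ("sentence three", "img_c.jpg", "relation_z"),
    ]
    for row in shuffle_images(data, seed=1):
        print(row)
```

If a model scores about the same on the shuffled set as on the original set, that would suggest it relies little on the paired image, which is the kind of observation the summary reports for current approaches.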

Summary (original)

Multimodal relation extraction is an essential task for knowledge graph construction. In this paper, we take an in-depth empirical analysis that indicates the inaccurate information in the visual scene graph leads to poor modal alignment weights, further degrading performance. Moreover, the visual shuffle experiments illustrate that the current approaches may not take full advantage of visual information. Based on the above observation, we further propose a strong baseline with an implicit fine-grained multimodal alignment based on Transformer for multimodal relation extraction. Experimental results demonstrate the better performance of our method. Codes are available at https://github.com/zjunlp/DeepKE/tree/main/example/re/multimodal.

arXiv information

Authors	Lei Li, Xiang Chen, Shuofei Qiao, Feiyu Xiong, Huajun Chen, Ningyu Zhang
Published	2022-11-14 16:39:24+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
