Enhancing Multi-modal and Multi-hop Question Answering via Structured Knowledge and Unified Retrieval-Generation


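To address this limitation, we propose a Structured Knowledge and Unified Retrieval-Generation based method (SKURG).
We perform experiments on two multi-modal and multi-hop datasets: WebQA and MultimodalQA.
The results demonstrate that SKURG achieves state-of-the-art performance on both retrieval and answer generation.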


Multi-modal and multi-hop question answering aims to answer a question based on multiple input sources from different modalities. Previous methods retrieve the evidence separately and feed the retrieved evidence to a language model to generate the corresponding answer. However, these methods fail to build connections between candidates and thus cannot model the inter-dependent relations among them during retrieval. Moreover, without alignments between different modalities, the reasoning process over multi-modal candidates can be unbalanced. To address these limitations, we propose a Structured Knowledge and Unified Retrieval-Generation based method (SKURG). We align the sources from different modalities via shared entities and map them into a shared semantic space via structured knowledge. Then, we use a unified retrieval-generation decoder to integrate intermediate retrieval results for answer generation and to adaptively determine the number of retrieval steps. We perform experiments on two multi-modal and multi-hop datasets: WebQA and MultimodalQA. The results demonstrate that SKURG achieves state-of-the-art performance on both retrieval and answer generation.
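The abstract does not spell out how shared entities can connect candidates across modalities, or how a retrieval loop can decide on its own number of steps. The sketch below is a toy illustration of those two ideas, not SKURG itself. It assumes entities have already been extracted for every text snippet and image (for example, from captions). It links candidates that share an entity and replaces SKURG's learned unified decoder with a simple greedy heuristic based on entity overlap, which stops once no remaining candidate shares an entity with the evidence gathered so far. `Candidate`, `build_entity_links`, `iterative_retrieve`, and the example data are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One retrievable source (text snippet or image) with pre-extracted entities."""
    cid: str
    modality: str  # "text" or "image"; linking below ignores modality on purpose
    entities: frozenset


def build_entity_links(candidates):
    """Link every pair of candidates that mention at least one shared entity.

    Because linking depends only on entities, a text snippet and an image
    that mention the same entity end up connected across modalities.
    """
    by_entity = defaultdict(set)
    for c in candidates:
        for e in c.entities:
            by_entity[e].add(c.cid)
    links = defaultdict(set)
    for cids in by_entity.values():
        for a in cids:
            links[a] |= cids - {a}
    return dict(links)


def iterative_retrieve(question_entities, candidates, max_steps=5):
    """Greedy multi-hop retrieval with an adaptive stop (toy heuristic).

    Each step scores the remaining candidates by how many entities they share
    with the question plus all evidence retrieved so far, takes the best one,
    and adds its entities to the known set so the next hop can follow them.
    The loop stops early when no remaining candidate shares any entity.
    """
    known = set(question_entities)
    pool = {c.cid: c for c in candidates}
    retrieved = []
    for _ in range(max_steps):
        if not pool:
            break
        scores = {cid: len(c.entities & known) for cid, c in pool.items()}
        # Highest score first; ties broken by candidate id for determinism.
        best = min(pool, key=lambda cid: (-scores[cid], cid))
        if scores[best] == 0:
            break  # nothing left connects to the evidence: stop retrieving
        retrieved.append(best)
        known |= pool.pop(best).entities
    return retrieved


# Hypothetical toy candidates mixing text and image sources.
candidates = [
    Candidate("txt1", "text", frozenset({"Eiffel Tower", "Paris"})),
    Candidate("img1", "image", frozenset({"Eiffel Tower"})),
    Candidate("txt2", "text", frozenset({"Paris", "Seine"})),
    Candidate("txt3", "text", frozenset({"Tokyo"})),
]
links = build_entity_links(candidates)
print(sorted(links["txt1"]))  # ['img1', 'txt2']
print(iterative_retrieve({"Eiffel Tower"}, candidates))  # ['img1', 'txt1', 'txt2']
```

Here the loop reaches `txt2` only on the third hop, through the entity "Paris" contributed by `txt1`, and it stops after three steps even though `max_steps` allows five. A learned model like the one the abstract describes would replace the overlap count with a learned relevance score and the zero-overlap test with a learned stop decision; this sketch only shows the control flow.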


Authors: Qian Yang, Qian Chen, Wen Wang, Baotian Hu, Min Zhang
Published: 2022-12-16 18:12:04+00:00
arXiv: arxiv_id (pdf)

Source and services used: arxiv.jp, Google

Categories: cs.CL, cs.CV