New Benchmark Dataset and Fine-Grained Cross-Modal Fusion Framework for Vietnamese Multimodal Aspect-Category Sentiment Analysis

Summary

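The rise of multimodal data on social media platforms creates new opportunities to understand user sentiment toward specific aspects in greater depth.
However, existing multimodal datasets for Aspect-Category Sentiment Analysis (ACSA) often focus on annotating text and neglect fine-grained information in images.
As a result, these datasets cannot fully exploit the richness inherent in multimodal data.
To address this, we introduce ViMACSA, a new Vietnamese multimodal dataset consisting of 4,876 text-image pairs with 14,618 fine-grained annotations covering both text and images in the hotel domain.
We also propose a Fine-Grained Cross-Modal Fusion Framework (FCMF) that learns both intra- and inter-modality interactions and fuses this information into a unified multimodal representation.
Experiments show that our framework outperforms SOTA models on the ViMACSA dataset, achieving the highest F1 score of 79.73%.
We further examine the characteristics and challenges of Vietnamese multimodal sentiment analysis, including misspellings, abbreviations, and the complexities of the Vietnamese language.
This work contributes both a benchmark dataset and a new framework that leverages fine-grained multimodal information to improve multimodal aspect-category sentiment analysis.
Our dataset is available for research purposes: https://github.com/hoangquy18/Multimodal-Aspect-Category-Sentiment-Analysis.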
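
The idea of learning intra- and inter-modality interactions and then fusing them can be illustrated with a toy sketch. This is not the authors' FCMF implementation: it assumes plain scaled dot-product attention with no learned projections, mean pooling, and concatenation as the fusion step. The function names `attend`, `mean_pool`, and `fuse`, and the example feature values, are hypothetical and chosen only for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention without learned projections (an assumption)."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Each output row is a weighted average of the value rows.
        out.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return out


def mean_pool(rows: Matrix) -> List[float]:
    """Average the rows of a matrix into a single vector."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fuse(text: Matrix, image: Matrix) -> List[float]:
    """Hypothetical fusion: self-attention, cross-attention, pool, concatenate."""
    # Intra-modality: each modality attends to itself.
    text_intra = attend(text, text, text)
    image_intra = attend(image, image, image)
    # Inter-modality: text tokens query image regions, and vice versa.
    text_inter = attend(text_intra, image_intra, image_intra)
    image_inter = attend(image_intra, text_intra, text_intra)
    # Fusion: pool each stream and concatenate into one vector.
    return mean_pool(text_inter) + mean_pool(image_inter)


# Made-up features: 3 text tokens and 2 image regions, each 4-dimensional.
text = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
image = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
fused = fuse(text, image)
print(len(fused))  # 8
```

A real model would add learned query, key, and value projections, multiple attention heads, and a classifier over the fused vector for each aspect category; none of those details are given in the summary, so they are left out here.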

Abstract (original)

The emergence of multimodal data on social media platforms presents new opportunities to better understand user sentiments toward a given aspect. However, existing multimodal datasets for Aspect-Category Sentiment Analysis (ACSA) often focus on textual annotations, neglecting fine-grained information in images. Consequently, these datasets fail to fully exploit the richness inherent in multimodal. To address this, we introduce a new Vietnamese multimodal dataset, named ViMACSA, which consists of 4,876 text-image pairs with 14,618 fine-grained annotations for both text and image in the hotel domain. Additionally, we propose a Fine-Grained Cross-Modal Fusion Framework (FCMF) that effectively learns both intra- and inter-modality interactions and then fuses these information to produce a unified multimodal representation. Experimental results show that our framework outperforms SOTA models on the ViMACSA dataset, achieving the highest F1 score of 79.73%. We also explore characteristics and challenges in Vietnamese multimodal sentiment analysis, including misspellings, abbreviations, and the complexities of the Vietnamese language. This work contributes both a benchmark dataset and a new framework that leverages fine-grained multimodal information to improve multimodal aspect-category sentiment analysis. Our dataset is available for research purposes: https://github.com/hoangquy18/Multimodal-Aspect-Category-Sentiment-Analysis.

arXiv information

Authors	Quy Hoang Nguyen, Minh-Van Truong Nguyen, Kiet Van Nguyen
Published	2024-05-01 14:29:03+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
