ViMRHP: A Vietnamese Benchmark Dataset for Multimodal Review Helpfulness Prediction via Human-AI Collaborative Annotation

Summary

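Multimodal Review Helpfulness Prediction (MRHP) is an essential task for recommender systems, particularly on e-commerce platforms.
Determining the helpfulness of user-generated reviews improves the user experience and supports better consumer decisions.
However, existing datasets focus mainly on English and Indonesian, so linguistic diversity is lacking, particularly for low-resource languages such as Vietnamese.
This paper introduces ViMRHP (Vietnamese Multimodal Review Helpfulness Prediction), a large-scale benchmark dataset for the MRHP task in Vietnamese.
The dataset covers four domains and contains 2K products with 46K reviews.
Building a dataset of this scale, however, requires considerable time and cost.
To optimize the annotation process, the authors use AI to assist annotators in constructing the ViMRHP dataset.
With AI assistance, annotation time falls from 90 to 120 seconds per task to 20 to 40 seconds per task, while data quality is maintained and overall cost is reduced by approximately 65%.
AI-generated annotations still have limitations on complex annotation tasks, which the authors examine further through a detailed performance analysis.
In the experiments on ViMRHP, baseline models are evaluated on both human-verified and AI-generated annotations to assess the difference in their quality.
The ViMRHP dataset is publicly available at https://github.com/trng28/ViMRHP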

Summary (original)

Multimodal Review Helpfulness Prediction (MRHP) is an essential task in recommender systems, particularly in E-commerce platforms. Determining the helpfulness of user-generated reviews enhances user experience and improves consumer decision-making. However, existing datasets focus predominantly on English and Indonesian, resulting in a lack of linguistic diversity, especially for low-resource languages such as Vietnamese. In this paper, we introduce ViMRHP (Vietnamese Multimodal Review Helpfulness Prediction), a large-scale benchmark dataset for MRHP task in Vietnamese. This dataset covers four domains, including 2K products with 46K reviews. Meanwhile, a large-scale dataset requires considerable time and cost. To optimize the annotation process, we leverage AI to assist annotators in constructing the ViMRHP dataset. With AI assistance, annotation time is reduced (90 to 120 seconds per task down to 20 to 40 seconds per task) while maintaining data quality and lowering overall costs by approximately 65%. However, AI-generated annotations still have limitations in complex annotation tasks, which we further examine through a detailed performance analysis. In our experiment on ViMRHP, we evaluate baseline models on human-verified and AI-generated annotations to assess their quality differences. The ViMRHP dataset is publicly available at https://github.com/trng28/ViMRHP
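As a rough illustration of the annotation-time figures quoted in the abstract (90 to 120 seconds per task without AI assistance, 20 to 40 seconds with it), the sketch below computes the range of relative time savings those numbers imply. Pairing the extremes of the two ranges to get lower and upper bounds is an assumption made here for illustration; the abstract does not say how the times are distributed, and the roughly 65% cost reduction is a separate figure whose derivation is not given. The function names are hypothetical.

```python
# Per-task annotation times in seconds, as quoted in the abstract.
MANUAL_RANGE = (90, 120)    # without AI assistance
ASSISTED_RANGE = (20, 40)   # with AI assistance


def relative_reduction(before: float, after: float) -> float:
    """Return the fractional reduction going from `before` to `after`."""
    return (before - after) / before


def reduction_bounds(manual, assisted):
    """Return (smallest, largest) fractional time reduction over two ranges.

    Assumption: the bounds come from pairing the range extremes, which is
    an illustrative choice and not a method described in the paper.
    """
    # Fastest manual task compared with the slowest assisted task.
    smallest = relative_reduction(manual[0], assisted[1])
    # Slowest manual task compared with the fastest assisted task.
    largest = relative_reduction(manual[1], assisted[0])
    return smallest, largest


low, high = reduction_bounds(MANUAL_RANGE, ASSISTED_RANGE)
print(f"time saved per task: {low:.0%} to {high:.0%}")
```

Under these assumptions the implied time saving per task lies between roughly 56% and 83%.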

arXiv information

Authors	Truc Mai-Thanh Nguyen, Dat Minh Nguyen, Son T. Luu, Kiet Van Nguyen
Published	2025-05-12 10:11:28+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
