X-Reflect: Cross-Reflection Prompting for Multimodal Recommendation

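Abstract

Large Language Models (LLMs) and Large Multimodal Models (LMMs) have been shown to enrich item descriptions more effectively, which in turn improves the accuracy of recommendation systems.
However, most existing approaches either rely on text-only prompting or use basic multimodal strategies that do not fully exploit the complementary information available from the textual and visual modalities.
This paper introduces Cross-Reflection Prompting, a new framework termed X-Reflect. It is designed to address these limitations by prompting LMMs to explicitly identify and reconcile supportive and conflicting information between text and images.
By capturing nuanced insights from both modalities, the approach generates more comprehensive and contextually richer item representations.
Extensive experiments on two widely used benchmarks show that the method outperforms existing prompting baselines in downstream recommendation accuracy.
The authors also evaluate how well the framework generalizes across different LMM backbones and how robust the prompting strategies are, offering insights for optimization.
This work underscores the importance of integrating multimodal information and presents a new solution for improving item understanding in multimodal recommendation systems.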
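
The abstract describes the core idea only at a high level: ask an LMM to find where an item's text and image agree, find where they conflict, and reconcile the two. The Python sketch below illustrates what such a prompt might look like. It is not the authors' prompt: the function name `build_cross_reflection_prompt`, the three-step wording, and the example item are all hypothetical, and the image is assumed to be sent to the model alongside the returned text.

```python
def build_cross_reflection_prompt(title: str, description: str) -> str:
    """Build the text part of a cross-reflection style prompt.

    Hypothetical illustration only: the wording below is an assumption,
    not the prompt used in the X-Reflect paper. The item image is assumed
    to be attached to the same LMM request by the caller.
    """
    steps = [
        "List details in the image that support the text description.",
        "List details in the image that conflict with, or are missing "
        "from, the text description.",
        "Reconcile both lists into one enriched item description.",
    ]
    # Number the steps so the model answers them in order.
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(steps, start=1))
    return (
        f"Item title: {title}\n"
        f"Text description: {description}\n"
        "An image of the item is attached.\n"
        f"{numbered}"
    )


if __name__ == "__main__":
    # Example item is made up for illustration.
    print(build_cross_reflection_prompt("Red Mug", "A ceramic mug."))
```

The enriched description returned by the model would then be used as the item representation for a downstream recommender; how that is encoded and consumed is not specified in the abstract.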

Abstract (original)

Large Language Models (LLMs) and Large Multimodal Models (LMMs) have been shown to enhance the effectiveness of enriching item descriptions, thereby improving the accuracy of recommendation systems. However, most existing approaches either rely on text-only prompting or employ basic multimodal strategies that do not fully exploit the complementary information available from both textual and visual modalities. This paper introduces a novel framework, Cross-Reflection Prompting, termed X-Reflect, designed to address these limitations by prompting LMMs to explicitly identify and reconcile supportive and conflicting information between text and images. By capturing nuanced insights from both modalities, this approach generates more comprehensive and contextually richer item representations. Extensive experiments conducted on two widely used benchmarks demonstrate that our method outperforms existing prompting baselines in downstream recommendation accuracy. Additionally, we evaluate the generalizability of our framework across different LMM backbones and the robustness of the prompting strategies, offering insights for optimization. This work underscores the importance of integrating multimodal information and presents a novel solution for improving item understanding in multimodal recommendation systems.

arXiv information

Authors	Hanjia Lyu, Ryan Rossi, Xiang Chen, Md Mehrab Tanjim, Stefano Petrangeli, Somdeb Sarkhel, Jiebo Luo
Published	2024-08-27 16:10:21+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
