Cross-view Semantic Alignment for Livestreaming Product Recognition

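Abstract

Live commerce is the act of selling products online through live streaming.
Customers' diverse demands for online products introduce additional challenges for livestreaming product recognition.
Previous work has mainly focused on fashion clothing data or used single-modal input, which does not reflect real-world scenarios where multimodal data from many categories are present.
This paper presents LPR4M, a large-scale multimodal dataset that covers 34 categories, comprises 3 modalities (image, video, and text), and is 50× larger than the largest publicly available dataset.
LPR4M contains diverse videos and noisy modality pairs while exhibiting a long-tailed distribution, resembling real-world problems.
In addition, a cRoss-vIew semantiC alignmEnt (RICE) model is proposed to learn discriminative instance features from the image and video views of products.
This is achieved through instance-level contrastive learning and cross-view patch-level feature propagation.
A novel Patch Feature Reconstruction loss is proposed to penalize semantic misalignment between cross-view patches.
Extensive experiments demonstrate the effectiveness of RICE and provide insights into the importance of dataset diversity and expressivity.
The dataset and code are available at https://github.com/adxcreative/RICE.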
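
The abstract names instance-level contrastive learning between the image and video views but does not give the loss. As a rough illustration only, the sketch below uses a generic symmetric InfoNCE objective over paired (image, video) feature vectors. The function names, the temperature value, and the choice of InfoNCE itself are assumptions made for this example, not details confirmed by the paper, and the patch-level feature propagation and Patch Feature Reconstruction loss are not covered.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(image_feats, video_feats, temperature=0.07):
    """Symmetric InfoNCE loss for a batch where image i is paired with video i.

    Hypothetical helper: the temperature and the symmetric form are
    assumptions for illustration, not values taken from the paper.
    """
    imgs = [l2_normalize(v) for v in image_feats]
    vids = [l2_normalize(v) for v in video_feats]
    n = len(imgs)

    # logits[i][j] is the scaled cosine similarity between image i and video j.
    logits = [
        [sum(a * b for a, b in zip(imgs[i], vids[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy(row, target):
        # Numerically stable -log softmax(row)[target].
        m = max(row)
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        return log_sum - row[target]

    # Image-to-video direction: each row should peak on the diagonal.
    i2v = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Video-to-image direction: same check on the columns.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    v2i = sum(cross_entropy(columns[j], j) for j in range(n)) / n
    return 0.5 * (i2v + v2i)


aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

In this toy batch the correctly paired features give a much smaller loss than the swapped ones, which is the behavior a contrastive objective of this kind is meant to reward.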

Abstract (original)

Live commerce is the act of selling products online through live streaming. The customer’s diverse demands for online products introduce more challenges to Livestreaming Product Recognition. Previous works have primarily focused on fashion clothing data or utilized single-modal input, which does not reflect the real-world scenario where multimodal data from various categories are present. In this paper, we present LPR4M, a large-scale multimodal dataset that covers 34 categories, comprises 3 modalities (image, video, and text), and is 50× larger than the largest publicly available dataset. LPR4M contains diverse videos and noise modality pairs while exhibiting a long-tailed distribution, resembling real-world problems. Moreover, a cRoss-vIew semantiC alignmEnt (RICE) model is proposed to learn discriminative instance features from the image and video views of the products. This is achieved through instance-level contrastive learning and cross-view patch-level feature propagation. A novel Patch Feature Reconstruction loss is proposed to penalize the semantic misalignment between cross-view patches. Extensive experiments demonstrate the effectiveness of RICE and provide insights into the importance of dataset diversity and expressivity. The dataset and code are available at https://github.com/adxcreative/RICE.

arXiv information

Authors	Wenjie Yang,Yiyi Chen,Yan Li,Yanhua Cheng,Xudong Liu,Quan Chen,Han Li
Published	2023-08-09 12:23:41+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
