Advancing Food Nutrition Estimation via Visual-Ingredient Feature Fusion

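Summary

Nutrition estimation is an important part of promoting healthy eating and reducing diet-related health risks.
Despite progress in tasks such as food classification and ingredient recognition, progress in nutrition estimation has been limited by the lack of datasets with nutritional annotations.
To address this, the authors introduce FastFood, a dataset of 84,446 images across 908 fast food categories, annotated with ingredients and nutrition.
They also propose a new model-agnostic Visual-Ingredient Feature Fusion (VIF$^2$) method that enhances nutrition estimation by integrating visual and ingredient features.
Ingredient robustness is improved during training through synonym replacement and resampling strategies.
An ingredient-aware visual feature fusion module combines ingredient features with the visual representation to produce accurate nutrition predictions.
At test time, ingredient predictions are refined using large multimodal models, with data augmentation and majority voting.
Experiments on both the FastFood and Nutrition5k datasets validate the proposed method when built into different backbones (e.g., ResNet, InceptionV3, and ViT), which demonstrates the importance of ingredient information in nutrition estimation.
https://huiyanqi.github.io/fastfood-nutrition-estimation/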
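
The test-time refinement step mentions majority voting over ingredient predictions obtained from augmented inputs. The abstract gives no implementation details, so the following is only a minimal sketch of generic majority voting, not the authors' code. The function name `majority_vote`, the `min_share` threshold, and the example ingredient sets are all hypothetical; in practice each set would come from a large multimodal model applied to one augmented view of the image.

```python
from collections import Counter
from typing import List, Set


def majority_vote(predictions: List[Set[str]], min_share: float = 0.5) -> List[str]:
    """Keep ingredients that appear in more than `min_share` of the views.

    `predictions` holds one set of ingredient names per augmented view.
    Both the function and the threshold are illustrative assumptions.
    """
    if not predictions:
        return []
    # Count in how many views each ingredient was predicted.
    counts = Counter(ing for view in predictions for ing in view)
    threshold = min_share * len(predictions)
    # Sort for a deterministic output order.
    return sorted(ing for ing, n in counts.items() if n > threshold)


# Hypothetical predictions for three augmented views of one image.
views = [
    {"bun", "beef patty", "cheese"},
    {"bun", "beef patty", "lettuce"},
    {"bun", "cheese", "beef patty"},
]
print(majority_vote(views))  # ['beef patty', 'bun', 'cheese']
```

Here "lettuce" is dropped because it appears in only one of the three views, while the other ingredients appear in at least two.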

Abstract (original)

Nutrition estimation is an important component of promoting healthy eating and mitigating diet-related health risks. Despite advances in tasks such as food classification and ingredient recognition, progress in nutrition estimation is limited due to the lack of datasets with nutritional annotations. To address this issue, we introduce FastFood, a dataset with 84,446 images across 908 fast food categories, featuring ingredient and nutritional annotations. In addition, we propose a new model-agnostic Visual-Ingredient Feature Fusion (VIF$^2$) method to enhance nutrition estimation by integrating visual and ingredient features. Ingredient robustness is improved through synonym replacement and resampling strategies during training. The ingredient-aware visual feature fusion module combines ingredient features and visual representation to achieve accurate nutritional prediction. During testing, ingredient predictions are refined using large multimodal models by data augmentation and majority voting. Our experiments on both FastFood and Nutrition5k datasets validate the effectiveness of our proposed method built in different backbones (e.g., Resnet, InceptionV3 and ViT), which demonstrates the importance of ingredient information in nutrition estimation. https://huiyanqi.github.io/fastfood-nutrition-estimation/.
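
The abstract states that ingredient robustness is improved through synonym replacement during training, without describing the procedure. As a rough illustration only, the sketch below swaps ingredient names for synonyms at random. The synonym table, the function name `replace_synonyms`, and the probability `p` are assumptions made for this example and are not taken from the paper; the resampling strategy is not shown.

```python
import random
from typing import Dict, List

# Hypothetical synonym table; the paper's actual vocabulary is not given here.
SYNONYMS: Dict[str, List[str]] = {
    "fries": ["french fries", "chips"],
    "soda": ["soft drink"],
}


def replace_synonyms(
    ingredients: List[str],
    synonyms: Dict[str, List[str]],
    p: float,
    rng: random.Random,
) -> List[str]:
    """Replace each ingredient that has synonyms with probability `p`."""
    augmented = []
    for ing in ingredients:
        options = synonyms.get(ing)
        if options and rng.random() < p:
            # Pick one of the known synonyms uniformly at random.
            augmented.append(rng.choice(options))
        else:
            # No synonym available, or this ingredient was not selected.
            augmented.append(ing)
    return augmented


rng = random.Random(0)  # fixed seed so runs are repeatable
print(replace_synonyms(["bun", "fries", "soda"], SYNONYMS, p=0.5, rng=rng))
```

Applying such a replacement freshly at each training step would expose a model to varied names for the same ingredient, which is one plausible reading of the robustness claim.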

arXiv information

Authors	Huiyan Qi, Bin Zhu, Chong-Wah Ngo, Jingjing Chen, Ee-Peng Lim
Published	2025-05-13 17:01:21+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
