Rethinking Cooking State Recognition with Vision Transformers

Summary

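To ensure proper knowledge representation of the kitchen environment, kitchen robots must be able to recognize the states of food items as they are being cooked.
Although object detection and recognition have been studied extensively, the task of object state classification remains relatively unexplored.
The high intra-class similarity of ingredients across different stages of cooking makes this task even more difficult.
Researchers have recently proposed deep-learning-based strategies, but these have not yet achieved high performance.
In this study, we used the self-attention mechanism of the Vision Transformer (ViT) architecture for the cooking state recognition task.
The proposed approach captures globally salient features from images while exploiting weights learned from a larger dataset.
This global attention lets the model cope with the similarities between samples of different cooking objects, while transfer learning with pretrained weights overcomes the lack of inductive bias.
Several augmentation techniques are also employed to improve recognition accuracy.
Evaluated on the 'Cooking State Recognition Challenge Dataset', the proposed framework achieves an accuracy of 94.3%, significantly outperforming the state of the art.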

Abstract (original)

To ensure proper knowledge representation of the kitchen environment, it is vital for kitchen robots to recognize the states of the food items that are being cooked. Although the domain of object detection and recognition has been extensively studied, the task of object state classification has remained relatively unexplored. The high intra-class similarity of ingredients during different states of cooking makes the task even more challenging. Researchers have recently proposed adopting Deep Learning based strategies; however, these have yet to achieve high performance. In this study, we utilized the self-attention mechanism of the Vision Transformer (ViT) architecture for the Cooking State Recognition task. The proposed approach encapsulates the globally salient features from images, while also exploiting the weights learned from a larger dataset. This global attention allows the model to withstand the similarities between samples of different cooking objects, while the employment of transfer learning helps to overcome the lack of inductive bias by utilizing pretrained weights. To improve recognition accuracy, several augmentation techniques have been employed as well. Evaluation of our proposed framework on the 'Cooking State Recognition Challenge Dataset' has achieved an accuracy of 94.3%, which significantly outperforms the state-of-the-art.
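The abstract gives no implementation details. As a minimal illustration of the mechanism it relies on, the pure-Python sketch below splits a grayscale image into non-overlapping patches (ViT-style tokens) and applies one head of scaled dot-product self-attention, softmax(QK^T / sqrt(d_k)) V, so that every patch attends to every other patch. The names `patchify` and `self_attention`, the identity projection weights in the usage example, and the omission of the class token, position embeddings, multi-head attention and MLP blocks are assumptions for illustration, not the authors' implementation.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def patchify(image: Matrix, patch: int) -> Matrix:
    """Split an H x W grayscale image into non-overlapping patch x patch tiles,
    each flattened row-major into one token vector (ViT-style tokenization)."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image sides must be divisible by the patch size")
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix,
                   w_v: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product self-attention over all tokens in x.
    Returns (output tokens, attention weights); each weight row sums to 1."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(w_k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose K
    scores = matmul(q, k_t)               # every token scored against every token
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


if __name__ == "__main__":
    # Hypothetical 4x4 "image" split into four 2x2 patches (tokens of length 4).
    image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    tokens = patchify(image, 2)
    # Identity projections stand in for learned (e.g. pretrained) weights.
    eye = [[float(i == j) for j in range(4)] for i in range(4)]
    out, attn = self_attention(tokens, eye, eye, eye)
    print(tokens[0])  # [0.0, 1.0, 4.0, 5.0]
```

In a trained ViT the projection matrices are learned; the transfer learning described in the abstract amounts to starting from weights pretrained on a larger dataset rather than from identity or random values.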

arXiv information

Authors	Akib Mohammed Khan, Alif Ashrafee, Reeshoon Sayera, Shahriar Ivan, Sabbir Ahmed
Published	2022-12-16 17:06:28+00:00
arXiv site	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
