Towards Visual Affordance Learning: A Benchmark for Affordance Segmentation and Recognition

Abstract

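The physical and textural attributes of objects have been widely studied for recognition, detection, and segmentation tasks in computer vision. Many datasets, such as the large-scale ImageNet, have been proposed for feature learning with data-hungry deep neural networks and for hand-crafted feature extraction.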
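To interact intelligently with objects, robots and intelligent machines need the ability to reason beyond traditional physical and textural attributes and to understand and learn visual cues, called visual affordances, for affordance recognition, detection, and segmentation.
To date, there is no publicly available large-scale dataset for visual affordance understanding and learning.
This paper introduces a large-scale multi-view RGBD visual affordance learning dataset: a benchmark of 47,210 RGBD images from 37 object categories, annotated with 15 visual affordance categories.
To the best of our knowledge, this is the first and the largest multi-view RGBD visual affordance learning dataset.
We benchmark the proposed dataset on affordance segmentation and recognition tasks using popular Vision Transformers and convolutional neural networks.
Several state-of-the-art deep learning networks are evaluated for the affordance recognition and segmentation tasks, respectively.
Our experimental results demonstrate the challenging nature of the dataset and show clear prospects for new and robust affordance learning algorithms.
The dataset is publicly available at https://sites.google.com/view/afaqshah/dataset.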

Abstract (original)

The physical and textural attributes of objects have been widely studied for recognition, detection and segmentation tasks in computer vision. A number of datasets, such as large scale ImageNet, have been proposed for feature learning using data hungry deep neural networks and for hand-crafted feature extraction. To intelligently interact with objects, robots and intelligent machines need the ability to infer beyond the traditional physical/textural attributes, and understand/learn visual cues, called visual affordances, for affordance recognition, detection and segmentation. To date there is no publicly available large dataset for visual affordance understanding and learning. In this paper, we introduce a large scale multi-view RGBD visual affordance learning dataset, a benchmark of 47210 RGBD images from 37 object categories, annotated with 15 visual affordance categories. To the best of our knowledge, this is the first ever and the largest multi-view RGBD visual affordance learning dataset. We benchmark the proposed dataset for affordance segmentation and recognition tasks using popular Vision Transformer and Convolutional Neural Networks. Several state-of-the-art deep learning networks are evaluated each for affordance recognition and segmentation tasks. Our experimental results showcase the challenging nature of the dataset and present definite prospects for new and robust affordance learning algorithms. The dataset is publicly available at https://sites.google.com/view/afaqshah/dataset.

arXiv information

Authors	Zeyad Khalifa, Syed Afaq Ali Shah
Published	2023-07-05 13:48:43+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
