MTevent: A Multi-Task Event Camera Dataset for 6D Pose Estimation and Moving Object Detection

Summary

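Mobile robots are reaching unprecedented speeds, with platforms such as the Unitree B2 and Fraunhofer O3dyn achieving maximum speeds of 5 to 10 m/s.
However, exploiting such speeds effectively remains a challenge because of the limitations of RGB cameras, which suffer from motion blur and cannot provide real-time responsiveness.
Event cameras, with their asynchronous operation and low-latency sensing, offer a promising alternative for high-speed robotic perception.
In this work, we introduce MTevent, a dataset designed for 6D pose estimation and moving object detection in highly dynamic environments with large detection distances.
Our setup consists of a stereo event camera and an RGB camera, capturing 75 scenes of 16 seconds each on average and featuring 16 unique objects under challenging conditions such as extreme viewing angles, varying lighting, and occlusions.
MTevent is the first dataset to combine high-speed motion, long-range perception, and real-world object interactions, making it a valuable resource for advancing event-based vision in robotics.
To establish a baseline, we evaluate 6D pose estimation using NVIDIA's FoundationPose on RGB images, achieving an Average Recall of 0.22 with ground-truth masks, which highlights the limitations of RGB-based approaches in such dynamic settings.
With MTevent, we provide a novel resource to improve perception models and foster further research in high-speed robotic vision.
The dataset is available for download at https://huggingface.co/datasets/anas-gouda/MTevent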

Summary (original)

Mobile robots are reaching unprecedented speeds, with platforms like Unitree B2 and Fraunhofer O3dyn achieving maximum speeds between 5 and 10 m/s. However, effectively utilizing such speeds remains a challenge due to the limitations of RGB cameras, which suffer from motion blur and fail to provide real-time responsiveness. Event cameras, with their asynchronous operation and low-latency sensing, offer a promising alternative for high-speed robotic perception. In this work, we introduce MTevent, a dataset designed for 6D pose estimation and moving object detection in highly dynamic environments with large detection distances. Our setup consists of a stereo event camera and an RGB camera, capturing 75 scenes, each 16 seconds long on average, and featuring 16 unique objects under challenging conditions such as extreme viewing angles, varying lighting, and occlusions. MTevent is the first dataset to combine high-speed motion, long-range perception, and real-world object interactions, making it a valuable resource for advancing event-based vision in robotics. To establish a baseline, we evaluate the task of 6D pose estimation using NVIDIA’s FoundationPose on RGB images, achieving an Average Recall of 0.22 with ground-truth masks, highlighting the limitations of RGB-based approaches in such dynamic settings. With MTevent, we provide a novel resource to improve perception models and foster further research in high-speed robotic vision. The dataset is available for download at https://huggingface.co/datasets/anas-gouda/MTevent
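
One possible way to fetch the dataset from the URL above is sketched below. It assumes the `huggingface_hub` Python package is installed; the repository ID is read off the URL, while the function name `download_mtevent` and the default target directory are hypothetical choices made for illustration. The abstract does not describe the file layout, so the sketch simply mirrors the whole repository.

```python
# Repository ID taken from the dataset URL in the abstract.
REPO_ID = "anas-gouda/MTevent"


def download_mtevent(local_dir="MTevent"):
    """Mirror the dataset repository into local_dir and return the local path.

    The function name and default directory are illustrative assumptions,
    not part of the dataset's documentation.
    """
    # Imported lazily so that defining the function does not require the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",  # the URL points at a dataset repo, not a model repo
        local_dir=local_dir,
    )
```

Calling `download_mtevent()` would start the download, which may be large given 75 recorded scenes from several cameras.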

arXiv information

Authors	Shrutarv Awasthi, Anas Gouda, Sven Franke, Jérôme Rutinowski, Frank Hoffmann, Moritz Roidl
Published	2025-05-16 14:18:21+00:00

Source, services used

arxiv.jp, Google
