Vision-State Fusion: Improving Deep Neural Networks for Autonomous Robotics

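Summary

Vision-based deep learning perception plays a central role in robotics, enabling solutions to many challenging scenarios, such as acrobatic maneuvers of autonomous unmanned aerial vehicles (UAVs) and robot-assisted high-precision surgery.
Control-oriented end-to-end perception approaches, which directly output the robot's control variables, commonly use the robot's state estimate as an auxiliary input.
In mediated approaches, where intermediate outputs are estimated and fed to a lower-level controller, the robot's state is typically used as an input only for egocentric tasks, which estimate physical properties of the robot itself.
This work proposes applying a similar approach, for the first time to the best of the authors' knowledge, to non-egocentric mediated tasks, where the estimated outputs refer to an external subject.
The authors show how their general methodology improves the regression performance of deep convolutional neural networks (CNNs) on a broad class of non-egocentric 3D pose estimation problems, at minimal computational cost.
Across three very different use cases, ranging from grasping with a robotic arm to following a human subject with a pocket-sized UAV, the results consistently improve the R² regression metric, by up to +0.51, compared to the corresponding stateless baselines.
Finally, the authors validate the in-field performance of a closed-loop autonomous cm-scale UAV on the human pose estimation task.
The results show a significant reduction in mean absolute error, 24% on average, for the stateful CNN compared to a state-of-the-art stateless counterpart.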

Summary (original)

Vision-based deep learning perception fulfills a paramount role in robotics, facilitating solutions to many challenging scenarios, such as acrobatic maneuvers of autonomous unmanned aerial vehicles (UAVs) and robot-assisted high-precision surgery. Control-oriented end-to-end perception approaches, which directly output control variables for the robot, commonly take advantage of the robot’s state estimation as an auxiliary input. When intermediate outputs are estimated and fed to a lower-level controller, i.e. mediated approaches, the robot’s state is commonly used as an input only for egocentric tasks, which estimate physical properties of the robot itself. In this work, we propose to apply a similar approach for the first time, to the best of our knowledge, to non-egocentric mediated tasks, where the estimated outputs refer to an external subject. We prove how our general methodology improves the regression performance of deep convolutional neural networks (CNNs) on a broad class of non-egocentric 3D pose estimation problems, with minimal computational cost. By analyzing three highly-different use cases, spanning from grasping with a robotic arm to following a human subject with a pocket-sized UAV, our results consistently improve the R² regression metric, up to +0.51, compared to their stateless baselines. Finally, we validate the in-field performance of a closed-loop autonomous cm-scale UAV on the human pose estimation task. Our results show a significant reduction, i.e., 24% on average, on the mean absolute error of our stateful CNN, compared to a State-of-the-Art stateless counterpart.
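
The abstract reports its gains with two regression metrics: the R² score (coefficient of determination) and the mean absolute error (MAE). As a reading aid, the following minimal Python sketch shows how such figures are typically computed. It is not the authors' evaluation code: the function names, the helper `relative_reduction`, and all numbers are hypothetical and chosen only for illustration.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline, improved):
    """Fractional reduction of an error metric relative to a baseline."""
    return (baseline - improved) / baseline


# Made-up values for illustration only; these are NOT results from the paper.
y_true = [0.0, 1.0, 2.0, 3.0]
stateless_pred = [1.0, 1.0, 2.0, 2.0]  # hypothetical stateless model output
stateful_pred = [0.5, 1.0, 2.0, 2.5]   # hypothetical stateful model output

r2_gain = r2_score(y_true, stateful_pred) - r2_score(y_true, stateless_pred)
mae_cut = relative_reduction(
    mean_absolute_error(y_true, stateless_pred),
    mean_absolute_error(y_true, stateful_pred),
)
print(f"R^2 gain: {r2_gain:+.2f}, MAE reduction: {mae_cut:.0%}")
```

Under these definitions, an R² gain such as the reported +0.51 is an absolute difference between two scores, whereas the 24% MAE figure is a relative reduction with respect to the stateless baseline.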

arXiv information

Authors	Elia Cereda, Stefano Bonato, Mirko Nava, Alessandro Giusti, Daniele Palossi
Published	2024-03-20 08:41:49+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
