BFA: Best-Feature-Aware Fusion for Multi-View Fine-grained Manipulation

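Summary

In real-world scenarios, multi-view cameras are typically used for fine-grained manipulation tasks.
Existing approaches (e.g., ACT) tend to treat the features from all views equally and concatenate them directly for policy learning.
However, direct concatenation introduces redundant visual information and raises computational cost, which can lead to ineffective manipulation.
A fine-grained manipulation task usually involves multiple stages, and the view that contributes most changes from stage to stage over time.
This paper proposes a plug-and-play best-feature-aware (BFA) fusion strategy for multi-view manipulation tasks that can be adapted to various policies.
A lightweight network, built on top of the policy network's visual backbone, predicts an importance score for each view.
The multi-view features are reweighted by the predicted importance scores, then fused and fed into the end-to-end policy network, which allows seamless integration.
The method performs particularly well on fine-grained manipulation.
Experimental results show that the approach outperforms multiple baselines by 22-46% in success rate across different tasks.
The work offers new insights and inspiration for tackling key challenges in fine-grained manipulation.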
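
The score-then-fuse idea described above can be illustrated with a small sketch. This is not the authors' implementation: the linear scoring head, the softmax normalization across views, and the weighted-sum fusion are all assumptions chosen for illustration, as are the names `score_view`, `bfa_fuse`, and `head_weights`. It only shows how per-view importance scores can reweight backbone features so that the fused vector keeps the size of a single view rather than growing with the number of cameras.

```python
import math
from typing import List, Sequence, Tuple


def score_view(feature: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Hypothetical lightweight scoring head: one linear layer per view feature."""
    return sum(w * x for w, x in zip(weights, feature)) + bias


def softmax(scores: Sequence[float]) -> List[float]:
    """Normalize raw scores into weights that sum to 1 (assumed normalization)."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bfa_fuse(
    view_features: Sequence[Sequence[float]],
    weights: Sequence[float],
    bias: float,
) -> Tuple[List[float], List[float]]:
    """Reweight each view's feature by its importance score and sum the results."""
    scores = [score_view(f, weights, bias) for f in view_features]
    importance = softmax(scores)
    dim = len(view_features[0])
    fused = [
        sum(a * f[i] for a, f in zip(importance, view_features))
        for i in range(dim)
    ]
    return fused, importance


# Toy backbone features for three camera views (values are made up).
views = [
    [0.2, 0.1, 0.0, 0.3],
    [1.5, 0.9, 1.2, 0.8],
    [0.1, 0.0, 0.2, 0.1],
]
head_weights = [1.0, 1.0, 1.0, 1.0]  # hypothetical scoring-head parameters

fused, importance = bfa_fuse(views, head_weights, bias=0.0)

# Baseline for comparison: direct concatenation of all views.
concatenated = [x for f in views for x in f]

print("importance per view:", importance)
print("fused length:", len(fused), "concatenated length:", len(concatenated))
```

In this toy setup the second view receives the largest weight because its feature produces the highest score, and the fused vector has 4 entries where concatenation would pass 12 to the policy network.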

Summary (original)

In real-world scenarios, multi-view cameras are typically employed for fine-grained manipulation tasks. Existing approaches (e.g., ACT) tend to treat multi-view features equally and directly concatenate them for policy learning. However, it will introduce redundant visual information and bring higher computational costs, leading to ineffective manipulation. For a fine-grained manipulation task, it tends to involve multiple stages while the most contributed view for different stages is varied over time. In this paper, we propose a plug-and-play best-feature-aware (BFA) fusion strategy for multi-view manipulation tasks, which is adaptable to various policies. Built upon the visual backbone of the policy network, we design a lightweight network to predict the importance score of each view. Based on the predicted importance scores, the reweighted multi-view features are subsequently fused and input into the end-to-end policy network, enabling seamless integration. Notably, our method demonstrates outstanding performance in fine-grained manipulations. Experimental results show that our approach outperforms multiple baselines by 22-46% success rate on different tasks. Our work provides new insights and inspiration for tackling key challenges in fine-grained manipulations.

arXiv information

Authors	Zihan Lan, Weixin Mao, Haosheng Li, Le Wang, Tiancai Wang, Haoqiang Fan, Osamu Yoshie
Published	2025-02-16 15:26:21+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
