Dynamic Robot Tool Use with Vision Language Models

Abstract

Tool use enhances a robot’s task capabilities. Recent advances in vision-language models (VLMs) have equipped robots with sophisticated cognitive capabilities for tool-use applications. However, existing methodologies focus on elementary quasi-static tool manipulations or high-level tool selection while neglecting the critical aspect of task-appropriate tool grasping. To address this limitation, we introduce inverse Tool-Use Planning (iTUP), a novel VLM-driven framework that enables grounded fine-grained planning for versatile robotic tool use. Through an integrated pipeline of VLM-based tool and contact point grounding, position-velocity trajectory planning, and physics-informed grasp generation and selection, iTUP demonstrates versatility across (1) quasi-static tasks and the more challenging (2) dynamic and (3) cluster tool-use tasks. To ensure robust planning, our framework integrates stable and safe task-aware grasping by reasoning over semantic affordances and physical constraints. We evaluate iTUP and baselines on a comprehensive range of realistic tool-use tasks, including precision hammering, object scooping, and cluster sweeping. Experimental results demonstrate that iTUP ensures a thorough grounding of cognition and planning for challenging robot tool use across diverse environments.

arXiv information

Authors	Noah Trupin, Zixing Wang, Ahmed H. Qureshi
Published	2025-05-02 17:20:46+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
