Know Where You’re Uncertain When Planning with Multimodal Foundation Models: A Formal Framework

Summary

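Multimodal foundation models offer a promising framework for robotic perception and planning by processing sensory inputs to generate actionable plans.
However, addressing uncertainty in both perception (sensory interpretation) and decision-making (plan generation) remains a critical challenge for ensuring task reliability.
We present a comprehensive framework for disentangling, quantifying, and mitigating these two forms of uncertainty.
We first introduce an uncertainty disentanglement framework that separates perception uncertainty, which arises from limitations in visual understanding, from decision uncertainty, which relates to the robustness of the generated plans.
To quantify each type of uncertainty, we propose methods tailored to the distinct properties of perception and decision-making: conformal prediction calibrates perception uncertainty, and Formal-Methods-Driven Prediction (FMDP) quantifies decision uncertainty, using formal verification techniques to provide theoretical guarantees.
Building on this quantification, we implement two targeted intervention mechanisms: an active sensing process that dynamically re-observes high-uncertainty scenes to improve visual input quality, and an automated refinement procedure that fine-tunes the model on high-certainty data to improve its ability to meet task specifications.
Empirical validation on real-world and simulated robotic tasks shows that our uncertainty disentanglement framework reduces variability by up to 40% and increases task success rates by 5% compared with baselines.
These improvements are attributed to the combined effect of both interventions and highlight the importance of uncertainty disentanglement, which enables targeted interventions that strengthen the robustness and reliability of autonomous systems.
Fine-tuned models, code, and datasets are available at https://uncertainty-in-planning.github.io/.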
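The summary states that conformal prediction is used to calibrate perception uncertainty but does not give the scoring details. As a generic illustration only, the sketch below shows standard split conformal prediction for a classifier, assuming the perception module outputs per-label probabilities. The nonconformity score `1 - p(true label)`, the function names, the toy data, and the rule that triggers re-observation when the prediction set does not contain exactly one label are all assumptions, not the authors' implementation.

```python
import math
from typing import Dict, Sequence, Set


def conformal_threshold(
    cal_probs: Sequence[Dict[str, float]],
    cal_labels: Sequence[str],
    alpha: float,
) -> float:
    """Split conformal calibration with nonconformity score 1 - p(true label).

    Prediction sets built with the returned threshold contain the true label
    with probability at least 1 - alpha (marginally, assuming exchangeability).
    """
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the finite-sample-corrected quantile
    if k > n:
        return math.inf  # too few calibration points: every label must be included
    return scores[k - 1]


def prediction_set(probs: Dict[str, float], qhat: float) -> Set[str]:
    """All labels whose nonconformity score does not exceed the calibrated threshold."""
    return {label for label, p in probs.items() if 1.0 - p <= qhat}


def needs_reobservation(probs: Dict[str, float], qhat: float) -> bool:
    """Hypothetical trigger: re-observe when the set does not pin down a single label."""
    return len(prediction_set(probs, qhat)) != 1


# Toy calibration data (made up for illustration): class probabilities and true labels.
cal_probs = [
    {"cup": 0.9, "bowl": 0.1},
    {"cup": 0.2, "bowl": 0.8},
    {"cup": 0.7, "bowl": 0.3},
    {"cup": 0.4, "bowl": 0.6},
]
cal_labels = ["cup", "bowl", "cup", "cup"]
qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.3)

print(prediction_set({"cup": 0.95, "bowl": 0.05}, qhat))       # {'cup'}
print(needs_reobservation({"cup": 0.55, "bowl": 0.45}, qhat))  # True
```

Here the size of the prediction set serves as a calibrated measure of perception uncertainty: a confident observation yields a single label, while an ambiguous one yields several, which in this sketch is the signal to look again.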

Summary (original)

Multimodal foundation models offer a promising framework for robotic perception and planning by processing sensory inputs to generate actionable plans. However, addressing uncertainty in both perception (sensory interpretation) and decision-making (plan generation) remains a critical challenge for ensuring task reliability. We present a comprehensive framework to disentangle, quantify, and mitigate these two forms of uncertainty. We first introduce a framework for uncertainty disentanglement, isolating perception uncertainty arising from limitations in visual understanding and decision uncertainty relating to the robustness of generated plans. To quantify each type of uncertainty, we propose methods tailored to the unique properties of perception and decision-making: we use conformal prediction to calibrate perception uncertainty and introduce Formal-Methods-Driven Prediction (FMDP) to quantify decision uncertainty, leveraging formal verification techniques for theoretical guarantees. Building on this quantification, we implement two targeted intervention mechanisms: an active sensing process that dynamically re-observes high-uncertainty scenes to enhance visual input quality and an automated refinement procedure that fine-tunes the model on high-certainty data, improving its capability to meet task specifications. Empirical validation in real-world and simulated robotic tasks demonstrates that our uncertainty disentanglement framework reduces variability by up to 40% and enhances task success rates by 5% compared to baselines. These improvements are attributed to the combined effect of both interventions and highlight the importance of uncertainty disentanglement, which facilitates targeted interventions that enhance the robustness and reliability of autonomous systems. Fine-tuned models, code, and datasets are available at https://uncertainty-in-planning.github.io/.
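The abstract ties each intervention to a different uncertainty source: active sensing addresses high perception uncertainty, and fine-tuning uses high-certainty data. A minimal sketch of that routing follows, under stated assumptions. `Observation`, `route`, and both threshold defaults are hypothetical; perception uncertainty is taken as a prediction-set size, and decision certainty as a number in [0, 1] supplied by some external verifier. This is not an implementation of FMDP or of the authors' pipeline.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Observation:
    scene_id: str
    prediction_set_size: int  # perception uncertainty, e.g. size of a conformal prediction set
    plan_confidence: float    # decision certainty in [0, 1], assumed to come from a verifier


def route(
    observations: List[Observation],
    max_set_size: int = 1,            # hypothetical threshold, not from the paper
    min_plan_confidence: float = 0.9,  # hypothetical threshold, not from the paper
) -> Tuple[List[str], List[str]]:
    """Split scenes into those to re-observe and those kept as fine-tuning data."""
    reobserve: List[str] = []
    finetune_pool: List[str] = []
    for obs in observations:
        if obs.prediction_set_size > max_set_size:
            # Perception is ambiguous: improve the input (active sensing) before judging the plan.
            reobserve.append(obs.scene_id)
        elif obs.plan_confidence >= min_plan_confidence:
            # Perception is settled and the plan is high-certainty: keep it for fine-tuning.
            finetune_pool.append(obs.scene_id)
        # Otherwise perception is settled but the plan is uncertain; it is not used here.
    return reobserve, finetune_pool


obs = [
    Observation("scene_a", prediction_set_size=2, plan_confidence=0.95),
    Observation("scene_b", prediction_set_size=1, plan_confidence=0.97),
    Observation("scene_c", prediction_set_size=1, plan_confidence=0.40),
]
print(route(obs))  # (['scene_a'], ['scene_b'])
```

Checking perception first reflects the disentanglement idea in this sketch: if the scene itself is ambiguous, a plan's confidence says little, so the scene is re-observed rather than being admitted as training data.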

arXiv information

Authors	Neel P. Bhatt, Yunhao Yang, Rohan Siva, Daniel Milan, Ufuk Topcu, Zhangyang Wang
Published	2025-04-17 02:45:08+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
