jarxiv | Japanese arxiv | Page 1283

A bio-inspired sand-rolling robot: effect of body shape on sand rolling performance

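Posted: March 19, 2025 | Author: jarxiv

Summary

The ability to move effectively over complex terrain such as sand and gravel lets robots operate robustly outdoors and assist with critical tasks such as environmental monitoring, search and rescue, and supply delivery.
Inspired by the Mount Lyell salamander's ability to curl its body into a loop and roll down hill slopes, this study develops a sand-rolling robot and investigates how its locomotion performance is governed by its body shape.
Three body shapes were tested experimentally: Hexagon, Quadrilateral, and Triangle.
The Hexagon and Triangle achieved faster rolling speeds on sand but got stuck more often.
Analysis of the robot-sand interaction revealed the failure mechanism: sand deformation produces a local "sand incline" beneath the robot's contact segments, which enlarges the effective region of supporting polygon (ERSP) and prevents the robot from shifting its center of mass (CoM) outside the ERSP to sustain rolling.
Based on this mechanism, a highly simplified model captured the critical body pitch each shape needs for sustained rolling on sand, and it informed design adaptations that mitigated the locomotion failures and improved robot speed by more than 200%.
The results offer insight into how locomotors can use different morphological features to achieve robust rolling across deformable substrates.

Summary (original)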

The capability of effectively moving on complex terrains such as sand and gravel can empower our robots to robustly operate in outdoor environments, and assist with critical tasks such as environment monitoring, search-and-rescue, and supply delivery. Inspired by the Mount Lyell salamander’s ability to curl its body into a loop and effectively roll down hill slopes, in this study we develop a sand-rolling robot and investigate how its locomotion performance is governed by the shape of its body. We experimentally tested three different body shapes: Hexagon, Quadrilateral, and Triangle. We found that Hexagon and Triangle can achieve a faster rolling speed on sand, but exhibited more frequent failures of getting stuck. Analysis of the interaction between robot and sand revealed the failure mechanism: the deformation of the sand produced a local “sand incline” underneath robot contact segments, increasing the effective region of supporting polygon (ERSP) and preventing the robot from shifting its center of mass (CoM) outside the ERSP to produce sustainable rolling. Based on this mechanism, a highly-simplified model successfully captured the critical body pitch for each rolling shape to produce sustained rolling on sand, and informed design adaptations that mitigated the locomotion failures and improved robot speed by more than 200%. Our results provide insights into how locomotors can utilize different morphological features to achieve robust rolling motion across deformable substrates.
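The tipping condition described in the abstract (the CoM must move outside the supporting region for rolling to continue) can be illustrated with a small geometric sketch. This is not the paper's model: it assumes a regular polygon with its CoM at the centroid, a rigid pivot at the edge of the support region, and a hypothetical `support_extension` length standing in for the extra support created by the sand incline.

```python
import math

def critical_pitch(n_sides, circumradius, support_extension=0.0):
    """Pitch angle (rad) needed to bring the CoM directly above the pivot.

    Assumptions (illustrative only): regular polygon resting on one side,
    CoM at the centroid, pivot at the far edge of the support region.
    support_extension is a hypothetical extra support length beyond the
    leading vertex, used here as a stand-in for the sand-incline effect.
    """
    half_side = circumradius * math.sin(math.pi / n_sides)
    apothem = circumradius * math.cos(math.pi / n_sides)  # CoM height
    return math.atan2(half_side + support_extension, apothem)

# On rigid ground (no extension) the critical pitch is pi / n_sides.
for name, n in [("Triangle", 3), ("Quadrilateral", 4), ("Hexagon", 6)]:
    rigid = math.degrees(critical_pitch(n, 1.0))
    soft = math.degrees(critical_pitch(n, 1.0, support_extension=0.2))
    print(f"{name}: rigid {rigid:.1f} deg, extended support {soft:.1f} deg")
```

Under these assumptions, any enlargement of the support region raises the pitch the body must reach before it tips forward, which is the qualitative effect the abstract attributes to the sand incline.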

arXiv information

Authors	Xingjue Liao, Wenhao Liu, Hao Wu, Feifei Qian
Published	2025-03-18 05:31:56+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Category: cs.RO

Is Linear Feedback on Smoothed Dynamics Sufficient for Stabilizing Contact-Rich Plans?

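Posted: March 19, 2025 | Author: jarxiv

Summary

Designing planners and controllers for contact-rich manipulation is extremely challenging because contact violates the smoothness conditions that many gradient-based controller synthesis tools assume.
Contact smoothing approximates a non-smooth system with a smooth one, so that these synthesis tools can be used more effectively.
However, applying classical control synthesis methods to smoothed contact dynamics remains relatively under-explored.
This paper analyzes the efficacy of linear controller synthesis using differential simulators based on contact smoothing.
It introduces natural baselines that leverage contact smoothing to compute (a) open-loop plans robust to uncertain conditions and/or dynamics, and (b) feedback gains that stabilize around open-loop plans.
Using robotic bimanual whole-body manipulation as a testbed, the authors run extensive experiments on over 300 trajectories and analyze why LQR seems insufficient for stabilizing contact-rich plans.
A video summarizing the paper and the hardware experiments is available at https://youtu.be/HLaKi6qbwQg?si=_zCAmBBD6rGSitm9.

Summary (original)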

Designing planners and controllers for contact-rich manipulation is extremely challenging as contact violates the smoothness conditions that many gradient-based controller synthesis tools assume. Contact smoothing approximates a non-smooth system with a smooth one, allowing one to use these synthesis tools more effectively. However, applying classical control synthesis methods to smoothed contact dynamics remains relatively under-explored. This paper analyzes the efficacy of linear controller synthesis using differential simulators based on contact smoothing. We introduce natural baselines for leveraging contact smoothing to compute (a) open-loop plans robust to uncertain conditions and/or dynamics, and (b) feedback gains to stabilize around open-loop plans. Using robotic bimanual whole-body manipulation as a testbed, we perform extensive empirical experiments on over 300 trajectories and analyze why LQR seems insufficient for stabilizing contact-rich plans. The video summarizing this paper and hardware experiments is found here: https://youtu.be/HLaKi6qbwQg?si=_zCAmBBD6rGSitm9.
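For readers unfamiliar with the feedback gains mentioned above, the following is a minimal sketch of time-varying LQR via the standard backward Riccati recursion. It assumes the linearizations `A_list` and `B_list` along the open-loop plan are already available (in the paper's setting they would come from smoothed contact dynamics; here they are plain inputs), and the function name `tvlqr_gains` is hypothetical.

```python
import numpy as np

def tvlqr_gains(A_list, B_list, Q, R, Q_final):
    """Finite-horizon discrete-time LQR gains K_t by backward Riccati recursion.

    A_list[t], B_list[t] are linearizations x_{t+1} ~ A_t x_t + B_t u_t
    around a nominal plan (assumed given). The stabilizing control is
    u_t = u_ref_t - K_t (x_t - x_ref_t).
    """
    P = Q_final
    gains = []
    for A, B in zip(reversed(A_list), reversed(B_list)):
        # K = (R + B^T P B)^{-1} B^T P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update: P = Q + A^T P (A - B K)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()  # gains[t] now corresponds to time step t
    return gains

# Illustrative use on a double integrator (not a contact-rich system).
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
gains = tvlqr_gains([A] * 200, [B] * 200, np.eye(2), 0.1 * np.eye(1), np.eye(2))
print(gains[0])
```

The example system is smooth and linear, so it only shows the mechanics of the recursion; it says nothing about whether such gains suffice for contact-rich plans, which is the question the paper studies.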

arXiv information

Authors	Yuki Shirai, Tong Zhao, H. J. Terry Suh, Huaijiang Zhu, Xinpei Ni, Jiuguang Wang, Max Simchowitz, Tao Pang
Published	2025-03-18 05:32:44+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.RO, cs.SY, eess.SY

COLSON: Controllable Learning-Based Social Navigation via Diffusion-Based Reinforcement Learning

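Posted: March 19, 2025 | Author: jarxiv

Summary

Mobile robot navigation in dynamic environments with pedestrian traffic is a key challenge in developing autonomous mobile service robots.
Deep reinforcement learning-based methods have recently been studied actively and, owing to their optimization capabilities, have outperformed traditional rule-based approaches.
Among these, methods that assume a continuous action space typically rely on a Gaussian distribution assumption, which limits the flexibility of the generated actions.
Meanwhile, the application of diffusion models to reinforcement learning has advanced, allowing more flexible action distributions than Gaussian distribution-based approaches.
This study applies a diffusion-based reinforcement learning approach to social navigation and validates its effectiveness.
In addition, by leveraging the characteristics of diffusion models, the authors propose an extension that enables post-training action smoothing and adaptation to static obstacle scenarios not considered during training.

Summary (original)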

Mobile robot navigation in dynamic environments with pedestrian traffic is a key challenge in the development of autonomous mobile service robots. Recently, deep reinforcement learning-based methods have been actively studied and have outperformed traditional rule-based approaches owing to their optimization capabilities. Among these, methods that assume a continuous action space typically rely on a Gaussian distribution assumption, which limits the flexibility of generated actions. Meanwhile, the application of diffusion models to reinforcement learning has advanced, allowing for more flexible action distributions compared with Gaussian distribution-based approaches. In this study, we applied a diffusion-based reinforcement learning approach to social navigation and validated its effectiveness. Furthermore, by leveraging the characteristics of diffusion models, we propose an extension that enables post-training action smoothing and adaptation to static obstacle scenarios not considered during the training steps.

arXiv information

Authors	Yuki Tomita, Kohei Matsumoto, Yuki Hyodo, Ryo Kurazume
Published	2025-03-18 06:02:30+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.RO

SE(3)-Equivariant Robot Learning and Control: A Tutorial Survey

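Posted: March 19, 2025 | Author: jarxiv

Summary

Recent advances in deep learning and Transformers have driven major breakthroughs in robotics through techniques such as imitation learning, reinforcement learning, and LLM-based multimodal perception and decision-making.
However, conventional deep learning and Transformer models often struggle to process data with inherent symmetries and invariances, and typically rely on large datasets or extensive data augmentation.
Equivariant neural networks overcome these limitations by explicitly building symmetry and invariance into their architectures, which improves efficiency and generalization.
This tutorial survey reviews a wide range of equivariant deep learning and control methods for robotics, from classic to state-of-the-art, with a focus on SE(3)-equivariant models that exploit the natural 3D rotational and translational symmetries in visual robotic manipulation and control design.
Using unified mathematical notation, it begins by reviewing key concepts from group theory, along with matrix Lie groups and Lie algebras.
It then introduces foundational group-equivariant neural network design and shows how group equivariance is obtained through network structure.
Next, it discusses applications of SE(3)-equivariant neural networks in robotics in terms of imitation learning and reinforcement learning.
SE(3)-equivariant control design is also reviewed from the perspective of geometric control.
Finally, it highlights the challenges and future directions of equivariant methods in developing more robust, sample-efficient, multimodal real-world robotic systems.

Summary (original)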

Recent advances in deep learning and Transformers have driven major breakthroughs in robotics by employing techniques such as imitation learning, reinforcement learning, and LLM-based multimodal perception and decision-making. However, conventional deep learning and Transformer models often struggle to process data with inherent symmetries and invariances, typically relying on large datasets or extensive data augmentation. Equivariant neural networks overcome these limitations by explicitly integrating symmetry and invariance into their architectures, leading to improved efficiency and generalization. This tutorial survey reviews a wide range of equivariant deep learning and control methods for robotics, from classic to state-of-the-art, with a focus on SE(3)-equivariant models that leverage the natural 3D rotational and translational symmetries in visual robotic manipulation and control design. Using unified mathematical notation, we begin by reviewing key concepts from group theory, along with matrix Lie groups and Lie algebras. We then introduce foundational group-equivariant neural network design and show how the group-equivariance can be obtained through their structure. Next, we discuss the applications of SE(3)-equivariant neural networks in robotics in terms of imitation learning and reinforcement learning. The SE(3)-equivariant control design is also reviewed from the perspective of geometric control. Finally, we highlight the challenges and future directions of equivariant methods in developing more robust, sample-efficient, and multi-modal real-world robotic systems.
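The equivariance property the survey is built around can be stated concretely: a map f on point sets is SE(3)-equivariant if f(R x + t) = R f(x) + t for every rotation R and translation t. The sketch below checks this numerically for two toy functions; all names are hypothetical, and the check tests a single transform rather than proving the property.

```python
import numpy as np

def rotation_z(theta):
    """Rotation matrix about the z-axis by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def centroid(points):
    """Mean of an (N, 3) point set; equivariant under rigid transforms."""
    return points.mean(axis=0)

def coordinate_max(points):
    """Per-axis maximum; translation-equivariant but not rotation-equivariant."""
    return points.max(axis=0)

def is_se3_equivariant(f, points, rotation, translation, tol=1e-9):
    """Check f(R x + t) == R f(x) + t for one given rigid transform."""
    transformed = points @ rotation.T + translation
    lhs = f(transformed)
    rhs = rotation @ f(points) + translation
    return bool(np.allclose(lhs, rhs, atol=tol))

points = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
R = rotation_z(np.pi / 2)
t = np.array([0.5, -1.0, 2.0])
print(is_se3_equivariant(centroid, points, R, t))        # centroid passes
print(is_se3_equivariant(coordinate_max, points, R, t))  # per-axis max fails
```

Equivariant network layers are designed so that this identity holds by construction, rather than being learned approximately from augmented data.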

arXiv information

Authors	Joohwan Seo, Soochul Yoo, Junwoo Chang, Hyunseok An, Hyunwoo Ryu, Soomi Lee, Arvind Kruthiventy, Jongeun Choi, Roberto Horowitz
Published	2025-03-18 06:26:34+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.LG, cs.RO, cs.SY, eess.SY

FlexVLN: Flexible Adaptation for Diverse Vision-and-Language Navigation Tasks

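Posted: March 19, 2025 | Author: jarxiv

Summary

The long-standing aspiration of the Vision-and-Language Navigation (VLN) task is to develop an embodied agent with robust adaptability that can seamlessly transfer its navigation capabilities across tasks.
Despite remarkable progress in recent years, most methods require dataset-specific training and therefore cannot generalize across diverse datasets containing different types of instructions.
Large language models (LLMs) have demonstrated exceptional reasoning and generalization abilities and show great potential in robot action planning.
This paper proposes FlexVLN, a hierarchical approach to VLN that combines the fundamental navigation ability of a supervised-learning-based Instruction Follower with the robust generalization ability of an LLM Planner, enabling effective generalization across diverse VLN datasets.
In addition, a verification mechanism and a multi-model integration mechanism are proposed to mitigate potential hallucinations by the LLM Planner and to improve the execution accuracy of the Instruction Follower.
REVERIE, SOON, and CVDN-target are used as out-of-domain datasets to assess generalization ability.
The generalization performance of FlexVLN surpasses that of all previous methods by a large margin.

Summary (original)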

The aspiration of the Vision-and-Language Navigation (VLN) task has long been to develop an embodied agent with robust adaptability, capable of seamlessly transferring its navigation capabilities across various tasks. Despite remarkable advancements in recent years, most methods necessitate dataset-specific training, thereby lacking the capability to generalize across diverse datasets encompassing distinct types of instructions. Large language models (LLMs) have demonstrated exceptional reasoning and generalization abilities, exhibiting immense potential in robot action planning. In this paper, we propose FlexVLN, an innovative hierarchical approach to VLN that integrates the fundamental navigation ability of a supervised-learning-based Instruction Follower with the robust generalization ability of the LLM Planner, enabling effective generalization across diverse VLN datasets. Moreover, a verification mechanism and a multi-model integration mechanism are proposed to mitigate potential hallucinations by the LLM Planner and enhance execution accuracy of the Instruction Follower. We take REVERIE, SOON, and CVDN-target as out-of-domain datasets for assessing generalization ability. The generalization performance of FlexVLN surpasses that of all the previous methods to a large extent.

arXiv information

Authors	Siqi Zhang, Yanyuan Qiao, Qunbo Wang, Longteng Guo, Zhihua Wei, Jing Liu
Published	2025-03-18 06:58:41+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.CV, cs.RO

An Real-Sim-Real (RSR) Loop Framework for Generalizable Robotic Policy Transfer with Differentiable Simulation

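Posted: March 19, 2025 | Author: jarxiv

Summary

The sim-to-real gap remains a critical challenge in robotics and hinders the deployment of algorithms trained in simulation on real-world systems.
This paper introduces a Real-Sim-Real (RSR) loop framework that leverages differentiable simulation to address this gap by iteratively refining simulation parameters, aligning them with real-world conditions, and enabling robust and efficient policy transfer.
A key contribution is the design of an informative cost function that encourages the collection of diverse and representative real-world data, minimizing bias and maximizing the usefulness of each data point for simulation refinement.
This cost function integrates seamlessly into existing reinforcement learning algorithms (e.g., PPO, SAC) and ensures balanced exploration of critical regions in the real domain.
The approach is implemented on the Mujoco MJX platform, and the framework is compatible with a wide range of robotic systems.
Experimental results on several robotic manipulation tasks show that the method significantly reduces the sim-to-real gap and achieves high task performance and generalizability across diverse scenarios with both explicit and implicit environmental uncertainties.

Summary (original)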

The sim-to-real gap remains a critical challenge in robotics, hindering the deployment of algorithms trained in simulation to real-world systems. This paper introduces a novel Real-Sim-Real (RSR) loop framework leveraging differentiable simulation to address this gap by iteratively refining simulation parameters, aligning them with real-world conditions, and enabling robust and efficient policy transfer. A key contribution of our work is the design of an informative cost function that encourages the collection of diverse and representative real-world data, minimizing bias and maximizing the utility of each data point for simulation refinement. This cost function integrates seamlessly into existing reinforcement learning algorithms (e.g., PPO, SAC) and ensures a balanced exploration of critical regions in the real domain. Furthermore, our approach is implemented on the versatile Mujoco MJX platform, and our framework is compatible with a wide range of robotic systems. Experimental results on several robotic manipulation tasks demonstrate that our method significantly reduces the sim-to-real gap, achieving high task performance and generalizability across diverse scenarios of both explicit and implicit environmental uncertainties.

arXiv information

Authors	Lu Shi, Yuxuan Xu, Shiyu Wang, Jinhao Huang, Wenhao Zhao, Yufei Jia, Zike Yan, Weibin Gu, Guyue Zhou
Published	2025-03-18 07:28:11+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.LG, cs.RO

SLC$^2$-SLAM: Semantic-guided Loop Closure using Shared Latent Code for NeRF SLAM

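Posted: March 19, 2025 | Author: jarxiv

Summary

Targeting the notorious cumulative drift errors in NeRF SLAM, the authors propose a Semantic-guided Loop Closure using Shared Latent Code, called SLC$^2$-SLAM.
They argue that the latent codes stored in many NeRF SLAM systems are not fully exploited, because they are used only for better reconstruction.
The paper proposes a simple yet effective way to detect potential loops using the same latent codes as local features.
To further improve loop detection performance, semantic information, also decoded from the same latent codes, is used to guide the aggregation of local features.
Finally, once potential loops are detected, they are closed with a graph optimization followed by bundle adjustment to refine both the estimated poses and the reconstructed scene.
To evaluate SLC$^2$-SLAM, extensive experiments are conducted on the Replica and ScanNet datasets.
The proposed semantic-guided loop closure significantly outperforms pre-trained NetVLAD and ORB combined with Bag-of-Words, which are used in all other NeRF SLAM systems with loop closure.
As a result, SLC$^2$-SLAM also demonstrates better tracking and reconstruction performance, especially in larger scenes with more loops, such as ScanNet.

Summary (original)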

Targeting the notorious cumulative drift errors in NeRF SLAM, we propose a Semantic-guided Loop Closure using Shared Latent Code, dubbed SLC$^2$-SLAM. We argue that latent codes stored in many NeRF SLAM systems are not fully exploited, as they are only used for better reconstruction. In this paper, we propose a simple yet effective way to detect potential loops using the same latent codes as local features. To further improve the loop detection performance, we use the semantic information, which are also decoded from the same latent codes to guide the aggregation of local features. Finally, with the potential loops detected, we close them with a graph optimization followed by bundle adjustment to refine both the estimated poses and the reconstructed scene. To evaluate the performance of our SLC$^2$-SLAM, we conduct extensive experiments on Replica and ScanNet datasets. Our proposed semantic-guided loop closure significantly outperforms the pre-trained NetVLAD and ORB combined with Bag-of-Words, which are used in all the other NeRF SLAM with loop closure. As a result, our SLC$^2$-SLAM also demonstrated better tracking and reconstruction performance, especially in larger scenes with more loops, like ScanNet.
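The loop detection step described above compares aggregated per-keyframe features to find earlier frames that look like the current one. The following is a generic sketch of that comparison using cosine similarity. It is not the paper's method: the descriptors are assumed to be already aggregated into one vector per keyframe, and the function name, `min_gap`, and `threshold` are hypothetical choices for illustration.

```python
import numpy as np

def loop_candidates(descriptors, query_index, min_gap=2, threshold=0.9):
    """Return indices of earlier keyframes similar to the query keyframe.

    descriptors: (N, D) array, one aggregated descriptor per keyframe
                 (assumed given; how they are aggregated is out of scope).
    min_gap:     skip keyframes fewer than min_gap steps before the query,
                 since temporally adjacent frames are trivially similar.
    threshold:   minimum cosine similarity to count as a loop candidate.
    """
    d = np.asarray(descriptors, dtype=float)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)  # unit-normalize rows
    last = max(query_index - min_gap + 1, 0)          # exclusive upper bound
    sims = d[:last] @ d[query_index]
    return [i for i, s in enumerate(sims) if s >= threshold]

# Keyframe 3 resembles keyframe 0, so keyframe 0 is reported as a candidate.
descriptors = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.1]]
print(loop_candidates(descriptors, query_index=3))
```

In a full system, candidates found this way would still need geometric verification before being added as constraints to the pose graph.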

arXiv information

Authors	Yuhang Ming, Di Ma, Weichen Dai, Han Yang, Rui Fan, Guofeng Zhang, Wanzeng Kong
Published	2025-03-18 07:31:25+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Category: cs.RO

Robust Safety Critical Control Under Multiple State and Input Constraints: Volume Control Barrier Function Method

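Posted: March 19, 2025 | Author: jarxiv

Summary

This paper investigates the safety-critical control problem for uncertain systems under multiple control barrier function (CBF) constraints and input constraints.
A new framework is proposed to generate a safety filter that minimizes changes to the reference inputs when safety risks arise, balancing safety and performance.
A nonlinear disturbance observer (DOB) based on the robust integral of the sign of the error (RISE) is used to estimate system uncertainties and guarantees that the estimation error converges to zero exponentially.
This error bound is integrated into the safety-critical controller to reduce conservativeness while ensuring safety.
To further address the challenges arising from multiple CBF and input constraints, a new Volume CBF (VCBF) is proposed by analyzing the feasible space of the quadratic programming (QP) problem.
It ensures solution feasibility by keeping the volume positive.
To ensure that the feasible space does not vanish under disturbances, a DOB-VCBF-based method is introduced, which guarantees system safety while maintaining the feasibility of the resulting QP.
Several groups of simulation and experimental results are then provided to validate the effectiveness of the proposed controller.

Summary (original)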

In this paper, the safety-critical control problem for uncertain systems under multiple control barrier function (CBF) constraints and input constraints is investigated. A novel framework is proposed to generate a safety filter that minimizes changes to reference inputs when safety risks arise, ensuring a balance between safety and performance. A nonlinear disturbance observer (DOB) based on the robust integral of the sign of the error (RISE) is used to estimate system uncertainties, ensuring that the estimation error converges to zero exponentially. This error bound is integrated into the safety-critical controller to reduce conservativeness while ensuring safety. To further address the challenges arising from multiple CBF and input constraints, a novel Volume CBF (VCBF) is proposed by analyzing the feasible space of the quadratic programming (QP) problem, ensuring solution feasibility by keeping the volume as a positive value. To ensure that the feasible space does not vanish under disturbances, a DOB-VCBF-based method is introduced, ensuring system safety while maintaining the feasibility of the resulting QP. Subsequently, several groups of simulation and experimental results are provided to validate the effectiveness of the proposed controller.
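The safety filter mentioned in the abstract is usually posed as a QP that keeps the applied input as close as possible to the reference input subject to a CBF condition. As a minimal sketch, the case of a single CBF constraint, no input bounds, and no disturbance has a closed-form solution. This is a textbook simplification, not the paper's DOB-VCBF method, and the function and parameter names are hypothetical.

```python
import numpy as np

def cbf_safety_filter(u_ref, h, lf_h, lg_h, alpha=1.0):
    """Solve min ||u - u_ref||^2  s.t.  lf_h + lg_h @ u + alpha * h >= 0.

    h:     barrier value h(x), with h >= 0 meaning safe
    lf_h:  Lie derivative of h along the drift dynamics (scalar)
    lg_h:  Lie derivative of h along the input directions (vector)
    alpha: gain of the linear class-K function (an assumed choice)
    Single constraint only; multiple constraints need a QP solver.
    """
    u_ref = np.asarray(u_ref, dtype=float)
    a = np.asarray(lg_h, dtype=float)
    b = -lf_h - alpha * h          # constraint written as a @ u >= b
    slack = a @ u_ref - b
    if slack >= 0.0 or not np.any(a):
        # Reference already satisfies the constraint, or the constraint
        # does not depend on u, so the input is left unchanged.
        return u_ref
    # Otherwise project u_ref onto the constraint boundary a @ u = b.
    return u_ref - (slack / (a @ a)) * a

# Single integrator x' = u that must keep x <= 1, so h(x) = 1 - x.
x = 0.9
u_safe = cbf_safety_filter(u_ref=[1.0], h=1.0 - x, lf_h=0.0, lg_h=[-1.0])
print(u_safe)  # the reference speed is reduced near the boundary
```

With several CBF constraints plus input bounds, the feasible set of the QP can become empty, which is the feasibility issue the abstract says the Volume CBF is designed to address.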

arXiv information

Authors	Jinyang Dong, Shizhen Wu, Rui Liu, Xiao Liang, Biao Lu, Yongchun Fang
Published	2025-03-18 07:58:58+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.RO, cs.SY, eess.SY

Foundation Feature-Driven Online End-Effector Pose Estimation: A Marker-Free and Learning-Free Approach

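Posted: March 19, 2025 | Author: jarxiv

Summary

Accurately estimating the transformation between camera space and robot space is essential.
Traditional hand-eye calibration methods that use markers require offline image collection, which limits their suitability for online self-calibration.
Recent learning-based robot pose estimation methods advance online calibration but struggle with cross-robot generalization and require the robot to be fully visible.
This work proposes a Foundation feature-driven online End-Effector Pose Estimation (FEEPE) algorithm, characterized by its training-free operation and its generalization across end-effectors.
Inspired by the zero-shot generalization capabilities of foundation models, FEEPE leverages pre-trained visual features to estimate 2D-3D correspondences between the CAD model and the target image, enabling 6D pose estimation via the PnP algorithm.
To resolve ambiguities from partial observations and symmetry, a multi-historical key frame enhanced pose optimization algorithm is introduced, which uses temporal information to improve accuracy.
Compared with traditional hand-eye calibration, FEEPE enables marker-free online calibration.
Unlike robot pose estimation, it generalizes across robots and end-effectors in a training-free manner.
Extensive experiments demonstrate its superior flexibility, generalization, and performance.

Summary (original)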

Accurate transformation estimation between camera space and robot space is essential. Traditional methods using markers for hand-eye calibration require offline image collection, limiting their suitability for online self-calibration. Recent learning-based robot pose estimation methods, while advancing online calibration, struggle with cross-robot generalization and require the robot to be fully visible. This work proposes a Foundation feature-driven online End-Effector Pose Estimation (FEEPE) algorithm, characterized by its training-free and cross end-effector generalization capabilities. Inspired by the zero-shot generalization capabilities of foundation models, FEEPE leverages pre-trained visual features to estimate 2D-3D correspondences derived from the CAD model and target image, enabling 6D pose estimation via the PnP algorithm. To resolve ambiguities from partial observations and symmetry, a multi-historical key frame enhanced pose optimization algorithm is introduced, utilizing temporal information for improved accuracy. Compared to traditional hand-eye calibration, FEEPE enables marker-free online calibration. Unlike robot pose estimation, it generalizes across robots and end-effectors in a training-free manner. Extensive experiments demonstrate its superior flexibility, generalization, and performance.

arXiv information

Authors	Tianshu Wu, Jiyao Zhang, Shiqian Liang, Zhengxiao Han, Hao Dong
Published	2025-03-18 09:12:49+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.CV, cs.RO

UniGoal: Towards Universal Zero-shot Goal-oriented Navigation

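Posted: March 19, 2025 | Author: jarxiv

Summary

This paper proposes a general framework for universal zero-shot goal-oriented navigation.
Existing zero-shot methods build their inference frameworks on large language models (LLMs) for specific tasks; these frameworks differ considerably in their overall pipelines and fail to generalize across different types of goals.
Toward universal zero-shot navigation, the authors propose a uniform graph representation that unifies different goals, including object categories, instance images, and text descriptions.
They also convert the agent's observations into an online-maintained scene graph.
This consistent scene and goal representation preserves more structural information than pure text and makes it possible to leverage an LLM for explicit graph-based reasoning.
Specifically, graph matching between the scene graph and the goal graph is performed at each time instant, and different strategies are proposed to generate the long-term exploration goal according to the matching state.
When nothing is matched, the agent first iteratively searches for a subgraph of the goal.
With partial matching, the agent then uses coordinate projection and anchor pair alignment to infer the goal location.
Finally, scene graph correction and goal verification are applied for perfect matching.
A blacklist mechanism is also presented to enable robust switching between stages.
Extensive experiments on several benchmarks show that UniGoal achieves state-of-the-art zero-shot performance on the three studied navigation tasks with a single model, outperforming even task-specific zero-shot methods and supervised universal methods.

Summary (original)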

In this paper, we propose a general framework for universal zero-shot goal-oriented navigation. Existing zero-shot methods build inference framework upon large language models (LLM) for specific tasks, which differs a lot in overall pipeline and fails to generalize across different types of goal. Towards the aim of universal zero-shot navigation, we propose a uniform graph representation to unify different goals, including object category, instance image and text description. We also convert the observation of agent into an online maintained scene graph. With this consistent scene and goal representation, we preserve most structural information compared with pure text and are able to leverage LLM for explicit graph-based reasoning. Specifically, we conduct graph matching between the scene graph and goal graph at each time instant and propose different strategies to generate long-term goal of exploration according to different matching states. The agent first iteratively searches subgraph of goal when zero-matched. With partial matching, the agent then utilizes coordinate projection and anchor pair alignment to infer the goal location. Finally scene graph correction and goal verification are applied for perfect matching. We also present a blacklist mechanism to enable robust switch between stages. Extensive experiments on several benchmarks show that our UniGoal achieves state-of-the-art zero-shot performance on three studied navigation tasks with a single model, even outperforming task-specific zero-shot methods and supervised universal methods.

arXiv information

Authors	Hang Yin, Xiuwei Xu, Lingqing Zhao, Ziwei Wang, Jie Zhou, Jiwen Lu
Published	2025-03-18 10:07:07+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.CV, cs.RO
