SplatSim: Zero-Shot Sim2Real Transfer of RGB Manipulation Policies Using Gaussian Splatting


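Sim2Real transfer, especially for manipulation policies that rely on RGB images, remains a major challenge in robotics because of the large domain shift between synthetic and real-world visual data.
This paper proposes SplatSim, a new framework that uses Gaussian Splatting as the primary rendering primitive to reduce the Sim2Real gap for RGB-based manipulation policies.
By replacing traditional mesh representations with Gaussian Splats in simulators, SplatSim produces highly photorealistic synthetic data while retaining the scalability and cost-efficiency of simulation.
We demonstrate the framework by training manipulation policies within SplatSim and deploying them zero-shot in the real world, achieving an average success rate of 86.25% (compared with 97.5% for policies trained on real-world data).
Videos are available on the project page: https://splatsim.github.io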


Sim2Real transfer, particularly for manipulation policies relying on RGB images, remains a critical challenge in robotics due to the significant domain shift between synthetic and real-world visual data. In this paper, we propose SplatSim, a novel framework that leverages Gaussian Splatting as the primary rendering primitive to reduce the Sim2Real gap for RGB-based manipulation policies. By replacing traditional mesh representations with Gaussian Splats in simulators, SplatSim produces highly photorealistic synthetic data while maintaining the scalability and cost-efficiency of simulation. We demonstrate the effectiveness of our framework by training manipulation policies within SplatSim and deploying them in the real world in a zero-shot manner, achieving an average success rate of 86.25%, compared to 97.5% for policies trained on real-world data. Videos can be found on our project page: https://splatsim.github.io


Authors: Mohammad Nomaan Qureshi, Sparsh Garg, Francisco Yandun, David Held, George Kantor, Abhisesh Silwal
Published: 2024-10-07 03:37:36+00:00
A Framework for Guided Motion Planning


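This work formalizes the intuitive notion of guided search by defining the concept of a guiding space.
The language and evaluation of guidance suggest improvements to existing methods and allow for simple hybrid algorithms that combine guidance from multiple sources.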


Randomized sampling-based algorithms are widely used in robot motion planning due to the problem’s intractability, and are experimentally effective on a wide range of problem instances. Most variants bias their sampling using various heuristics related to the known underlying structure of the search space. In this work, we formalize the intuitive notion of guided search by defining the concept of a guiding space. This new language encapsulates many seemingly distinct prior methods under the same framework, and allows us to reason about guidance, a previously obscured core contribution of different algorithms. We suggest an information-theoretic method to evaluate guidance, which experimentally matches intuition when tested on known algorithms in a variety of environments. The language and evaluation of guidance suggest improvements to existing methods, and allow for simple hybrid algorithms that combine guidance from multiple sources.


Authors: Amnon Attali, Stav Ashur, Isaac Burton Love, Courtney McBeth, James Motes, Marco Morales, Nancy M. Amato
Published: 2024-10-07 03:56:10+00:00
Data-driven Diffusion Models for Enhancing Safety in Autonomous Vehicle Traffic Simulations


Recent advances in critical scenario generation have shown that diffusion-based approaches outperform traditional generative models in effectiveness and realism.


Safety-critical traffic scenarios are integral to the development and validation of autonomous driving systems. These scenarios provide crucial insights into vehicle responses under high-risk conditions rarely encountered in real-world settings. Recent advancements in critical scenario generation have demonstrated the superiority of diffusion-based approaches over traditional generative models in terms of effectiveness and realism. However, current diffusion-based methods fail to adequately address the complexity of driver behavior and traffic density information, both of which significantly influence driver decision-making processes. In this work, we present a novel approach to overcome these limitations by introducing adversarial guidance functions for diffusion models that incorporate behavior complexity and traffic density, thereby enhancing the generation of more effective and realistic safety-critical traffic scenarios. The proposed method is evaluated on two metrics, effectiveness and realism, demonstrating better efficacy than other state-of-the-art methods.


Authors: Jinxiong Lu, Shoaib Azam, Gokhan Alcan, Ville Kyrki
Published: 2024-10-07 07:42:59+00:00
Centroidal State Estimation based on the Koopman Embedding for Dynamic Legged Locomotion


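In extensive simulation experiments on a quadruped robot executing various dynamic gaits, our data-driven framework outperforms a conventional Extended Kalman Filtering technique based on nonlinear dynamics.
Importantly, the model based on dynamic mode decomposition, trained on two locomotion patterns (trot and jump), successfully estimates the centroidal states for a different motion (bound) without retraining.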


In this paper, we introduce a novel approach to centroidal state estimation, which plays a crucial role in predictive model-based control strategies for dynamic legged locomotion. Our approach uses Koopman operator theory to transform the robot’s complex nonlinear dynamics into a linear system, employing dynamic mode decomposition and deep learning for model construction. We evaluate both models on their linearization accuracy and their capability to capture both fast and slow dynamic system responses. We then select the most suitable model for estimation purposes and integrate it within a moving horizon estimator. This estimator is formulated as a convex quadratic program to facilitate robust, real-time centroidal state estimation. Through extensive simulation experiments on a quadruped robot executing various dynamic gaits, our data-driven framework outperforms a conventional Extended Kalman Filtering technique based on nonlinear dynamics. Our estimator addresses challenges posed by force/torque measurement noise in highly dynamic motions and accurately recovers the centroidal states, demonstrating the adaptability and effectiveness of the Koopman-based linear representation for complex locomotive behaviors. Importantly, our model based on dynamic mode decomposition, trained with two locomotion patterns (trot and jump), successfully estimates the centroidal states for a different motion (bound) without retraining.
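
As a rough illustration of the dynamic mode decomposition step mentioned above, the sketch below fits a linear operator to a sequence of state snapshots by least squares. It is a minimal example on toy data, not the authors' model: the function name `fit_dmd`, the toy matrix `A_true`, and the use of raw states rather than lifted Koopman observables are all assumptions made for illustration.

```python
import numpy as np


def fit_dmd(snapshots):
    """Least-squares linear operator A such that x[k+1] ≈ A @ x[k].

    snapshots: array of shape (state_dim, num_steps), columns ordered in time.
    """
    X = snapshots[:, :-1]  # states at steps 0 .. N-2
    Y = snapshots[:, 1:]   # states at steps 1 .. N-1
    # Minimise ||Y - A X||_F; the pseudo-inverse gives the closed-form solution.
    return Y @ np.linalg.pinv(X)


# Toy trajectory from a known stable linear system (hypothetical, for illustration).
A_true = np.array([[0.9, 0.1],
                   [0.0, 0.8]])
x = np.array([1.0, 1.0])
trajectory = [x]
for _ in range(20):
    x = A_true @ x
    trajectory.append(x)

snapshots = np.stack(trajectory, axis=1)  # shape (2, 21)
A_est = fit_dmd(snapshots)                # recovered linear operator
```

In a Koopman setting, the columns of `snapshots` would typically hold lifted features of the state instead of the raw state, and the fitted operator would then serve as the linear model inside an estimator.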


Authors: Shahram Khorshidi, Murad Dawood, Maren Bennewitz
Published: 2024-10-07 08:05:58+00:00
Safe Multi-Agent Reinforcement Learning for Behavior-Based Cooperative Navigation


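Using a single target for the formation’s centroid eliminates the complexity of having several path planners control a team of robots.
To ensure safety, the MARL framework uses model predictive control (MPC) to prevent actions that could lead to collisions during training and execution.
Finally, we study the impact of MPC safety filters on the learning process, finding faster convergence during training, and show that our approach can be safely deployed on real robots even in the early stages of training.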


In this paper, we address the problem of behavior-based cooperative navigation of mobile robots using safe multi-agent reinforcement learning (MARL). Our work is the first to focus on cooperative navigation without individual reference targets for the robots, using a single target for the formation’s centroid. This eliminates the complexities involved in having several path planners to control a team of robots. To ensure safety, our MARL framework uses model predictive control (MPC) to prevent actions that could lead to collisions during training and execution. We demonstrate the effectiveness of our method in simulation and on real robots, achieving safe behavior-based cooperative navigation without using individual reference targets, with zero collisions, and faster target reaching compared to baselines. Finally, we study the impact of MPC safety filters on the learning process, revealing that we achieve faster convergence during training, and we show that our approach can be safely deployed on real robots, even during early stages of the training.


Authors: Murad Dawood, Sicong Pan, Nils Dengler, Siqi Zhou, Angela P. Schoellig, Maren Bennewitz
Published: 2024-10-07 08:10:47+00:00
A Planar-Symmetric SO(3) Representation for Learning Grasp Detection


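The symmetry of planar-symmetric hands, such as parallel grippers, introduces ambiguity and discontinuity in the SO(3) representation, which hinders both the training and inference of neural-network-based grasp detectors.
We propose a novel SO(3) representation that parametrizes a pair of planar-symmetric poses with a single parameter set by leveraging the 2D Bingham distribution.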


Planar-symmetric hands, such as parallel grippers, are widely adopted in both research and industrial fields. Their symmetry, however, introduces ambiguity and discontinuity in the SO(3) representation, which hinders both the training and inference of neural-network-based grasp detectors. We propose a novel SO(3) representation that can parametrize a pair of planar-symmetric poses with a single parameter set by leveraging the 2D Bingham distribution. We also detail a grasp detector based on our representation, which provides a more consistent rotation output. An intensive evaluation with multiple grippers and objects in both the simulation and the real world quantitatively shows our approach’s contribution.


Authors: Tianyi Ko, Takuya Ikeda, Hiroya Sato, Koichi Nishiwaki
Published: 2024-10-07 08:25:59+00:00
TD-NeRF: Novel Truncated Depth Prior for Joint Camera Pose and Neural Radiance Field Optimization


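The reliance on accurate camera poses is a major barrier to the widespread deployment of Neural Radiance Fields (NeRF) models for 3D reconstruction and SLAM tasks.
Existing methods introduce monocular depth priors to jointly optimize camera poses and NeRF, but they do not fully exploit the depth priors and neglect the impact of their inherent noise.
This paper proposes Truncated Depth NeRF (TD-NeRF), a novel approach that enables training NeRF from unknown camera poses by jointly optimizing the learnable parameters of the radiance field and the camera poses.
Our approach explicitly uses monocular depth priors through three key advancements: 1) we propose a novel depth-based ray sampling strategy based on the truncated normal distribution, which improves the convergence speed and accuracy of pose estimation.
2) To circumvent local minima and refine depth geometry, we introduce a coarse-to-fine training strategy that progressively improves depth precision.
3) We propose a more robust inter-frame point constraint that enhances robustness against depth noise during training.
Experimental results on three datasets show that TD-NeRF outperforms prior work in the joint optimization of camera pose and NeRF and generates more accurate depth geometry.
The implementation of our method has been released at https://github.com/nubot-nudt/TD-NeRF.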


The reliance on accurate camera poses is a significant barrier to the widespread deployment of Neural Radiance Fields (NeRF) models for 3D reconstruction and SLAM tasks. Existing methods introduce monocular depth priors to jointly optimize the camera poses and NeRF, but they fail to fully exploit the depth priors and neglect the impact of their inherent noise. In this paper, we propose Truncated Depth NeRF (TD-NeRF), a novel approach that enables training NeRF from unknown camera poses by jointly optimizing learnable parameters of the radiance field and camera poses. Our approach explicitly utilizes monocular depth priors through three key advancements: 1) we propose a novel depth-based ray sampling strategy based on the truncated normal distribution, which improves the convergence speed and accuracy of pose estimation; 2) to circumvent local minima and refine depth geometry, we introduce a coarse-to-fine training strategy that progressively improves the depth precision; 3) we propose a more robust inter-frame point constraint that enhances robustness against depth noise during training. The experimental results on three datasets demonstrate that TD-NeRF achieves superior performance in the joint optimization of camera pose and NeRF, surpassing prior works, and generates more accurate depth geometry. The implementation of our method has been released at https://github.com/nubot-nudt/TD-NeRF.
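
To make the idea of depth-based ray sampling from a truncated normal distribution concrete, the following sketch draws sample depths along a single ray, concentrated around a monocular depth estimate and restricted to the near/far bounds. It is only an illustration under stated assumptions: the function `sample_ray_depths`, its parameters, and the fixed `sigma` are hypothetical and are not taken from the TD-NeRF implementation.

```python
import random
from statistics import NormalDist


def sample_ray_depths(depth_prior, sigma, near, far, num_samples, rng):
    """Sample depths from N(depth_prior, sigma^2) truncated to [near, far].

    Uses inverse-CDF sampling: draw a probability uniformly between the
    CDF values of the two bounds, then map it back through the inverse CDF.
    """
    dist = NormalDist(mu=depth_prior, sigma=sigma)
    p_lo, p_hi = dist.cdf(near), dist.cdf(far)
    eps = 1e-12  # keeps probabilities strictly inside (0, 1) for inv_cdf
    depths = []
    for _ in range(num_samples):
        p = rng.uniform(p_lo, p_hi)
        p = min(max(p, eps), 1.0 - eps)
        depth = dist.inv_cdf(p)
        # Guard against tiny floating-point overshoot at the bounds.
        depths.append(min(max(depth, near), far))
    return sorted(depths)


rng = random.Random(0)
depths = sample_ray_depths(depth_prior=2.0, sigma=0.3,
                           near=1.5, far=3.0, num_samples=8, rng=rng)
```

A smaller `sigma` concentrates samples more tightly around the depth prior, which is one plausible way a coarse-to-fine schedule could be expressed, although the abstract does not specify how the authors do this.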


Authors: Zhen Tan, Zongtan Zhou, Yangbing Ge, Zi Wang, Xieyuanli Chen, Dewen Hu
Published: 2024-10-07 08:28:43+00:00
Learning an Actionable Discrete Diffusion Policy via Large-Scale Actionless Video Pre-Training


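We first compress both human and robot videos into unified video tokens.
In the pre-training stage, we employ a discrete diffusion model with a mask-and-replace diffusion strategy to predict future video tokens in the latent space.
In the fine-tuning stage, we use the imagined future videos to guide low-level action learning with a limited set of robot data.
Our project website is available at https://video-diff.github.io/.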


Learning a generalist embodied agent capable of completing multiple tasks poses challenges, primarily stemming from the scarcity of action-labeled robotic datasets. In contrast, a vast number of human videos exist, capturing intricate tasks and interactions with the physical world. Promising prospects arise for utilizing actionless human videos for pre-training and transferring the knowledge to facilitate robot policy learning through limited robot demonstrations. However, this remains a challenge due to the domain gap between humans and robots. Moreover, it is difficult to extract useful information representing the dynamic world from human videos, because of their noisy and multimodal data structure. In this paper, we introduce a novel framework to tackle these challenges, which leverages a unified discrete diffusion to combine generative pre-training on human videos and policy fine-tuning on a small number of action-labeled robot videos. We start by compressing both human and robot videos into unified video tokens. In the pre-training stage, we employ a discrete diffusion model with a mask-and-replace diffusion strategy to predict future video tokens in the latent space. In the fine-tuning stage, we harness the imagined future videos to guide low-level action learning with a limited set of robot data. Experiments demonstrate that our method generates high-fidelity future videos for planning and yields fine-tuned policies that outperform previous state-of-the-art approaches. Our project website is available at https://video-diff.github.io/.
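
The mask-and-replace strategy can be pictured as a forward corruption step on discrete tokens: each token is either kept, replaced by a random token, or turned into a special mask token, and a token that is already masked stays masked. The sketch below shows one such step under assumed settings; the function `mask_and_replace_step`, the fixed probabilities, and the choice of `mask_id` equal to the vocabulary size are hypothetical and do not reflect the noise schedule actually used in the paper.

```python
import numpy as np


def mask_and_replace_step(tokens, vocab_size, mask_id, p_mask, p_replace, rng):
    """Apply one forward corruption step to an array of discrete tokens.

    With probability p_replace a token is resampled uniformly from the
    vocabulary, with probability p_mask it becomes mask_id, and otherwise
    it is kept. Tokens that are already masked are never changed.
    """
    tokens = np.asarray(tokens)
    u = rng.random(tokens.shape)
    random_tokens = rng.integers(0, vocab_size, size=tokens.shape)
    already_masked = tokens == mask_id

    replace = (u < p_replace) & ~already_masked
    mask = (u >= p_replace) & (u < p_replace + p_mask) & ~already_masked

    out = tokens.copy()
    out[replace] = random_tokens[replace]
    out[mask] = mask_id
    return out


rng = np.random.default_rng(0)
video_tokens = np.arange(8)  # stand-in for tokenized video (hypothetical)
noisy_tokens = mask_and_replace_step(video_tokens, vocab_size=16, mask_id=16,
                                     p_mask=0.2, p_replace=0.1, rng=rng)
```

A denoising model would then be trained to recover the original tokens from such corrupted sequences, typically with corruption probabilities that grow over the diffusion steps.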


Authors: Haoran He, Chenjia Bai, Ling Pan, Weinan Zhang, Bin Zhao, Xuelong Li
Published: 2024-10-07 08:45:35+00:00
Unsupervised Skill Discovery for Robotic Manipulation through Automatic Task Generation


The learned skills can be used to solve a set of unseen manipulation tasks, both in simulation and on a real robotic platform.


Learning skills that interact with objects is of major importance for robotic manipulation. These skills can indeed serve as an efficient prior for solving various manipulation tasks. We propose a novel Skill Learning approach that discovers composable behaviors by solving a large and diverse number of autonomously generated tasks. Our method learns skills allowing the robot to consistently and robustly interact with objects in its environment. The discovered behaviors are embedded in primitives which can be composed with Hierarchical Reinforcement Learning to solve unseen manipulation tasks. In particular, we leverage Asymmetric Self-Play to discover behaviors and Multiplicative Compositional Policies to embed them. We compare our method to Skill Learning baselines and find that our skills are more interactive. Furthermore, the learned skills can be used to solve a set of unseen manipulation tasks, in simulation as well as on a real robotic platform.


Authors: Paul Jansonnie, Bingbing Wu, Julien Perez, Jan Peters
Published: 2024-10-07 09:19:13+00:00
Auto-Multilift: Distributed Learning and Control for Cooperative Load Transportation With Quadrotors


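Designing motion control and planning algorithms for multilift systems remains challenging because of the complexities of dynamics, collision avoidance, actuator limits, and scalability.
This paper proposes Auto-Multilift, a novel framework that automates the tuning of model predictive controllers (MPCs) for multilift systems.
We model the MPC cost functions with deep neural networks (DNNs), enabling fast online adaptation to various scenarios.
We develop a distributed policy gradient algorithm to train these DNNs efficiently in a closed-loop manner.
Central to our algorithm is distributed sensitivity propagation, which is built on fully exploiting the unique dynamic couplings within the multilift system.
It parallelizes gradient computation across quadrotors and focuses on the sensitivities of the actual system states to key MPC parameters.
Our method outperforms a state-of-the-art open-loop MPC tuning approach by effectively learning adaptive MPCs from trajectory tracking errors.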


Designing motion control and planning algorithms for multilift systems remains challenging due to the complexities of dynamics, collision avoidance, actuator limits, and scalability. Existing methods that use optimization and distributed techniques effectively address these constraints and scalability issues. However, they often require substantial manual tuning, leading to suboptimal performance. This paper proposes Auto-Multilift, a novel framework that automates the tuning of model predictive controllers (MPCs) for multilift systems. We model the MPC cost functions with deep neural networks (DNNs), enabling fast online adaptation to various scenarios. We develop a distributed policy gradient algorithm to train these DNNs efficiently in a closed-loop manner. Central to our algorithm is distributed sensitivity propagation, which is built on fully exploiting the unique dynamic couplings within the multilift system. It parallelizes gradient computation across quadrotors and focuses on actual system state sensitivities relative to key MPC parameters. Extensive simulations demonstrate favorable scalability to a large number of quadrotors. Our method outperforms a state-of-the-art open-loop MPC tuning approach by effectively learning adaptive MPCs from trajectory tracking errors. It also excels in learning an adaptive reference for reconfiguring the system when traversing multiple narrow slots.


Authors: Bingheng Wang, Rui Huang, Lin Zhao
Published: 2024-10-07 09:22:59+00:00
