PRISM: Projection-based Reward Integration for Scene-Aware Real-to-Sim-to-Real Transfer with Few Demonstrations

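Summary

Learning from a few demonstrations to develop policies that are robust to variations in the robot's initial position and in object poses is a problem of considerable practical interest in robotics.
Compared with imitation learning, which struggles to generalize from limited samples, reinforcement learning (RL) can explore autonomously to acquire robust behaviors.
Training RL agents through direct interaction with the real world is often impractical and unsafe, while building a simulation environment requires extensive manual effort, such as designing scenes and crafting task-specific reward functions.
To address these challenges, we propose an integrated real-to-sim-to-real pipeline that builds simulation environments from expert demonstrations by identifying scene objects in images and retrieving the corresponding 3D models from existing libraries.
We introduce a projection-based reward model for RL policy training. It is supervised by a vision-language model (VLM) that uses human-guided object projection relationships as prompts, and the policy is further fine-tuned with expert demonstrations.
Overall, our work focuses on constructing simulation environments and on RL-based policy training, ultimately enabling reliable robot control policies to be deployed in real-world scenarios.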

Abstract (original)

Learning from few demonstrations to develop policies robust to variations in robot initial positions and object poses is a problem of significant practical interest in robotics. Compared to imitation learning, which often struggles to generalize from limited samples, reinforcement learning (RL) can autonomously explore to obtain robust behaviors. Training RL agents through direct interaction with the real world is often impractical and unsafe, while building simulation environments requires extensive manual effort, such as designing scenes and crafting task-specific reward functions. To address these challenges, we propose an integrated real-to-sim-to-real pipeline that constructs simulation environments based on expert demonstrations by identifying scene objects from images and retrieving their corresponding 3D models from existing libraries. We introduce a projection-based reward model for RL policy training that is supervised by a vision-language model (VLM) using human-guided object projection relationships as prompts, with the policy further fine-tuned using expert demonstrations. In general, our work focuses on the construction of simulation environments and RL-based policy training, ultimately enabling the deployment of reliable robotic control policies in real-world scenarios.
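
The abstract does not say how an "object projection relationship" is computed, so the following is only a minimal geometric sketch of one plausible reading: project two axis-aligned bounding boxes onto a plane and reward the policy by how much of the manipulated object's footprint falls inside the target's footprint. All names here (`Box3D`, `projection_overlap`, `projection_reward`, the `threshold` bonus) are hypothetical and are not taken from the paper, where the reward model is supervised by a VLM rather than hand-coded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box3D:
    """Axis-aligned 3D bounding box given by its min and max corners (x, y, z)."""
    lo: tuple[float, float, float]
    hi: tuple[float, float, float]


def interval_overlap(a_lo: float, a_hi: float, b_lo: float, b_hi: float) -> float:
    """Length of the overlap between intervals [a_lo, a_hi] and [b_lo, b_hi]."""
    return max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))


def projection_overlap(obj: Box3D, target: Box3D, drop_axis: int = 2) -> float:
    """Fraction of obj's projected footprint covered by target's footprint.

    Both boxes are projected along `drop_axis` (default: z, i.e. a top-down view).
    """
    kept_axes = [i for i in range(3) if i != drop_axis]
    intersection = 1.0
    footprint = 1.0
    for i in kept_axes:
        intersection *= interval_overlap(obj.lo[i], obj.hi[i], target.lo[i], target.hi[i])
        footprint *= obj.hi[i] - obj.lo[i]
    return intersection / footprint if footprint > 0 else 0.0


def projection_reward(obj: Box3D, target: Box3D, threshold: float = 0.9) -> float:
    """Assumed reward shape: dense overlap term plus a bonus of 1.0 above `threshold`."""
    fraction = projection_overlap(obj, target)
    return fraction + (1.0 if fraction >= threshold else 0.0)


if __name__ == "__main__":
    cube = Box3D(lo=(0.0, 0.0, 0.0), hi=(1.0, 1.0, 1.0))
    tray = Box3D(lo=(0.5, 0.0, -1.0), hi=(2.0, 1.0, 0.0))
    print(projection_reward(cube, tray))
```

In a pipeline like the one described, a VLM-supervised reward model would presumably replace or label such a hand-written score. The sketch only illustrates the kind of projected spatial relation that could be used as a prompt or training signal.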

arXiv information

Authors	Haowen Sun, Han Wang, Chengzhong Ma, Shaolong Zhang, Jiawei Ye, Xingyu Chen, Xuguang Lan
Published	2025-04-29 08:01:27+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
