View-Invariant Policy Learning via Zero-Shot Novel View Synthesis

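Summary

Large-scale visuomotor policy learning is a promising approach to building generalizable manipulation systems.
However, policies that can be deployed across diverse embodiments, environments, and observation modalities remain elusive.
This work investigates how knowledge from large-scale visual data of the world can be used to address one axis of variation in generalizable manipulation: the observation viewpoint.
Specifically, it studies single-image novel view synthesis models, which learn 3D-aware scene-level priors by rendering images of the same scene from alternate camera viewpoints given a single input image.
For practical application to diverse robot data, these models must operate zero-shot, performing view synthesis on unseen tasks and environments.
The view synthesis models are analyzed empirically within a simple data-augmentation scheme called View Synthesis Augmentation (VISTA), to understand how well they support learning viewpoint-invariant policies from single-viewpoint demonstration data.
When evaluated for robustness to out-of-distribution camera viewpoints, policies trained with this method outperform baselines on both simulated and real-world manipulation tasks.
Videos and additional visualizations are available at https://s-tian.github.io/projects/vista.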

Abstract (original)

Large-scale visuomotor policy learning is a promising approach toward developing generalizable manipulation systems. Yet, policies that can be deployed on diverse embodiments, environments, and observational modalities remain elusive. In this work, we investigate how knowledge from large-scale visual data of the world may be used to address one axis of variation for generalizable manipulation: observational viewpoint. Specifically, we study single-image novel view synthesis models, which learn 3D-aware scene-level priors by rendering images of the same scene from alternate camera viewpoints given a single input image. For practical application to diverse robotic data, these models must operate zero-shot, performing view synthesis on unseen tasks and environments. We empirically analyze view synthesis models within a simple data-augmentation scheme that we call View Synthesis Augmentation (VISTA) to understand their capabilities for learning viewpoint-invariant policies from single-viewpoint demonstration data. Upon evaluating the robustness of policies trained with our method to out-of-distribution camera viewpoints, we find that they outperform baselines in both simulated and real-world manipulation tasks. Videos and additional visualizations are available at https://s-tian.github.io/projects/vista.
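
The abstract describes VISTA only at a high level: observations from single-viewpoint demonstrations are augmented with synthesized views from other camera poses. The following is a minimal sketch of that general idea, not the authors' implementation. The names `augment_demo`, `sample_pose`, and `shift_stub`, the pose parameterization, the 15-degree range, and the replacement probability `p_aug` are all assumptions made for illustration. A real setup would replace `shift_stub` with a pretrained zero-shot novel view synthesis model.

```python
import random
from typing import Callable, List, Tuple

Image = List[List[float]]            # toy H x W grayscale image
Pose = Tuple[float, float]           # hypothetical (azimuth, elevation) offset in degrees
Step = Tuple[Image, List[float]]     # (observation, action) pair from a demonstration
Synthesizer = Callable[[Image, Pose], Image]


def sample_pose(rng: random.Random, max_deg: float = 15.0) -> Pose:
    """Sample a random camera offset. The uniform range is an assumption."""
    return (rng.uniform(-max_deg, max_deg), rng.uniform(-max_deg, max_deg))


def augment_demo(demo: List[Step], synthesize: Synthesizer,
                 rng: random.Random, p_aug: float = 0.5) -> List[Step]:
    """Replace each observation, with probability p_aug, by a synthesized
    view from a randomly sampled pose. Actions are left unchanged."""
    augmented = []
    for image, action in demo:
        if rng.random() < p_aug:
            image = synthesize(image, sample_pose(rng))
        augmented.append((image, action))
    return augmented


def shift_stub(image: Image, pose: Pose) -> Image:
    """Stand-in for a view synthesis model: cyclically shifts the columns.
    It only mimics the interface; it does not render a new viewpoint."""
    k = int(round(pose[0])) % len(image[0])
    return [row[k:] + row[:k] for row in image]


if __name__ == "__main__":
    demo = [([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [0.1 * i]) for i in range(4)]
    augmented = augment_demo(demo, shift_stub, random.Random(0), p_aug=1.0)
    print(len(augmented), "steps; actions preserved:",
          [a for _, a in augmented] == [a for _, a in demo])
```

Keeping the action labels fixed while varying only the observation is what lets a policy trained on the augmented data associate the same behavior with many viewpoints.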

arXiv information

Authors	Stephen Tian, Blake Wulfe, Kyle Sargent, Katherine Liu, Sergey Zakharov, Vitor Guizilini, Jiajun Wu
Published	2025-02-19 21:10:44+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
