Temporal Triplane Transformers as Occupancy World Models

Abstract

World models aim to learn or construct representations of the environment that enable the prediction of future scenes, thereby supporting intelligent motion planning. However, existing models often struggle to produce fine-grained predictions and to operate in real time. In this work, we propose T$^3$Former, a novel 4D occupancy world model for autonomous driving. T$^3$Former begins by pre-training a compact {\em triplane} representation that efficiently encodes 3D occupancy. It then extracts multi-scale temporal motion features from historical triplanes and employs an autoregressive approach to iteratively predict future triplane changes. Finally, these triplane changes are combined with previous states to decode future occupancy and ego-motion trajectories. Experimental results show that T$^3$Former achieves 1.44$\times$ speedup (26 FPS), improves mean IoU to 36.09, and reduces mean absolute planning error to 1.0 meters. Demos are available in the supplementary material.
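The autoregressive step described in the abstract (predict a change, add it to the previous state, feed the result back in) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `rollout` and `last_difference` are hypothetical, a triplane is reduced to a flat list of floats, and the learned multi-scale temporal model is replaced by a toy predictor that simply repeats the most recent frame-to-frame change.

```python
from typing import Callable, List, Sequence

# Assumption: a flat list of floats stands in for the three feature planes.
Triplane = List[float]


def rollout(
    history: Sequence[Triplane],
    predict_delta: Callable[[Sequence[Triplane]], Triplane],
    steps: int,
) -> List[Triplane]:
    """Predict `steps` future states, each as previous state + predicted change."""
    if not history:
        raise ValueError("history must contain at least one triplane")
    window = [list(t) for t in history]
    futures: List[Triplane] = []
    for _ in range(steps):
        delta = predict_delta(window)
        nxt = [p + d for p, d in zip(window[-1], delta)]
        futures.append(nxt)
        # Slide the window: drop the oldest state, append the new prediction.
        window = window[1:] + [nxt]
    return futures


def last_difference(window: Sequence[Triplane]) -> Triplane:
    """Toy stand-in for the learned model: repeat the latest observed change."""
    if len(window) < 2:
        return [0.0] * len(window[-1])
    return [b - a for a, b in zip(window[-2], window[-1])]


history = [[0.0, 1.0], [1.0, 1.5]]
futures = rollout(history, last_difference, steps=3)
print(futures)  # [[2.0, 2.0], [3.0, 2.5], [4.0, 3.0]]
```

Predicting changes rather than full states means each step only has to model what moved. In the paper's setting the predicted triplanes would then be decoded into occupancy and ego-motion, a stage this sketch leaves out.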

arXiv information

Authors	Haoran Xu, Peixi Peng, Guang Tan, Yiqian Chang, Yisen Zhao, Yonghong Tian
Published	2025-05-15 08:04:19+00:00
arXiv page	arxiv_id(pdf)
