SMART: Scalable Multi-agent Real-time Motion Generation via Next-token Prediction

Summary

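Data-driven motion generation for autonomous driving is frequently limited by dataset size and by domain gaps between datasets, which prevents its broad application in real-world scenarios. To address this problem, we introduce SMART, a new autonomous driving motion generation paradigm that models vectorized maps and agent trajectory data as discrete sequence tokens. These tokens are processed by a decoder-only transformer architecture trained on a next-token prediction task over spatial-temporal series. This GPT-style approach allows the model to learn the motion distribution of real driving scenarios. SMART achieves state-of-the-art performance on most metrics of the generative Sim Agents challenge, ranks 1st on the Waymo Open Motion Dataset (WOMD) leaderboards, and demonstrates remarkable inference speed. In addition, SMART serves as a representative generative model for the autonomous driving motion domain and exhibits zero-shot generalization: trained only on the NuPlan dataset and validated on WOMD, it achieves a competitive score of 0.72 on the Sim Agents challenge. Finally, we collected more than 1 billion motion tokens from multiple datasets to validate the model's scalability. These results suggest that SMART has begun to emulate two important properties, scalability and zero-shot generalization, and preliminarily meets the needs of large-scale real-time simulation applications. We have released all of our code to encourage the exploration of motion generation models in the autonomous driving field. The source code is available at https://github.com/rainmaker22/SMART.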
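To make "discrete sequence tokens" concrete, here is one simple way to turn a trajectory into tokens: split it into fixed-length segments, then map each segment's displacement to the nearest entry of a small vocabulary. This is a minimal sketch under stated assumptions. The vocabulary values, the segment length, and the function names `tokenize_trajectory` and `detokenize` are hypothetical illustrations and are not SMART's actual tokenizer. For simplicity it also works in a global frame, whereas a practical tokenizer would typically express motion relative to the agent's own position and heading.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical motion-token vocabulary (illustrative values only): each token
# is the (dx, dy) displacement, in metres, covered by one trajectory segment.
# Displacements are in a global frame here, a simplifying assumption.
VOCAB: List[Point] = [
    (0.0, 0.0),   # 0: stationary
    (2.0, 0.0),   # 1: slow, straight
    (5.0, 0.0),   # 2: fast, straight
    (4.0, 1.0),   # 3: veer left
    (4.0, -1.0),  # 4: veer right
]


def tokenize_trajectory(points: Sequence[Point], seg_len: int) -> List[int]:
    """Split a trajectory into segments spanning `seg_len` steps and map each
    segment's displacement to the nearest vocabulary entry (Euclidean).
    A trailing segment shorter than `seg_len` steps is dropped."""
    tokens: List[int] = []
    for start in range(0, len(points) - seg_len, seg_len):
        x0, y0 = points[start]
        x1, y1 = points[start + seg_len]
        dx, dy = x1 - x0, y1 - y0
        nearest = min(
            range(len(VOCAB)),
            key=lambda i: math.hypot(dx - VOCAB[i][0], dy - VOCAB[i][1]),
        )
        tokens.append(nearest)
    return tokens


def detokenize(tokens: Sequence[int], origin: Point) -> List[Point]:
    """Rebuild segment endpoints by accumulating token displacements."""
    x, y = origin
    out: List[Point] = [(x, y)]
    for t in tokens:
        dx, dy = VOCAB[t]
        x, y = x + dx, y + dy
        out.append((x, y))
    return out
```

For example, a straight trajectory moving 1 m per step, `[(float(i), 0.0) for i in range(11)]`, tokenizes with `seg_len=5` to `[2, 2]`. Decoding those tokens from `(0.0, 0.0)` recovers the segment endpoints `(5.0, 0.0)` and `(10.0, 0.0)`. Nearest-neighbour quantization loses detail inside each segment. That loss is the price of a finite vocabulary that a next-token classifier can predict over.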

Summary (original)

Data-driven autonomous driving motion generation tasks are frequently impacted by the limitations of dataset size and the domain gap between datasets, which precludes their extensive application in real-world scenarios. To address this issue, we introduce SMART, a novel autonomous driving motion generation paradigm that models vectorized map and agent trajectory data into discrete sequence tokens. These tokens are then processed through a decoder-only transformer architecture to train for the next token prediction task across spatial-temporal series. This GPT-style method allows the model to learn the motion distribution in real driving scenarios. SMART achieves state-of-the-art performance across most of the metrics on the generative Sim Agents challenge, ranking 1st on the leaderboards of Waymo Open Motion Dataset (WOMD), demonstrating remarkable inference speed. Moreover, SMART represents the generative model in the autonomous driving motion domain, exhibiting zero-shot generalization capabilities: Using only the NuPlan dataset for training and WOMD for validation, SMART achieved a competitive score of 0.72 on the Sim Agents challenge. Lastly, we have collected over 1 billion motion tokens from multiple datasets, validating the model’s scalability. These results suggest that SMART has initially emulated two important properties: scalability and zero-shot generalization, and preliminarily meets the needs of large-scale real-time simulation applications. We have released all the code to promote the exploration of models for motion generation in the autonomous driving field. The source code is available at https://github.com/rainmaker22/SMART.
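A minimal, framework-free sketch can illustrate the next-token prediction objective from the abstract. It makes three simplifying assumptions. First, it assumes one agent's motion tokens have already been flattened into a single sequence. SMART's decoder models multiple agents and the map jointly over space and time, and this sketch does not attempt that. Second, the helper names are hypothetical. Third, the logits are passed in by hand here; in practice a decoder-only transformer would produce them.

```python
import math
from typing import List, Sequence, Tuple


def next_token_pairs(tokens: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Shift a token sequence by one step: inputs[t] is used to predict targets[t]."""
    return list(tokens[:-1]), list(tokens[1:])


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i),
    so no prediction can look at future tokens."""
    return [[j <= i for j in range(n)] for i in range(n)]


def cross_entropy(logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Mean negative log-likelihood of the targets under softmax(logits),
    computed with the log-sum-exp trick for numerical stability."""
    total = 0.0
    for row, tgt in zip(logits, targets):
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[tgt]
    return total / len(targets)
```

For a token sequence `[2, 2, 3, 1]`, `next_token_pairs` yields inputs `[2, 2, 3]` and targets `[2, 3, 1]`. Under a causal mask, every position is trained in parallel, yet each prediction sees only earlier tokens. That property lets the same model be rolled out autoregressively, one token at a time, at simulation time.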

arXiv information

Authors: Wei Wu, Xiaoxin Feng, Ziyan Gao, Yuheng Kan
Published: 2024-11-01 06:19:24+00:00

Source and services used

arxiv.jp, DeepL
