Hard Cases Detection in Motion Prediction by Vision-Language Foundation Models

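Summary

Handling hard cases in autonomous driving, such as anomalous road users, extreme weather, and complex traffic interactions, is a major challenge.
To ensure safety, an autonomous driving system must detect and manage these scenarios effectively.
However, because these cases are rare and high-risk, training robust models requires extensive and diverse datasets.
Vision-Language Foundation Models (VLMs), trained on extensive datasets, have shown remarkable zero-shot capabilities.
This work explores the potential of VLMs for detecting hard cases in autonomous driving.
It demonstrates that VLMs such as GPT-4v can detect hard cases in traffic participant motion prediction at both the agent level and the scenario level.
It introduces a feasible pipeline in which VLMs, fed sequential image frames together with designed prompts, identify challenging agents or scenarios, and these detections are verified against existing prediction models.
Building on this hard-case detection, the authors further improve the training efficiency of an existing motion prediction pipeline by selecting training samples suggested by GPT.
The effectiveness and feasibility of the VLM-based pipeline are shown with state-of-the-art methods on the NuScenes dataset.
The code is available at https://github.com/KTH-RPL/Detect_VLM.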
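
The data-selection idea in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not say how GPT's suggestions are turned into a training subset, so the sketch assumes each training sample already has a numeric hardness score (for example, parsed from a VLM's answer) and simply keeps the highest-scoring fraction. The function name `select_hard_samples`, the sample IDs, and the scores are all hypothetical.

```python
from typing import Dict, List


def select_hard_samples(scores: Dict[str, float], keep_fraction: float) -> List[str]:
    """Return the sample IDs with the highest hardness scores.

    `scores` maps a hypothetical sample ID to a hypothetical hardness score
    (higher means harder), e.g. parsed from a VLM's answer to a prompt.
    This is an assumed selection rule, not the one used in the paper.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Keep at least one sample so a non-empty pool never yields an empty subset.
    n_keep = max(1, round(len(scores) * keep_fraction))
    # Sort by score, highest first; break ties by ID for deterministic output.
    ranked = sorted(scores, key=lambda sid: (-scores[sid], sid))
    return ranked[:n_keep]


if __name__ == "__main__":
    # Made-up scores for four made-up samples.
    demo_scores = {"a": 0.1, "b": 0.9, "c": 0.5, "d": 0.9}
    print(select_hard_samples(demo_scores, keep_fraction=0.5))
```

A fixed fraction keeps the training budget predictable; a score threshold would be an equally plausible alternative when the number of hard cases varies widely between data splits.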

Summary (original)

Addressing hard cases in autonomous driving, such as anomalous road users, extreme weather conditions, and complex traffic interactions, presents significant challenges. To ensure safety, it is crucial to detect and manage these scenarios effectively for autonomous driving systems. However, the rarity and high-risk nature of these cases demand extensive, diverse datasets for training robust models. Vision-Language Foundation Models (VLMs) have shown remarkable zero-shot capabilities as being trained on extensive datasets. This work explores the potential of VLMs in detecting hard cases in autonomous driving. We demonstrate the capability of VLMs such as GPT-4v in detecting hard cases in traffic participant motion prediction on both agent and scenario levels. We introduce a feasible pipeline where VLMs, fed with sequential image frames with designed prompts, effectively identify challenging agents or scenarios, which are verified by existing prediction models. Moreover, by taking advantage of this detection of hard cases by VLMs, we further improve the training efficiency of the existing motion prediction pipeline by performing data selection for the training samples suggested by GPT. We show the effectiveness and feasibility of our pipeline incorporating VLMs with state-of-the-art methods on NuScenes datasets. The code is accessible at https://github.com/KTH-RPL/Detect_VLM.

arXiv information

Authors	Yi Yang,Qingwen Zhang,Kei Ikemura,Nazre Batool,John Folkesson
Published	2024-05-31 16:35:41+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
