DriveSOTIF: Advancing Perception SOTIF Through Multimodal Large Language Models

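Summary

Human drivers naturally perceive driving scenarios, anticipate potential hazards, and react instinctively, thanks to their spatial and causal intelligence.
Autonomous vehicles, however, lack these capabilities, which makes it difficult for them to manage perception-related Safety of the Intended Functionality (SOTIF) risks effectively, particularly in complex and unpredictable driving conditions.
To address this gap, we propose an approach that fine-tunes multimodal language models (MLLMs) on a customized dataset designed specifically to capture perception-related SOTIF scenarios.
Model benchmarking shows that this customized dataset enables the models to better understand and respond to these complex driving situations.
In addition, in real-world case studies, the proposed method correctly handles challenging scenarios that even human drivers may find difficult.
Real-time performance tests further indicate that the models have the potential to operate efficiently in live driving environments.
Together with the dataset generation pipeline, this approach shows significant promise for improving the identification, cognition, prediction, and reaction to SOTIF-related risks in autonomous driving systems.
The dataset and further information are available at https://github.com/s95huang/DriveSOTIF.git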

Summary (original)

Human drivers naturally possess the ability to perceive driving scenarios, predict potential hazards, and react instinctively due to their spatial and causal intelligence, which allows them to perceive, understand, predict, and interact with the 3D world both spatially and temporally. Autonomous vehicles, however, lack these capabilities, leading to challenges in effectively managing perception-related Safety of the Intended Functionality (SOTIF) risks, particularly in complex and unpredictable driving conditions. To address this gap, we propose an approach that fine-tunes multimodal language models (MLLMs) on a customized dataset specifically designed to capture perception-related SOTIF scenarios. Model benchmarking demonstrates that this tailored dataset enables the models to better understand and respond to these complex driving situations. Additionally, in real-world case studies, the proposed method correctly handles challenging scenarios that even human drivers may find difficult. Real-time performance tests further indicate the potential for the models to operate efficiently in live driving environments. This approach, along with the dataset generation pipeline, shows significant promise for improving the identification, cognition, prediction, and reaction to SOTIF-related risks in autonomous driving systems. The dataset and information are available: https://github.com/s95huang/DriveSOTIF.git

arXiv information

Authors	Shucheng Huang, Freda Shi, Chen Sun, Jiaming Zhong, Minghao Ning, Yufeng Yang, Yukun Lu, Hong Wang, Amir Khajepour
Published	2025-05-11 18:14:33+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
