A Framework for a Capability-driven Evaluation of Scenario Understanding for Multimodal Large Language Models in Autonomous Driving

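Summary

Multimodal large language models (MLLMs) could enhance autonomous driving by combining domain-independent world knowledge with context-specific language guidance.
Their integration into autonomous driving systems has shown promising results in isolated proof-of-concept applications, while their performance is evaluated on selected, individual aspects of perception, reasoning, or planning.
Leveraging their full potential requires a systematic framework for evaluating MLLMs in the context of autonomous driving.
This paper proposes a holistic framework for the capability-driven evaluation of MLLMs in autonomous driving.
The framework structures scenario understanding along four core capability dimensions: semantic, spatial, temporal, and physical.
These dimensions are derived from the general requirements of autonomous driving systems, human driver cognition, and language-based reasoning.
The framework further organises the domain into context layers, processing modalities, and downstream tasks such as language-based interaction and decision-making.
To illustrate its applicability, two exemplary traffic scenarios are analysed, grounding the proposed dimensions in realistic driving situations.
The framework thus provides a foundation for the structured evaluation of MLLMs' potential for scenario understanding in autonomous driving.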

Abstract (original)

Multimodal large language models (MLLMs) hold the potential to enhance autonomous driving by combining domain-independent world knowledge with context-specific language guidance. Their integration into autonomous driving systems shows promising results in isolated proof-of-concept applications, while their performance is evaluated on selective singular aspects of perception, reasoning, or planning. To leverage their full potential a systematic framework for evaluating MLLMs in the context of autonomous driving is required. This paper proposes a holistic framework for a capability-driven evaluation of MLLMs in autonomous driving. The framework structures scenario understanding along the four core capability dimensions semantic, spatial, temporal, and physical. They are derived from the general requirements of autonomous driving systems, human driver cognition, and language-based reasoning. It further organises the domain into context layers, processing modalities, and downstream tasks such as language-based interaction and decision-making. To illustrate the framework’s applicability, two exemplary traffic scenarios are analysed, grounding the proposed dimensions in realistic driving situations. The framework provides a foundation for the structured evaluation of MLLMs’ potential for scenario understanding in autonomous driving.
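As a rough illustration of what a capability-driven evaluation could look like in practice, the sketch below tags each evaluation item with one of the four capability dimensions named in the abstract and reports accuracy per dimension. Only the four dimension names come from the abstract; the `EvalItem` record, the `accuracy_by_capability` helper, the scenario labels, and the sample data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Capability(Enum):
    # The four capability dimensions named in the abstract.
    SEMANTIC = "semantic"
    SPATIAL = "spatial"
    TEMPORAL = "temporal"
    PHYSICAL = "physical"


@dataclass(frozen=True)
class EvalItem:
    # Hypothetical record: one question posed to a model about a scenario.
    scenario: str
    capability: Capability
    correct: bool


def accuracy_by_capability(items):
    """Return the fraction of correct answers per capability dimension.

    Dimensions with no items are omitted from the result.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for item in items:
        totals[item.capability] += 1
        hits[item.capability] += int(item.correct)
    return {cap: hits[cap] / totals[cap] for cap in totals}


# Made-up sample data for illustration only, not results from the paper.
items = [
    EvalItem("scenario_a", Capability.SPATIAL, True),
    EvalItem("scenario_a", Capability.TEMPORAL, False),
    EvalItem("scenario_b", Capability.SEMANTIC, True),
    EvalItem("scenario_b", Capability.SPATIAL, False),
]
scores = accuracy_by_capability(items)
```

Reporting one score per dimension, rather than a single aggregate, is the point of the sketch: it shows which kind of scenario understanding a model lacks instead of averaging the gaps away.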

arXiv information

Authors	Tin Stribor Sohn, Philipp Reis, Maximilian Dillitzer, Johannes Bach, Jason J. Corso, Eric Sax
Published	2025-03-14 13:43:26+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
