AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving

Abstract

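Recent advances in large vision language models (VLMs) tailored for autonomous driving (AD) have demonstrated strong scene understanding and reasoning capabilities, making them undeniable candidates for end-to-end driving systems.
However, few studies have examined the trustworthiness of DriveVLMs, a critical factor that directly affects public transportation safety.
This paper introduces AutoTrust, a comprehensive trustworthiness benchmark for large vision-language models in autonomous driving (DriveVLMs) that considers diverse perspectives, including trustfulness, safety, robustness, privacy, and fairness.
We constructed the largest visual question-answering dataset for investigating trustworthiness issues in driving scenarios, comprising over 10,000 unique scenes and 18,000 queries.
We evaluated six publicly available VLMs, ranging from generalist to specialist and from open-source to commercial models.
Our exhaustive evaluation revealed previously undiscovered vulnerabilities of DriveVLMs to trustworthiness threats.
Specifically, we found that general VLMs such as LLaVA-v1.6 and GPT-4o-mini surprisingly outperform specialized models fine-tuned for driving in overall trustworthiness.
DriveVLMs such as DriveLM-Agent are particularly vulnerable to disclosing sensitive information.
In addition, both generalist and specialist VLMs remain susceptible to adversarial attacks and struggle to ensure unbiased decision-making across diverse environments and populations.
Our findings call for immediate and decisive action to address the trustworthiness of DriveVLMs, an issue of critical importance to public safety and to the welfare of all citizens who rely on autonomous transportation systems.
Our benchmark is publicly available at \url{https://github.com/taco-group/AutoTrust}, and the leaderboard is released at \url{https://taco-group.github.io/AutoTrust/}.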

Abstract (original)

Recent advancements in large vision language models (VLMs) tailored for autonomous driving (AD) have shown strong scene understanding and reasoning capabilities, making them undeniable candidates for end-to-end driving systems. However, limited work exists on studying the trustworthiness of DriveVLMs — a critical factor that directly impacts public transportation safety. In this paper, we introduce AutoTrust, a comprehensive trustworthiness benchmark for large vision-language models in autonomous driving (DriveVLMs), considering diverse perspectives — including trustfulness, safety, robustness, privacy, and fairness. We constructed the largest visual question-answering dataset for investigating trustworthiness issues in driving scenarios, comprising over 10k unique scenes and 18k queries. We evaluated six publicly available VLMs, spanning from generalist to specialist, from open-source to commercial models. Our exhaustive evaluations have unveiled previously undiscovered vulnerabilities of DriveVLMs to trustworthiness threats. Specifically, we found that the general VLMs like LLaVA-v1.6 and GPT-4o-mini surprisingly outperform specialized models fine-tuned for driving in terms of overall trustworthiness. DriveVLMs like DriveLM-Agent are particularly vulnerable to disclosing sensitive information. Additionally, both generalist and specialist VLMs remain susceptible to adversarial attacks and struggle to ensure unbiased decision-making across diverse environments and populations. Our findings call for immediate and decisive action to address the trustworthiness of DriveVLMs — an issue of critical importance to public safety and the welfare of all citizens relying on autonomous transportation systems. Our benchmark is publicly available at \url{https://github.com/taco-group/AutoTrust}, and the leaderboard is released at \url{https://taco-group.github.io/AutoTrust/}.

arXiv information

Authors	Shuo Xing, Hongyuan Hua, Xiangbo Gao, Shenzhe Zhu, Renjie Li, Kexin Tian, Xiaopeng Li, Heng Huang, Tianbao Yang, Zhangyang Wang, Yang Zhou, Huaxiu Yao, Zhengzhong Tu
Published	2024-12-19 18:59:33+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
