Urban Safety Perception Assessments via Integrating Multimodal Large Language Models with Street View Images

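Summary

Measuring how safe a city is perceived to be is an important and complex task that has traditionally relied heavily on human effort.
The process often involves extensive field surveys, manual data collection, and subjective assessments, which can be time-consuming, costly, and sometimes inconsistent.
Street View Images (SVIs), combined with deep learning methods, offer a way to carry out urban safety detection at large scale.
However, reaching this goal often requires extensive human annotation to train safety ranking models, and architectural differences between cities hinder the transferability of these models.
A fully automated method for conducting safety evaluations is therefore essential.
Recent advances in multimodal large language models (MLLMs) have demonstrated powerful reasoning and analytical capabilities.
Cutting-edge models such as GPT-4 have shown surprising performance on many tasks.
The authors employed these models for urban safety ranking on a human-annotated anchor set and verified that the MLLM results align closely with human perception.
They also proposed a method based on pre-trained Contrastive Language-Image Pre-training (CLIP) features and K-Nearest Neighbors (K-NN) retrieval to quickly assess the safety index of an entire city.
Experimental results show that the method outperforms existing deep learning approaches that require training, achieving efficient and accurate urban safety evaluation.
The proposed automation of urban safety perception assessment is a valuable tool for city planners, policymakers, and researchers who aim to improve urban environments.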
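
The retrieval step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the similarity metric, the value of k, or how neighbor scores are aggregated, so cosine similarity, `k=5`, and a plain mean are assumptions. The function names are hypothetical, and random vectors stand in for CLIP image features.

```python
import heapq
import math
import random


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_safety_score(query_feat, anchor_feats, anchor_scores, k=5):
    """Estimate a safety score as the mean score of the k most similar anchors.

    anchor_feats: feature vectors of anchor images (assumed to be CLIP features).
    anchor_scores: one safety score per anchor (assumed to come from the
    MLLM-based ranking of the anchor set).
    """
    if not anchor_feats or len(anchor_feats) != len(anchor_scores):
        raise ValueError("anchor_feats and anchor_scores must be non-empty and aligned")
    if k < 1:
        raise ValueError("k must be at least 1")
    sims = [cosine_similarity(query_feat, feat) for feat in anchor_feats]
    k = min(k, len(anchor_feats))
    # Indices of the k anchors with the highest similarity to the query.
    nearest = heapq.nlargest(k, range(len(sims)), key=sims.__getitem__)
    return sum(anchor_scores[i] for i in nearest) / k


if __name__ == "__main__":
    random.seed(0)
    dim = 512  # placeholder dimensionality, not taken from the paper
    anchor_feats = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(20)]
    anchor_scores = [random.uniform(0, 10) for _ in range(20)]
    # A query image that is a slightly perturbed copy of the first anchor.
    query = [x + random.gauss(0, 0.01) for x in anchor_feats[0]]
    print(knn_safety_score(query, anchor_feats, anchor_scores, k=5))
```

Because each new street view image is scored by lookup against a fixed anchor set, no model is trained per city, which is consistent with the efficiency claim in the summary. A similarity-weighted average would be an equally plausible aggregation.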

Summary (original)

Measuring urban safety perception is an important and complex task that traditionally relies heavily on human resources. This process often involves extensive field surveys, manual data collection, and subjective assessments, which can be time-consuming, costly, and sometimes inconsistent. Street View Images (SVIs), along with deep learning methods, provide a way to realize large-scale urban safety detection. However, achieving this goal often requires extensive human annotation to train safety ranking models, and the architectural differences between cities hinder the transferability of these models. Thus, a fully automated method for conducting safety evaluations is essential. Recent advances in multimodal large language models (MLLMs) have demonstrated powerful reasoning and analytical capabilities. Cutting-edge models, e.g., GPT-4 have shown surprising performance in many tasks. We employed these models for urban safety ranking on a human-annotated anchor set and validated that the results from MLLMs align closely with human perceptions. Additionally, we proposed a method based on the pre-trained Contrastive Language-Image Pre-training (CLIP) feature and K-Nearest Neighbors (K-NN) retrieval to quickly assess the safety index of the entire city. Experimental results show that our method outperforms existing training needed deep learning approaches, achieving efficient and accurate urban safety evaluations. The proposed automation for urban safety perception assessment is a valuable tool for city planners, policymakers, and researchers aiming to improve urban environments.

arXiv information

Authors	Jiaxin Zhang, Yunqin Li, Tomohiro Fukuda, Bowen Wang
Published	2025-06-02 05:10:45+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
