AccidentBlip2: Accident Detection With Multi-View MotionBlip2

Summary

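Intelligent vehicles have demonstrated excellent capabilities in many traffic scenarios, but the complexity of on-board sensors and the limited inference capability of on-board neural networks restrict how accurately they can detect accidents in complex traffic systems.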
In this paper, we present AccidentBlip2, a purely vision-based accident detection method built on the multimodal large model Blip2.
Our method first processes the multi-view images with ViT-14g and feeds the multi-view features into the cross-attention layer of the Qformer. Meanwhile, our self-designed Motion Qformer replaces the self-attention layer of Blip2's Qformer with a temporal attention layer.
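During inference, the queries generated for the previous frame are fed into the temporal attention layer, which lets the model reason over temporal information.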
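The recurrence described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `attend`, `motion_qformer_step`, and `run_clip` are hypothetical names; learned Q/K/V projections, LayerNorm, and feed-forward sublayers are omitted; each frame is assumed to start from the same learned queries; and random vectors stand in for the ViT-14g features of each camera view.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention; learned Q/K/V projections are omitted for brevity."""
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def residual(x, y):
    """Element-wise sum x + y (residual connection)."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def motion_qformer_step(learned_queries, prev_queries, view_features):
    # Temporal attention (in place of the Qformer self-attention): the learned
    # queries attend to the queries produced for the previous frame.
    q = residual(learned_queries, attend(learned_queries, prev_queries, prev_queries))
    # Cross-attention: the queries attend to image tokens from every camera
    # view, concatenated into a single key/value sequence.
    tokens = [tok for view in view_features for tok in view]
    return residual(q, attend(q, tokens, tokens))


def run_clip(learned_queries, frames):
    """Process frames in order, carrying each frame's output queries forward."""
    prev = learned_queries  # assumption: the first frame attends to the learned queries
    for view_features in frames:
        prev = motion_qformer_step(learned_queries, prev, view_features)
    return prev


if __name__ == "__main__":
    rng = random.Random(0)
    dim, n_queries, n_views, tokens_per_view, n_frames = 8, 4, 6, 5, 3
    learned = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_queries)]
    # frames[t][view][token] is a dim-sized stand-in for a ViT feature vector
    frames = [[[[rng.gauss(0, 1) for _ in range(dim)] for _ in range(tokens_per_view)]
               for _ in range(n_views)] for _ in range(n_frames)]
    out = run_clip(learned, frames)
    print(len(out), len(out[0]))  # 4 8
```

Concatenating all views into one key/value sequence is the simplest way to let a single set of queries see every camera at once; the paper may organize the multi-view tokens differently.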
We then detect whether an accident has occurred in the surrounding environment by performing autoregressive inference on the queries fed into an MLP.
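The summary does not specify the MLP head, so the following is only a hedged sketch of one plausible reading: the queries from each frame are mean-pooled, passed through a two-layer MLP with a sigmoid output, and the first frame whose accident probability crosses a threshold is reported. `AccidentHead`, `first_accident_frame`, the pooling choice, the layer sizes, and the 0.5 threshold are all hypothetical, and the weights are random placeholders rather than trained parameters.

```python
import math
import random


def mean_pool(queries):
    """Average the query vectors into a single clip-level vector."""
    n = len(queries)
    return [sum(col) / n for col in zip(*queries)]


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class AccidentHead:
    """Hypothetical two-layer MLP: pooled queries -> probability of an accident."""

    def __init__(self, dim, hidden, seed=0):
        rng = random.Random(seed)
        # Random placeholder weights; a real head would be trained.
        self.w1 = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.gauss(0, 0.1) for _ in range(hidden)]]
        self.b2 = [0.0]

    def __call__(self, queries):
        hidden = [max(0.0, h) for h in linear(mean_pool(queries), self.w1, self.b1)]  # ReLU
        return sigmoid(linear(hidden, self.w2, self.b2)[0])


def first_accident_frame(head, per_frame_queries, threshold=0.5):
    """Score each frame's queries in order; return the first flagged frame index, else None."""
    for t, queries in enumerate(per_frame_queries):
        if head(queries) >= threshold:
            return t
    return None
```

Scoring frame by frame means the detector can raise an alarm as soon as the temporally updated queries indicate an accident, instead of waiting for the whole clip.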
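We also extend the approach to a multi-vehicle cooperative system by deploying a Motion Qformer on each vehicle and feeding the queries generated by all vehicles into the MLP simultaneously for autoregressive inference.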
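A minimal sketch of the multi-vehicle setting, assuming the simplest possible fusion: every vehicle's Motion Qformer output queries are concatenated into one sequence before a shared head scores them. `fuse_vehicle_queries` and `accident_probability` are hypothetical names, the single logistic unit is a stand-in for the paper's MLP, and the query values in the usage example are made up for illustration.

```python
import math


def fuse_vehicle_queries(per_vehicle_queries):
    """Concatenate every vehicle's query vectors into one sequence (hypothetical fusion)."""
    return [q for vehicle in per_vehicle_queries for q in vehicle]


def accident_probability(queries, weights, bias):
    """Stand-in for the shared MLP: mean-pool the queries, then a single logistic unit."""
    n = len(queries)
    pooled = [sum(col) / n for col in zip(*queries)]
    z = sum(w * p for w, p in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Made-up 2-D queries from three vehicles; only vehicle_c "sees" the event.
    vehicle_a = [[0.0, 0.0], [0.0, 0.0]]
    vehicle_b = [[0.0, 0.0]]
    vehicle_c = [[6.0, 6.0]]
    fused = fuse_vehicle_queries([vehicle_a, vehicle_b, vehicle_c])
    print(len(fused))  # 4
    print(accident_probability(vehicle_a, [1.0, 1.0], -2.0))  # vehicle_a alone
    print(accident_probability(fused, [1.0, 1.0], -2.0))      # all vehicles fused
```

Because the shared head sees evidence from every vehicle, an accident visible only to one vehicle can still raise the cooperative score, which is the motivation for the multi-vehicle extension.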
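Our approach outperforms existing video large language models in detection accuracy and also adapts to multi-vehicle systems, making it more applicable to intelligent transportation scenarios.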

Summary (original)

Intelligent vehicles have demonstrated excellent capabilities in many transportation scenarios, but the complex on-board sensors and the inference capabilities of on-board neural networks limit the accuracy of intelligent vehicles for accident detection in complex transportation systems. In this paper, we present AccidentBlip2, a pure vision-based multimodal large model Blip2 accident detection method. Our method first processes the multi-view images through ViT-14g and inputs the multi-view features into the cross attention layer of the Qformer, while our self-designed Motion Qformer replaces the self-attention layer in Blip2's Qformer with a Temporal Attention layer. In the inference process, the query generated in the previous frame is input into the Temporal Attention layer to realize inference over temporal information. Then we detect whether there is an accident in the surrounding environment by performing autoregressive inference on the query input to the MLP. We also extend our approach to a multi-vehicle cooperative system by deploying Motion Qformer on each vehicle and simultaneously inputting the inference-generated query into the MLP for autoregressive inference. Our approach outperforms existing video large language models in detection accuracy and also adapts to multi-vehicle systems, making it more applicable to intelligent transportation scenarios.

arXiv information

Authors	Yihua Shao, Hongyi Cai, Xinwei Long, Weiyi Lang, Zhe Wang, Haoran Wu, Yan Wang, Jiayi Yin, Yang Yang, Zhen Lei
Published	2024-04-22 17:07:07+00:00
arXiv site	arxiv_id(pdf)

Provider, services used

arxiv.jp, Google
