PoseViNet: Distracted Driver Action Recognition Framework Using Multi-View Pose Estimation and Vision Transformer

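Abstract

Driver distraction is a leading cause of traffic accidents.
According to a study conducted by the National Highway Traffic Safety Administration, activities such as operating in-car menus, eating or drinking, and talking on the phone while driving can be significant sources of driver distraction.
Motivated by this, the paper introduces a new method for detecting driver distraction from multi-view images of driver actions.
The proposed method, PoseViNet, is a vision transformer-based framework that combines pose estimation with action inference.
Pose information is added so that the transformer can focus more on key features.
As a result, the framework is better able to identify critical actions.
The framework is compared with various state-of-the-art models on the SFD3 dataset, which covers 10 driver behaviors.
The comparison shows that PoseViNet outperforms these models.
The framework is also evaluated on the SynDD1 dataset, which covers 16 driver behaviors.
On this challenging dataset, PoseViNet achieves 97.55% validation accuracy and 90.92% test accuracy.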
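
The abstract does not say how the pose information is fed to the transformer, so the following is only a minimal sketch of one plausible arrangement, not the authors' implementation. It assumes that estimated keypoints are drawn onto the image before the image is split into fixed-size, non-overlapping patches, which is the standard tokenization step of a vision transformer. The names `draw_keypoints` and `to_patches`, the grayscale list-of-lists image, and the marker radius are all hypothetical and chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: a grayscale image as rows of pixel intensities.
Image = List[List[float]]


def draw_keypoints(image: Image, keypoints: List[Tuple[int, int]],
                   radius: int = 1, value: float = 1.0) -> Image:
    """Return a copy of `image` with a square marker around each (row, col) keypoint.

    Assumption: pose is injected by overlaying keypoints on the pixels.
    The paper may combine pose and image features differently.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for r, c in keypoints:
        for rr in range(max(0, r - radius), min(height, r + radius + 1)):
            for cc in range(max(0, c - radius), min(width, c + radius + 1)):
                out[rr][cc] = value
    return out


def to_patches(image: Image, patch: int) -> List[List[float]]:
    """Split `image` into non-overlapping patch x patch tiles, flattened row-major.

    Each flattened tile plays the role of one input token of a vision transformer
    (before the learned linear projection, which is omitted here).
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens


# Usage: an 8x8 blank image with two made-up keypoints, tokenized into 4x4 patches.
blank = [[0.0] * 8 for _ in range(8)]
posed = draw_keypoints(blank, [(1, 1), (6, 5)])
tokens = to_patches(posed, patch=4)
```

In this toy example only the patches that contain a keypoint marker carry non-zero values, which illustrates the stated motivation: pose cues mark the regions the transformer should attend to.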

Abstract (original)

Driver distraction is a principal cause of traffic accidents. In a study conducted by the National Highway Traffic Safety Administration, engaging in activities such as interacting with in-car menus, consuming food or beverages, or engaging in telephonic conversations while operating a vehicle can be significant sources of driver distraction. From this viewpoint, this paper introduces a novel method for detection of driver distraction using multi-view driver action images. The proposed method is a vision transformer-based framework with pose estimation and action inference, namely PoseViNet. The motivation for adding posture information is to enable the transformer to focus more on key features. As a result, the framework is more adept at identifying critical actions. The proposed framework is compared with various state-of-the-art models using SFD3 dataset representing 10 behaviors of drivers. It is found from the comparison that the PoseViNet outperforms these models. The proposed framework is also evaluated with the SynDD1 dataset representing 16 behaviors of driver. As a result, the PoseViNet achieves 97.55% validation accuracy and 90.92% testing accuracy with the challenging dataset.

arXiv information

Authors	Neha Sengar, Indra Kumari, Jihui Lee, Dongsoo Har
Published	2023-12-22 10:13:10+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
