Illicit object detection in X-ray images using Vision Transformers

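Summary

Illicit object detection is a critical task carried out at many high-security locations, such as airports, train stations, subways, and ports.
The continuous, tedious work of inspecting thousands of X-ray images per hour can be mentally taxing.
Deep Neural Networks (DNNs) can therefore be used to automate X-ray image analysis, improving efficiency and reducing the inspection burden on security officers.
The neural architectures typically used in the relevant literature are Convolutional Neural Networks (CNNs); Vision Transformers (ViTs) are rarely employed.
To address this gap, this paper conducts a comprehensive evaluation of relevant ViT architectures for detecting illicit items in X-ray images.
The study uses both Transformer and hybrid backbones, such as SWIN and NextViT, and detectors, such as DINO and RT-DETR.
The results demonstrate the remarkable accuracy of the DINO Transformer detector in the low-data regime, the impressive real-time performance of YOLOv8, and the effectiveness of the hybrid NextViT backbone.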

Abstract (original)

Illicit object detection is a critical task performed at various high-security locations, including airports, train stations, subways, and ports. The continuous and tedious work of examining thousands of X-ray images per hour can be mentally taxing. Thus, Deep Neural Networks (DNNs) can be used to automate the X-ray image analysis process, improve efficiency and alleviate the security officers’ inspection burden. The neural architectures typically utilized in relevant literature are Convolutional Neural Networks (CNNs), with Vision Transformers (ViTs) rarely employed. In order to address this gap, this paper conducts a comprehensive evaluation of relevant ViT architectures on illicit item detection in X-ray images. This study utilizes both Transformer and hybrid backbones, such as SWIN and NextViT, and detectors, such as DINO and RT-DETR. The results demonstrate the remarkable accuracy of the DINO Transformer detector in the low-data regime, the impressive real-time performance of YOLOv8, and the effectiveness of the hybrid NextViT backbone.

arXiv information

Authors	Jorgen Cani, Ioannis Mademlis, Adamantia Anna Rebolledo Chrysochoou, Georgios Th. Papadopoulos
Published	2024-04-29 13:08:36+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
