Efficient Vision Transformer for Accurate Traffic Sign Detection

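Abstract

This paper addresses the challenges of traffic sign detection in self-driving vehicles and driver assistance systems.
Reliable, highly accurate algorithms are essential for the wide adoption of traffic sign recognition and detection (TSRD) across diverse real-world scenarios.
The task is complicated, however, by suboptimal traffic images degraded by factors such as camera motion, adverse weather, and inadequate lighting.
The study focuses specifically on traffic sign detection methods and applies the Transformer model, in particular Vision Transformer variants, to the task.
The Transformer's attention mechanism, originally designed for natural language processing, offers improved parallel efficiency.
Vision Transformers have been successful in a range of fields, including autonomous driving, object detection, healthcare, and defense-related applications.
To make the Transformer model more efficient, the study proposes a new strategy that integrates a locality inductive bias with a Transformer module.
It introduces an Efficient Convolution Block and a Local Transformer Block that capture short-term and long-term dependency information, improving both detection speed and accuracy.
Experimental evaluation demonstrates significant gains from this approach, particularly on the GTSDB dataset.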

Abstract (original)

This research paper addresses the challenges associated with traffic sign detection in self-driving vehicles and driver assistance systems. The development of reliable and highly accurate algorithms is crucial for the widespread adoption of traffic sign recognition and detection (TSRD) in diverse real-life scenarios. However, this task is complicated by suboptimal traffic images affected by factors such as camera movement, adverse weather conditions, and inadequate lighting. This study specifically focuses on traffic sign detection methods and introduces the application of the Transformer model, particularly the Vision Transformer variants, to tackle this task. The Transformer’s attention mechanism, originally designed for natural language processing, offers improved parallel efficiency. Vision Transformers have demonstrated success in various domains, including autonomous driving, object detection, healthcare, and defense-related applications. To enhance the efficiency of the Transformer model, the research proposes a novel strategy that integrates a locality inductive bias and a transformer module. This includes the introduction of the Efficient Convolution Block and the Local Transformer Block, which effectively capture short-term and long-term dependency information, thereby improving both detection speed and accuracy. Experimental evaluations demonstrate the significant advancements achieved by this approach, particularly when applied to the GTSDB dataset.
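The abstract names the two building blocks but does not describe their internals, so the following is only a minimal sketch of the general idea of pairing a locality inductive bias (a small convolution) with attention restricted to a local window. It is not the authors' Efficient Convolution Block or Local Transformer Block. The function names, the fixed smoothing kernel, the window size, and the absence of learned projections are all assumptions made for illustration, and tokens are treated as a 1D sequence rather than image patches.

```python
import math

def local_conv(tokens, kernel):
    """1D convolution along the token axis, applied per channel with zero padding."""
    half = len(kernel) // 2
    n, d = len(tokens), len(tokens[0])
    out = []
    for i in range(n):
        row = [0.0] * d
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n:  # neighbours outside the sequence count as zero
                for c in range(d):
                    row[c] += w * tokens[j][c]
        out.append(row)
    return out

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def window_attention(tokens, window):
    """Scaled dot-product self-attention inside non-overlapping windows.

    Assumption: queries, keys and values are the tokens themselves
    (a real block would use learned projections and multiple heads).
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for start in range(0, len(tokens), window):
        block = tokens[start:start + window]
        for q in block:
            weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in block])
            out.append([sum(w * v[c] for w, v in zip(weights, block)) for c in range(d)])
    return out

def hybrid_block(tokens, kernel=(0.25, 0.5, 0.25), window=4):
    """Hypothetical hybrid block: convolution for short-range mixing, then
    windowed attention for wider context, each with a residual connection."""
    conv = local_conv(tokens, kernel)
    x = [[a + b for a, b in zip(t, c)] for t, c in zip(tokens, conv)]
    attn = window_attention(x, window)
    return [[a + b for a, b in zip(r, s)] for r, s in zip(x, attn)]

if __name__ == "__main__":
    # Eight toy tokens with two channels each.
    tokens = [[float(i), 1.0] for i in range(8)]
    for row in hybrid_block(tokens):
        print([round(v, 3) for v in row])
```

Restricting attention to a window keeps the cost linear in the number of tokens for a fixed window size, which is one common way to trade global context for speed. Whether the paper makes the same trade-off cannot be determined from the abstract alone.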

arXiv information

Authors	Javad Mirzapour Kaleybar, Hooman Khaloo, Avaz Naghipour
Published	2023-11-02 17:44:32+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
