SSTFormer: Bridging Spiking Neural Network and Memory Support Transformer for Frame-Event based Recognition

Abstract

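Event camera-based pattern recognition is a research topic that has emerged in recent years.
Current approaches typically convert event streams into images, graphs, or voxels and apply deep neural networks for event-based classification.
Although they achieve good performance on simple event recognition datasets, their results may still be limited by the following two issues.
First, they rely only on spatially sparse event streams for recognition, which may fail to capture color and detailed texture information well.
Second, they adopt either Spiking Neural Networks (SNNs), which are energy-efficient but give suboptimal results, or Artificial Neural Networks (ANNs), which perform well but are energy-intensive.
Few of them consider how to balance these two aspects.
In this paper, we formally propose recognizing patterns by fusing RGB frames and event streams simultaneously, and we introduce a new RGB frame-event recognition framework to address the issues above.
The proposed method contains four main modules: a memory support Transformer network for RGB frame encoding, a spiking neural network for raw event stream encoding, a multi-modal bottleneck fusion module for RGB-Event feature aggregation, and a prediction head.
Because RGB-Event based classification datasets are scarce, we also propose a large-scale PokerEvent dataset, which contains 114 classes and 27102 frame-event pairs recorded with a DVS346 event camera.
Extensive experiments on two RGB-Event based classification datasets validate the effectiveness of the proposed framework.
We hope this work will advance pattern recognition that fuses RGB frames and event streams.
Both the dataset and the source code of this work will be released at https://github.com/Event-AHU/SSTFormer.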
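
To illustrate why a spiking branch can be energy-efficient, here is a minimal sketch of a single leaky integrate-and-fire (LIF) neuron. The abstract does not say which neuron model SSTFormer uses, so the LIF model, the function name `lif_neuron`, the `threshold` and `decay` values, and the input sequence are all assumptions chosen for illustration, not the authors' implementation.

```python
def lif_neuron(inputs, threshold=1.0, decay=0.5):
    """Simulate one leaky integrate-and-fire neuron.

    inputs: sequence of input currents, one per time step.
    Returns a list of 0/1 spikes of the same length.
    """
    membrane = 0.0
    spikes = []
    for current in inputs:
        # Leak the previous potential, then integrate the new input.
        membrane = decay * membrane + current
        if membrane >= threshold:
            spikes.append(1)
            membrane = 0.0  # hard reset after emitting a spike
        else:
            spikes.append(0)
    return spikes


# Hypothetical per-time-step event activity at one pixel (not from the paper).
event_activity = [0.0, 0.5, 0.75, 0.0, 0.0, 1.25, 0.25]
spike_train = lif_neuron(event_activity)
print(spike_train)
```

The neuron stays silent unless enough input accumulates, so sparse event streams produce sparse binary activations. That sparsity is the usual argument for the energy efficiency of SNNs.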

Abstract (original)

Event camera-based pattern recognition is a newly arising research topic in recent years. Current researchers usually transform the event streams into images, graphs, or voxels, and adopt deep neural networks for event-based classification. Although good performance can be achieved on simple event recognition datasets, however, their results may be still limited due to the following two issues. Firstly, they adopt spatial sparse event streams for recognition only, which may fail to capture the color and detailed texture information well. Secondly, they adopt either Spiking Neural Networks (SNN) for energy-efficient recognition with suboptimal results, or Artificial Neural Networks (ANN) for energy-intensive, high-performance recognition. However, seldom of them consider achieving a balance between these two aspects. In this paper, we formally propose to recognize patterns by fusing RGB frames and event streams simultaneously and propose a new RGB frame-event recognition framework to address the aforementioned issues. The proposed method contains four main modules, i.e., memory support Transformer network for RGB frame encoding, spiking neural network for raw event stream encoding, multi-modal bottleneck fusion module for RGB-Event feature aggregation, and prediction head. Due to the scarce of RGB-Event based classification dataset, we also propose a large-scale PokerEvent dataset which contains 114 classes, and 27102 frame-event pairs recorded using a DVS346 event camera. Extensive experiments on two RGB-Event based classification datasets fully validated the effectiveness of our proposed framework. We hope this work will boost the development of pattern recognition by fusing RGB frames and event streams. Both our dataset and source code of this work will be released at https://github.com/Event-AHU/SSTFormer.

arXiv information

Authors	Xiao Wang, Zongzhen Wu, Yao Rong, Lin Zhu, Bo Jiang, Jin Tang, Yonghong Tian
Published	2023-08-08 16:15:35+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
