Masked Event Modeling: Self-Supervised Pretraining for Event Cameras

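Summary

Event cameras asynchronously capture brightness changes with low latency, high temporal resolution, and high dynamic range.
Deploying deep learning methods for classification or other tasks on these sensors typically requires large labeled datasets.
Because the amount of labeled event data is tiny compared with the bulk of labeled RGB imagery, progress in event-based vision has remained limited.
To reduce the dependency on labeled event data, we introduce Masked Event Modeling (MEM), a self-supervised pretraining framework for events.
Our method pretrains a neural network on unlabeled events, which can originate from any event camera recording.
The pretrained model is then finetuned on a downstream task, which improves overall performance while requiring fewer labels.
Our method outperforms the state of the art on N-ImageNet, N-Cars, and N-Caltech101, increasing object classification accuracy on N-ImageNet by 7.96%.
We demonstrate that Masked Event Modeling is superior to RGB-based pretraining on a real-world dataset.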

Abstract (original)

Event cameras offer the capacity to asynchronously capture brightness changes with low latency, high temporal resolution, and high dynamic range. Deploying deep learning methods for classification or other tasks to these sensors typically requires large labeled datasets. Since the amount of labeled event data is tiny compared to the bulk of labeled RGB imagery, the progress of event-based vision has remained limited. To reduce the dependency on labeled event data, we introduce Masked Event Modeling (MEM), a self-supervised pretraining framework for events. Our method pretrains a neural network on unlabeled events, which can originate from any event camera recording. Subsequently, the pretrained model is finetuned on a downstream task leading to an overall better performance while requiring fewer labels. Our method outperforms the state-of-the-art on N-ImageNet, N-Cars, and N-Caltech101, increasing the object classification accuracy on N-ImageNet by 7.96%. We demonstrate that Masked Event Modeling is superior to RGB-based pretraining on a real world dataset.
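
The abstract does not say how events are represented or masked, so the following is only a minimal sketch of the general masked-modeling idea under stated assumptions: events are taken to be `(x, y, t, polarity)` tuples, they are accumulated into a two-channel count histogram, and a random half of the non-overlapping patches is hidden. The function names, the 4-pixel patch size, and the 0.5 mask ratio are all hypothetical choices for illustration, not values from the paper. The network that would reconstruct the hidden patches is omitted.

```python
import random

def events_to_histogram(events, height, width):
    """Accumulate (x, y, t, polarity) events into a 2-channel count image.

    Channel 0 counts negative events, channel 1 counts positive events.
    This representation is an assumption made for this sketch.
    """
    hist = [[[0] * width for _ in range(height)] for _ in range(2)]
    for x, y, _t, polarity in events:
        channel = 1 if polarity > 0 else 0
        hist[channel][y][x] += 1
    return hist

def patchify(hist, patch):
    """Split the histogram into non-overlapping patch x patch tiles.

    Assumes height and width are multiples of `patch`. Each tile is
    flattened to a list of 2 * patch * patch counts.
    """
    height, width = len(hist[0]), len(hist[0][0])
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tile = [hist[c][top + dy][left + dx]
                    for c in range(2)
                    for dy in range(patch)
                    for dx in range(patch)]
            patches.append(tile)
    return patches

def mask_patches(patches, mask_ratio, rng):
    """Hide a random subset of patches.

    Returns the model input (masked patches replaced by None), the sorted
    indices of the masked patches, and the original content of those
    patches, which a model would be trained to predict.
    """
    num_masked = int(len(patches) * mask_ratio)
    masked_idx = sorted(rng.sample(range(len(patches)), num_masked))
    masked_set = set(masked_idx)
    visible = [None if i in masked_set else p for i, p in enumerate(patches)]
    targets = {i: patches[i] for i in masked_idx}
    return visible, masked_idx, targets

# Synthetic, unlabeled events on an 8 x 8 sensor (illustrative data only).
rng = random.Random(0)
events = [(rng.randrange(8), rng.randrange(8), t, rng.choice((-1, 1)))
          for t in range(200)]

hist = events_to_histogram(events, height=8, width=8)
patches = patchify(hist, patch=4)
visible, masked_idx, targets = mask_patches(patches, mask_ratio=0.5, rng=rng)
```

No labels are involved at any point: the training signal in such a setup comes entirely from the hidden patches in `targets`, which is what makes unlabeled recordings usable for pretraining.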

arXiv information

Authors	Simon Klenk, David Bonello, Lukas Koestler, Nikita Araslanov, Daniel Cremers
Published	2023-11-15 18:08:57+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
