Masked Event Modeling: Self-Supervised Pretraining for Event Cameras

Abstract

Event cameras offer the capacity to asynchronously capture brightness changes with low latency, high temporal resolution, and high dynamic range. Deploying deep learning methods for classification or other tasks to these sensors typically requires large labeled datasets. Since the amount of labeled event data is tiny compared to the bulk of labeled RGB imagery, the progress of event-based vision has remained limited. To reduce the dependency on labeled event data, we introduce Masked Event Modeling (MEM), a self-supervised pretraining framework for events. Our method pretrains a neural network on unlabeled events, which can originate from any event camera recording. Subsequently, the pretrained model is finetuned on a downstream task leading to an overall better performance while requiring fewer labels. Our method outperforms the state-of-the-art on N-ImageNet, N-Cars, and N-Caltech101, increasing the object classification accuracy on N-ImageNet by 7.96%. We demonstrate that Masked Event Modeling is superior to RGB-based pretraining on a real-world dataset.

arXiv information

Authors	Simon Klenk, David Bonello, Lukas Koestler, Daniel Cremers
Published	2022-12-20 15:49:56+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
