EvPlug: Learn a Plug-and-Play Module for Event and Image Fusion

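Summary

Event cameras and RGB cameras exhibit complementary characteristics in imaging: the former offers high dynamic range (HDR) and high temporal resolution, while the latter provides rich texture and color information.
This makes integrating event cameras into mid- and high-level RGB-based vision tasks highly promising.
However, challenges arise in multimodal fusion, data annotation, and model architecture design.
In this paper, we propose EvPlug, which learns a plug-and-play event and image fusion module from the supervision of an existing RGB-based model.
The learned fusion module integrates event streams with image features as a plug-in, making the RGB-based model robust to HDR and fast-motion scenes while also enabling high-temporal-resolution inference.
Our method requires only unlabeled event-image pairs (no pixel-wise alignment is needed) and leaves the structure and weights of the RGB-based model unchanged.
We demonstrate the superiority of EvPlug on several vision tasks, including object detection, semantic segmentation, and 3D hand pose estimation.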

Summary (original)

Event cameras and RGB cameras exhibit complementary characteristics in imaging: the former possesses high dynamic range (HDR) and high temporal resolution, while the latter provides rich texture and color information. This makes the integration of event cameras into middle- and high-level RGB-based vision tasks highly promising. However, challenges arise in multi-modal fusion, data annotation, and model architecture design. In this paper, we propose EvPlug, which learns a plug-and-play event and image fusion module from the supervision of the existing RGB-based model. The learned fusion module integrates event streams with image features in the form of a plug-in, making the RGB-based model robust to HDR and fast motion scenes while enabling high temporal resolution inference. Our method only requires unlabeled event-image pairs (no pixel-wise alignment required) and does not alter the structure or weights of the RGB-based model. We demonstrate the superiority of EvPlug in several vision tasks such as object detection, semantic segmentation, and 3D hand pose estimation.
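The core training idea in the abstract (a plug-in supervised by an existing RGB-based model whose structure and weights stay fixed, using no labels) can be illustrated with a deliberately tiny, hypothetical NumPy sketch. It is not the authors' architecture or training recipe: the linear backbone `W_b`, the head `W_h`, the synthetic "under-exposed" frames, and the `ev` "event features" (here simply the difference from a well-exposed reference frame) are all assumptions made for illustration. The sketch only shows the mechanics: the frozen model's output on reference frames is the training target, and gradient descent updates nothing but the plug-in matrix `A`.

```python
import numpy as np


def train_plugin(steps=1000, lr=0.2, seed=0):
    """Toy sketch: train only a plug-in matrix A against a frozen 'RGB model'.

    Everything here is hypothetical and linear: the backbone W_b, the head W_h,
    the 'degraded' frames and the 'event features' are synthetic stand-ins.
    """
    rng = np.random.default_rng(seed)
    n, d, k, o = 256, 8, 4, 16  # samples, input dim, feature dim, output dim

    # Frozen RGB-based model: backbone W_b (d -> k) and task head W_h (k -> o).
    W_b = rng.normal(0.0, 1.0 / np.sqrt(d), size=(k, d))
    W_h = rng.normal(0.0, 1.0 / np.sqrt(o), size=(o, k))
    W_b_before, W_h_before = W_b.copy(), W_h.copy()

    x_ref = rng.normal(size=(n, d))  # well-exposed reference frames (toy)
    x_deg = 0.3 * x_ref              # toy "under-exposed" frames
    ev = x_ref - x_deg               # toy stand-in for event features

    # Teacher signal: the frozen model's output on the reference frames (no labels).
    y_teacher = (x_ref @ W_b.T) @ W_h.T

    # Plug-in, zero-initialised so the plugged model starts identical to the RGB model.
    A = np.zeros((k, d))

    losses = []
    for _ in range(steps):
        z = x_deg @ W_b.T + ev @ A.T  # fused features = image features + plug-in term
        r = z @ W_h.T - y_teacher     # residual w.r.t. the frozen model's output
        losses.append(float(np.mean(np.sum(r ** 2, axis=1))))
        grad_z = (2.0 / n) * r @ W_h  # dLoss/dz, shape (n, k)
        A -= lr * (grad_z.T @ ev)     # update only the plug-in; W_b and W_h untouched

    frozen = np.array_equal(W_b, W_b_before) and np.array_equal(W_h, W_h_before)
    return losses, frozen


losses, frozen = train_plugin()
print(losses[0], losses[-1], frozen)
```

Zero-initialising `A` is an assumed design choice: before training, the plugged model reproduces the original RGB model exactly, which matches the plug-in framing. Real event streams are sparse, asynchronous brightness changes and would need a learned event encoder rather than a single matrix, so this toy only conveys the supervision pattern.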

arXiv information

Authors	Jianping Jiang, Xinyu Zhou, Peiqi Duan, Boxin Shi
Published	2023-12-28 10:05:13+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
