HALSIE: Hybrid Approach to Learning Segmentation by Simultaneously Exploiting Image and Event Modalities

Abstract

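Event cameras detect per-pixel changes in intensity and generate asynchronous "event streams".
Because they offer much higher temporal resolution and high dynamic range (HDR) than conventional cameras, they hold great potential for accurate semantic map retrieval in real-time autonomous systems.
However, existing implementations of event-based segmentation suffer from sub-optimal performance, because these temporally dense events measure only the varying component of the visual signal, which limits their ability to encode dense spatial context compared to frames.
To address this issue, we propose HALSIE, a hybrid end-to-end learning framework that uses three key concepts to reduce inference cost by up to $20\times$ versus prior art while retaining similar performance.
First, a simple and efficient cross-domain learning scheme extracts complementary spatio-temporal embeddings from both frames and events.
Second, a specially designed dual-encoder scheme with Spiking Neural Network (SNN) and Artificial Neural Network (ANN) branches minimizes latency while retaining cross-domain feature aggregation.
Third, a multi-scale cue mixer models rich representations of the fused embeddings.
These qualities of HALSIE allow for a very lightweight architecture that achieves state-of-the-art segmentation performance on the DDD-17, MVSEC, and DSEC-Semantic datasets, with up to $33\times$ higher parameter efficiency and favorable inference cost (17.9 mJ per cycle).
Our ablation study also brings new insights into effective design choices that can prove beneficial for research across other vision tasks.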

Abstract (original)

Event cameras detect changes in per-pixel intensity to generate asynchronous "event streams". They offer great potential for accurate semantic map retrieval in real-time autonomous systems owing to their much higher temporal resolution and high dynamic range (HDR) compared to conventional cameras. However, existing implementations for event-based segmentation suffer from sub-optimal performance since these temporally dense events only measure the varying component of a visual signal, limiting their ability to encode dense spatial context compared to frames. To address this issue, we propose a hybrid end-to-end learning framework HALSIE, utilizing three key concepts to reduce inference cost by up to $20\times$ versus prior art while retaining similar performance: First, a simple and efficient cross-domain learning scheme to extract complementary spatio-temporal embeddings from both frames and events. Second, a specially designed dual-encoder scheme with Spiking Neural Network (SNN) and Artificial Neural Network (ANN) branches to minimize latency while retaining cross-domain feature aggregation. Third, a multi-scale cue mixer to model rich representations of the fused embeddings. These qualities of HALSIE allow for a very lightweight architecture achieving state-of-the-art segmentation performance on DDD-17, MVSEC, and DSEC-Semantic datasets with up to $33\times$ higher parameter efficiency and favorable inference cost (17.9 mJ per cycle). Our ablation study also brings new insights into effective design choices that can prove beneficial for research across other vision tasks.
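As a rough illustration of why events capture only the varying component of a visual signal, the sketch below applies the commonly used idealized event model: a pixel emits an event when its log intensity changes by at least a contrast threshold. This is not HALSIE's pipeline. The function name `events_from_frames`, the threshold value, and the use of two discrete frames (real sensors fire asynchronously per pixel) are all assumptions made for the example.

```python
import math


def events_from_frames(prev, curr, t, threshold=0.2, eps=1e-3):
    """Idealized event generation between two grayscale frames.

    prev, curr: 2D lists of non-negative intensities with the same shape.
    Returns a list of (x, y, t, polarity) tuples with polarity +1 or -1.
    All names and default values here are hypothetical.
    """
    events = []
    for y, (row_prev, row_curr) in enumerate(zip(prev, curr)):
        for x, (p, c) in enumerate(zip(row_prev, row_curr)):
            # Change in log intensity; eps avoids log(0) for dark pixels.
            delta = math.log(c + eps) - math.log(p + eps)
            if abs(delta) >= threshold:
                events.append((x, y, t, 1 if delta > 0 else -1))
    return events


prev = [[0.5, 0.5], [0.5, 0.5]]
curr = [[0.5, 0.9], [0.2, 0.5]]
events = events_from_frames(prev, curr, t=0.01)
# Only the two pixels whose brightness changed produce events;
# the unchanged pixels contribute nothing.
```

Under this model a static scene yields no events at all, which is the sense in which event data lacks the dense spatial context that frames provide.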

arXiv information

Authors	Shristi Das Biswas, Adarsh Kosta, Chamika Liyanagedera, Marco Apolinario, Kaushik Roy
Published	2023-09-28 17:35:10+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
