Rethinking Video with a Universal Event-Based Representation

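Summary

Traditionally, video has been structured as a sequence of discrete image frames.
Recently, however, a new video sensing paradigm has emerged that avoids video frames entirely.
These "event" sensors aim to mimic the human visual system through asynchronous sensing, in which each pixel has its own independent, sparse data stream.
Although these cameras enable high-speed, high-dynamic-range sensing, researchers often fall back on a framed representation of the event data so that existing applications can use it, or they build bespoke applications tied to one particular camera's event data type.
At the same time, classical video systems carry significant computational redundancy at the application layer, because pixel samples are repeated across frames in the uncompressed domain.
To address these shortcomings, I introduce the Address, Decimation, Δt Event Representation (ADΔER, pronounced "adder"), a new intermediate video representation and system framework.
The framework transcodes a variety of framed and event camera sources into a single event-based representation, and it supports source-modeled lossy compression as well as backward compatibility with traditional frame-based applications.
I demonstrate that ADΔER achieves state-of-the-art application speed and compression performance for scenes with high temporal redundancy.
Crucially, I describe how ADΔER unlocks an entirely new control mechanism for computer vision: application speed can correlate with both the scene content and the level of lossy compression.
Finally, I discuss the implications of event-based video for large-scale video surveillance and resource-constrained sensing.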

Abstract (original)

Traditionally, video is structured as a sequence of discrete image frames. Recently, however, a novel video sensing paradigm has emerged which eschews video frames entirely. These ‘event’ sensors aim to mimic the human vision system with asynchronous sensing, where each pixel has an independent, sparse data stream. While these cameras enable high-speed and high-dynamic-range sensing, researchers often revert to a framed representation of the event data for existing applications, or build bespoke applications for a particular camera’s event data type. At the same time, classical video systems have significant computational redundancy at the application layer, since pixel samples are repeated across frames in the uncompressed domain. To address the shortcomings of existing systems, I introduce Address, Decimation, Δt Event Representation (ADΔER, pronounced ‘adder’), a novel intermediate video representation and system framework. The framework transcodes a variety of framed and event camera sources into a single event-based representation, which supports source-modeled lossy compression and backward compatibility with traditional frame-based applications. I demonstrate that ADΔER achieves state-of-the-art application speed and compression performance for scenes with high temporal redundancy. Crucially, I describe how ADΔER unlocks an entirely new control mechanism for computer vision: application speed can correlate with both the scene content and the level of lossy compression. Finally, I discuss the implications for event-based video on large-scale video surveillance and resource-constrained sensing.
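
The abstract does not spell out the event format, so the following is only a toy sketch of the general idea it describes: turning frames into sparse per-pixel events, trading accuracy for fewer events, and replaying the events as frames for frame-based applications. The `Event` fields, the `threshold` parameter, and the functions `transcode` and `reconstruct` are hypothetical names chosen for illustration; this is a simple change-threshold scheme, not ADΔER's actual encoding.

```python
from dataclasses import dataclass
from typing import Dict, List

Frame = List[List[int]]  # rows of pixel intensities


@dataclass(frozen=True)
class Event:
    """Hypothetical event layout (an assumption, not the ADΔER format)."""
    x: int
    y: int
    value: int  # intensity the pixel holds from this event onward
    dt: int     # frames elapsed since this pixel's previous event


def transcode(frames: List[Frame], threshold: int) -> List[Event]:
    """Emit an event only when a pixel moves more than `threshold`
    away from the value it last emitted (threshold 0 is lossless)."""
    height, width = len(frames[0]), len(frames[0][0])
    last_value = [[None] * width for _ in range(height)]
    last_time = [[0] * width for _ in range(height)]
    events: List[Event] = []
    for t, frame in enumerate(frames):
        for y in range(height):
            for x in range(width):
                v = frame[y][x]
                prev = last_value[y][x]
                if prev is None or abs(v - prev) > threshold:
                    events.append(Event(x, y, v, t - last_time[y][x]))
                    last_value[y][x] = v
                    last_time[y][x] = t
    return events


def reconstruct(events: List[Event], width: int, height: int,
                n_frames: int) -> List[Frame]:
    """Replay events as frames, holding each pixel until its next event."""
    clock: Dict[tuple, int] = {}          # per-pixel absolute time
    by_time: Dict[int, List[Event]] = {}  # events grouped by frame index
    for e in events:
        t = clock.get((e.x, e.y), 0) + e.dt
        clock[(e.x, e.y)] = t
        by_time.setdefault(t, []).append(e)
    current = [[0] * width for _ in range(height)]
    out: List[Frame] = []
    for t in range(n_frames):
        for e in by_time.get(t, []):
            current[e.y][e.x] = e.value
        out.append([row[:] for row in current])
    return out


# A 2x2 scene in which only the bottom-right pixel ever changes.
frames = [
    [[10, 10], [10, 10]],
    [[10, 10], [10, 12]],
    [[10, 10], [10, 50]],
    [[10, 10], [10, 50]],
]
lossless = transcode(frames, threshold=0)
lossy = transcode(frames, threshold=5)
print(len(lossless), len(lossy), "events for", 4 * len(frames), "raw samples")
```

In this toy, static pixels produce one event each instead of one sample per frame, and raising `threshold` drops the small 10-to-12 change as well. Any application that processes events rather than frames therefore does less work on static scenes and at lossier settings, which is the kind of coupling between speed, scene content, and compression level that the abstract points to.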

arXiv information

Author	Andrew Freeman
Published	2024-08-12 16:00:17+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
