Task-Oriented Communication for Edge Video Analytics

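Abstract

With advances in artificial intelligence (AI) and the growing popularity of camera-equipped devices, many edge video analytics applications are emerging, which calls for deploying computation-intensive AI models at the network edge.
Edge inference is a promising way to move computation-intensive workloads from low-end devices to a powerful edge server for video analytics, but limited bandwidth leaves device-server communication as the bottleneck.
This paper proposes a task-oriented communication framework for edge video analytics, in which multiple devices collect visual sensory data and transmit informative features to an edge server for processing.
To enable low-latency inference, the framework removes video redundancy in the spatial and temporal domains and transmits only the minimal information essential for the downstream task, rather than reconstructing the videos at the edge server.
Specifically, it extracts compact task-relevant features based on the deterministic information bottleneck (IB) principle, which characterizes a tradeoff between the informativeness of the features and the communication cost.
Because the features of consecutive frames are temporally correlated, we propose a temporal entropy model (TEM) that reduces the bitrate by using previous features as side information during feature encoding.
To further improve inference performance, we build a spatial-temporal fusion module at the server that integrates the features of the current and previous frames for joint inference.
Extensive experiments on video analytics tasks show that the proposed framework effectively encodes the task-relevant information in video data and achieves a better rate-performance tradeoff than existing methods.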

Abstract (original)

With the development of artificial intelligence (AI) techniques and the increasing popularity of camera-equipped devices, many edge video analytics applications are emerging, calling for the deployment of computation-intensive AI models at the network edge. Edge inference is a promising solution to move the computation-intensive workloads from low-end devices to a powerful edge server for video analytics, but the device-server communications will remain a bottleneck due to the limited bandwidth. This paper proposes a task-oriented communication framework for edge video analytics, where multiple devices collect the visual sensory data and transmit the informative features to an edge server for processing. To enable low-latency inference, this framework removes video redundancy in spatial and temporal domains and transmits minimal information that is essential for the downstream task, rather than reconstructing the videos at the edge server. Specifically, it extracts compact task-relevant features based on the deterministic information bottleneck (IB) principle, which characterizes a tradeoff between the informativeness of the features and the communication cost. As the features of consecutive frames are temporally correlated, we propose a temporal entropy model (TEM) to reduce the bitrate by taking the previous features as side information in feature encoding. To further improve the inference performance, we build a spatial-temporal fusion module at the server to integrate features of the current and previous frames for joint inference. Extensive experiments on video analytics tasks evidence that the proposed framework effectively encodes task-relevant information of video data and achieves a better rate-performance tradeoff than existing methods.
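
The bitrate saving that the abstract attributes to side information can be illustrated with a toy calculation. The sketch below is not the paper's learned temporal entropy model: it assumes a hypothetical sequence of quantized feature symbols in which each symbol repeats the previous one with high probability, and it compares the empirical entropy H(Z_t) with the empirical conditional entropy H(Z_t | Z_{t-1}). The function names, the number of quantization levels, and the repeat probability are all illustrative assumptions.

```python
import math
import random
from collections import Counter


def entropy(symbols):
    """Empirical entropy in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def conditional_entropy(current, previous):
    """Empirical H(current | previous) = H(current, previous) - H(previous)."""
    joint = list(zip(current, previous))
    return entropy(joint) - entropy(previous)


def correlated_sequence(length, stay_prob, levels, seed=0):
    """Hypothetical quantized features: each symbol repeats the previous
    one with probability stay_prob, otherwise it is redrawn uniformly."""
    rng = random.Random(seed)
    seq = [rng.randrange(levels)]
    for _ in range(length - 1):
        if rng.random() < stay_prob:
            seq.append(seq[-1])
        else:
            seq.append(rng.randrange(levels))
    return seq


features = correlated_sequence(length=20000, stay_prob=0.9, levels=8)
current, previous = features[1:], features[:-1]

# Rate when each symbol is coded on its own.
rate_independent = entropy(current)
# Rate when the previous symbol is available as side information.
rate_with_side_info = conditional_entropy(current, previous)

print(f"H(Z_t)           = {rate_independent:.3f} bits/symbol")
print(f"H(Z_t | Z_(t-1)) = {rate_with_side_info:.3f} bits/symbol")
```

Under these assumptions the conditional entropy comes out well below the marginal entropy, which is the gap an entropy coder can exploit when it conditions on previous features. A learned entropy model would estimate the conditional distribution with a network rather than by counting.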

arXiv information

Authors	Jiawei Shao, Xinjie Zhang, Jun Zhang
Published	2023-09-11 11:54:14+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
