Memory-based Adapters for Online 3D Scene Perception

Summary

This paper proposes a new framework for online 3D scene perception.
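Conventional 3D scene perception methods are offline: they take already-reconstructed 3D scene geometry as input. This does not suit robotic applications, where the input is streaming RGB-D video rather than a complete 3D scene reconstructed from pre-collected RGB-D video.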
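In online 3D scene perception, data collection and perception must be performed simultaneously, so the model has to process the 3D scene frame by frame and make use of temporal information.
To this end, we propose an adapter-based plug-and-play module for the backbone of a 3D scene perception model. The module builds a memory that caches and aggregates the extracted RGB-D features, giving offline models temporal learning ability.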
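Specifically, we propose a queued memory mechanism that caches the supporting point cloud and image features.
We then devise aggregation modules that operate directly on the memory and pass temporal information to the current frame.
We further propose a 3D-to-2D adapter that enhances image features with strong global context.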
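Our adapters can be easily inserted into mainstream offline architectures for different tasks and significantly boost their performance on online tasks.
Extensive experiments on the ScanNet and SceneNN datasets show that, by simply finetuning existing offline models and without any model- or task-specific design, our approach achieves leading performance on three 3D scene perception tasks compared with state-of-the-art online methods.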
\href{https://xuxw98.github.io/Online3D/}{Project page}.
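
To make the idea of a queued memory with aggregation more concrete, here is a minimal Python sketch. It is not the authors' implementation: the names `QueuedMemory`, `process_stream`, `max_frames`, and `weight` are hypothetical, each frame is reduced to a single plain feature vector, and simple averaging stands in for the learned aggregation modules that the summary describes.

```python
from collections import deque
from typing import Deque, Iterable, List

Feature = List[float]  # assumption: one flat feature vector per frame


class QueuedMemory:
    """Fixed-length FIFO cache of per-frame features (illustrative only)."""

    def __init__(self, max_frames: int) -> None:
        if max_frames < 1:
            raise ValueError("max_frames must be at least 1")
        # deque(maxlen=...) discards the oldest entry automatically when full.
        self._frames: Deque[Feature] = deque(maxlen=max_frames)

    def __len__(self) -> int:
        return len(self._frames)

    def push(self, feature: Feature) -> None:
        """Cache a copy of the feature extracted from the latest frame."""
        self._frames.append(list(feature))

    def aggregate(self, current: Feature, weight: float = 0.5) -> Feature:
        """Blend the current feature with the mean of the cached features.

        Assumption: plain averaging replaces the learned aggregation
        modules; it only illustrates how temporal context could flow
        from the memory into the current frame.
        """
        if not self._frames:
            return list(current)  # nothing cached yet: pass through
        n = len(self._frames)
        mean = [sum(f[i] for f in self._frames) / n for i in range(len(current))]
        return [(1.0 - weight) * c + weight * m for c, m in zip(current, mean)]


def process_stream(frames: Iterable[Feature], max_frames: int = 3) -> List[Feature]:
    """Process features frame by frame, as an online setting requires."""
    memory = QueuedMemory(max_frames)
    outputs: List[Feature] = []
    for feature in frames:
        outputs.append(memory.aggregate(feature))  # use past context first
        memory.push(feature)                       # then cache this frame
    return outputs
```

Calling `process_stream` on a list of per-frame vectors returns one enriched vector per frame. A bounded queue keeps memory use constant regardless of stream length, which is the property an online setting needs; how the real adapters weight and fuse the cached features is not specified in this summary.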

Summary (original)

In this paper, we propose a new framework for online 3D scene perception. Conventional 3D scene perception methods are offline, i.e., take an already reconstructed 3D scene geometry as input, which is not applicable in robotic applications where the input data is streaming RGB-D videos rather than a complete 3D scene reconstructed from pre-collected RGB-D videos. To deal with online 3D scene perception tasks where data collection and perception should be performed simultaneously, the model should be able to process 3D scenes frame by frame and make use of the temporal information. To this end, we propose an adapter-based plug-and-play module for the backbone of 3D scene perception model, which constructs memory to cache and aggregate the extracted RGB-D features to empower offline models with temporal learning ability. Specifically, we propose a queued memory mechanism to cache the supporting point cloud and image features. Then we devise aggregation modules which directly perform on the memory and pass temporal information to current frame. We further propose 3D-to-2D adapter to enhance image features with strong global context. Our adapters can be easily inserted into mainstream offline architectures of different tasks and significantly boost their performance on online tasks. Extensive experiments on ScanNet and SceneNN datasets demonstrate our approach achieves leading performance on three 3D scene perception tasks compared with state-of-the-art online methods by simply finetuning existing offline models, without any model and task-specific designs. \href{https://xuxw98.github.io/Online3D/}{Project page}.

arXiv information

Authors	Xiuwei Xu, Chong Xia, Ziwei Wang, Linqing Zhao, Yueqi Duan, Jie Zhou, Jiwen Lu
Published	2024-03-11 17:57:41+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
