DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation

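Summary

We introduce DFormer, a new RGB-D pretraining framework for learning transferable representations for RGB-D segmentation tasks.
DFormer has two key innovations. 1) Unlike previous works that aim to encode RGB features, DFormer consists of a sequence of RGB-D blocks, which are tailored to encode both RGB and depth information through a new building block design.
2) The backbone is pretrained on image-depth pairs from ImageNet-1K, which gives DFormer the capacity to encode RGB-D representations.
This avoids the mismatch that arises when RGB-pretrained backbones encode the 3D geometric relationships in depth maps, a problem that is widespread in existing methods but has not been resolved.
The pretrained DFormer is fine-tuned with a lightweight decoder head on two popular RGB-D tasks: RGB-D semantic segmentation and RGB-D salient object detection.
Experiments show that DFormer achieves new state-of-the-art performance on both tasks, on two RGB-D segmentation datasets and five RGB-D saliency datasets, at less than half the computational cost of the current best methods.
The code is available at https://github.com/VCIP-RGBD/DFormer.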
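
The two-stream data flow described above (a backbone built from a sequence of blocks that each carry both RGB and depth features) can be illustrated with a toy sketch. The abstract does not specify the RGB-D block design, so everything here is an assumption made for illustration: the names `toy_rgbd_block` and `run_backbone`, the additive fusion rule, and the `mix` weight are hypothetical and are not DFormer's actual block. The repository linked above is the place to look for the real implementation.

```python
from typing import List, Tuple

Features = List[float]


def toy_rgbd_block(
    rgb: Features, depth: Features, mix: float = 0.5
) -> Tuple[Features, Features]:
    """Hypothetical block: update RGB features using depth features.

    Assumption: simple additive fusion. The real DFormer block differs.
    """
    if len(rgb) != len(depth):
        raise ValueError("RGB and depth features must have the same length")
    fused = [r + mix * d for r, d in zip(rgb, depth)]
    # Depth features are passed through unchanged in this toy version.
    return fused, list(depth)


def run_backbone(
    rgb: Features, depth: Features, num_blocks: int
) -> Tuple[Features, Features]:
    """Apply a sequence of toy RGB-D blocks, carrying both streams forward."""
    for _ in range(num_blocks):
        rgb, depth = toy_rgbd_block(rgb, depth)
    return rgb, depth


if __name__ == "__main__":
    rgb_out, depth_out = run_backbone([1.0, 2.0], [0.5, 0.5], num_blocks=2)
    print(rgb_out, depth_out)
```

The point of the sketch is only the interface: every block receives and returns both streams, in contrast to an RGB-only backbone that would have to treat depth as if it were an image.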

Summary (original)

We present DFormer, a novel RGB-D pretraining framework to learn transferable representations for RGB-D segmentation tasks. DFormer has two new key innovations: 1) Unlike previous works that aim to encode RGB features, DFormer comprises a sequence of RGB-D blocks, which are tailored for encoding both RGB and depth information through a novel building block design; 2) We pre-train the backbone using image-depth pairs from ImageNet-1K, and thus the DFormer is endowed with the capacity to encode RGB-D representations. It avoids the mismatched encoding of the 3D geometry relationships in depth maps by RGB pre-trained backbones, which widely lies in existing methods but has not been resolved. We fine-tune the pre-trained DFormer on two popular RGB-D tasks, i.e., RGB-D semantic segmentation and RGB-D salient object detection, with a lightweight decoder head. Experimental results show that our DFormer achieves new state-of-the-art performance on these two tasks with less than half of the computational cost of the current best methods on two RGB-D segmentation datasets and five RGB-D saliency datasets. Our code is available at: https://github.com/VCIP-RGBD/DFormer.

arXiv information

Authors	Bowen Yin, Xuying Zhang, Zhongyu Li, Li Liu, Ming-Ming Cheng, Qibin Hou
Published	2023-09-18 11:09:11+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
