Source-Free Domain Adaptation for RGB-D Semantic Segmentation with Vision Transformers

Abstract

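As depth sensors become more widely available, multimodal frameworks that combine color information with depth data are attracting growing interest.
In the challenging task of semantic segmentation, depth maps help distinguish similarly colored objects at different depths and provide useful geometric cues.
At the same time, ground-truth annotations for semantic segmentation are laborious to produce, which makes domain adaptation another important research area.
Specifically, we address the challenging source-free domain adaptation setting, in which adaptation is performed without reusing the source data.
We propose MISFIT: MultImodal Source-Free Information fusion Transformer, a depth-aware framework that injects depth information into a vision-transformer-based segmentation module at multiple stages, namely the input, feature, and output levels.
Color and depth style transfer helps early-stage domain alignment, while re-wiring self-attention between modalities creates mixed features that allow better semantic content to be extracted.
Furthermore, a depth-based entropy minimization strategy is proposed to adaptively weight regions at different distances.
Our framework, which is also the first approach to use vision transformers for source-free semantic segmentation, shows noticeable performance improvements over standard strategies.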
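
The abstract does not spell out how the depth-based entropy minimization weights each region, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes a hypothetical linear weight that grows with depth (1 for the nearest pixels, 2 at `max_depth`); the function names, the direction of the weighting, and the `max_depth` parameter are all illustrative assumptions.

```python
import math


def pixel_entropy(probs):
    """Shannon entropy (in nats) of one pixel's class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def depth_weighted_entropy(prob_map, depth_map, max_depth):
    """Weighted mean of per-pixel prediction entropies.

    prob_map:  list of per-pixel class-probability lists (each sums to 1)
    depth_map: list of per-pixel depth values, same length as prob_map
    max_depth: depth at which the weight saturates (must be > 0)

    ASSUMPTION: the linear weight w = 1 + depth / max_depth is a
    hypothetical choice for illustration, not the rule used in the paper.
    """
    if len(prob_map) != len(depth_map):
        raise ValueError("prob_map and depth_map must have the same length")
    if max_depth <= 0:
        raise ValueError("max_depth must be positive")

    # Clamp each depth to [0, max_depth] and map it to a weight in [1, 2].
    weights = [1.0 + min(max(d, 0.0), max_depth) / max_depth for d in depth_map]

    # Normalize by the total weight so the loss stays on the entropy scale.
    weighted_sum = sum(w * pixel_entropy(p) for w, p in zip(weights, prob_map))
    return weighted_sum / sum(weights)


if __name__ == "__main__":
    # Two pixels: one uncertain prediction (near), one confident (far).
    probs = [[0.5, 0.5], [1.0, 0.0]]
    depths = [0.0, 10.0]
    print(depth_weighted_entropy(probs, depths, max_depth=10.0))
```

Under this assumed weighting, an uncertain prediction contributes more to the loss when it lies far from the sensor than when it lies close. In a source-free setting, a loss of this kind would be minimized on unlabeled target images only.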

Abstract (original)

With the increasing availability of depth sensors, multimodal frameworks that combine color information with depth data are attracting increasing interest. In the challenging task of semantic segmentation, depth maps allow to distinguish between similarly colored objects at different depths and provide useful geometric cues. On the other side, ground truth data for semantic segmentation is burdensome to be provided and thus domain adaptation is another significant research area. Specifically, we address the challenging source-free domain adaptation setting where the adaptation is performed without reusing source data. We propose MISFIT: MultImodal Source-Free Information fusion Transformer, a depth-aware framework which injects depth information into a segmentation module based on vision transformers at multiple stages, namely at the input, feature and output levels. Color and depth style transfer helps early-stage domain alignment while re-wiring self-attention between modalities creates mixed features allowing the extraction of better semantic content. Furthermore, a depth-based entropy minimization strategy is also proposed to adaptively weight regions at different distances. Our framework, which is also the first approach using vision transformers for source-free semantic segmentation, shows noticeable performance improvements with respect to standard strategies.

arXiv information

Authors	Giulia Rizzoli, Donald Shenaj, Pietro Zanuttigh
Published	2023-05-23 17:20:47+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
