Drive&Segment: Unsupervised Semantic Segmentation of Urban Scenes via Cross-modal Distillation

Summary

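This work investigates learning pixel-wise semantic image segmentation in urban scenes without any manual annotation, using only the raw, non-curated data collected by cars that drive around a city equipped with cameras and LiDAR sensors.
Our contributions are threefold.
First, we propose a novel method for cross-modal unsupervised learning of semantic image segmentation that leverages synchronized LiDAR and image data.
The key ingredient of our method is an object proposal module that analyzes the LiDAR point cloud to obtain proposals for spatially consistent objects.
Second, we show that these 3D object proposals can be aligned with the input images and reliably clustered into semantically meaningful pseudo-classes.
Finally, we develop a cross-modal distillation approach that uses image data partially annotated with the resulting pseudo-classes to train a transformer-based model for image semantic segmentation.
We show the generalization capabilities of our method by testing on four different datasets (Cityscapes, Dark Zurich, Nighttime Driving, and ACDC) without any finetuning, and demonstrate significant improvements over the current state of the art on this problem.
For the code and more, see the project webpage: https://vobecant.github.io/DriveAndSegment/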
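
As a rough illustration of the alignment step, in which 3D object proposals are matched to image pixels, the sketch below projects LiDAR points that already carry a proposal id into the image with a pinhole camera model. The function name `project_proposals`, the intrinsics `fx, fy, cx, cy`, and the assumption that the points are already expressed in the camera frame are illustrative choices, not taken from the paper or its code release.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical types for illustration: a point is (x, y, z) in the camera
# frame, with z pointing forward along the optical axis.
Point = Tuple[float, float, float]
Pixel = Tuple[int, int]


def project_proposals(
    points: List[Point],
    proposal_ids: List[int],
    fx: float, fy: float, cx: float, cy: float,
    width: int, height: int,
) -> Dict[Pixel, int]:
    """Project labeled 3D points to pixels with a pinhole model.

    Returns a sparse map {(u, v): proposal_id}. Pixels that receive no
    LiDAR point stay unlabeled, so the image is only partially annotated.
    """
    labels: Dict[Pixel, int] = {}
    depth: Dict[Pixel, float] = {}
    for (x, y, z), pid in zip(points, proposal_ids):
        if z <= 0.0:  # point is behind the camera
            continue
        u = math.floor(fx * x / z + cx)
        v = math.floor(fy * y / z + cy)
        if not (0 <= u < width and 0 <= v < height):
            continue  # projects outside the image
        # When several points land on one pixel, keep the nearest one.
        if (u, v) not in depth or z < depth[(u, v)]:
            depth[(u, v)] = z
            labels[(u, v)] = pid
    return labels


# Toy input: two points share a pixel, one is behind the camera,
# and one projects far outside the image.
toy_points = [(0.0, 0.0, 10.0), (1.0, 0.0, 10.0), (0.0, 0.0, 5.0),
              (0.0, 0.0, -1.0), (100.0, 0.0, 1.0)]
toy_ids = [1, 2, 3, 4, 5]
sparse_labels = project_proposals(toy_points, toy_ids,
                                  fx=100.0, fy=100.0, cx=50.0, cy=40.0,
                                  width=100, height=80)
```

In a full pipeline the resulting sparse label map would be one possible form of the partially annotated image data used as a training target; the proposal generation, clustering, and distillation steps are not covered by this sketch.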

Summary (original)

This work investigates learning pixel-wise semantic image segmentation in urban scenes without any manual annotation, just from the raw non-curated data collected by cars which, equipped with cameras and LiDAR sensors, drive around a city. Our contributions are threefold. First, we propose a novel method for cross-modal unsupervised learning of semantic image segmentation by leveraging synchronized LiDAR and image data. The key ingredient of our method is the use of an object proposal module that analyzes the LiDAR point cloud to obtain proposals for spatially consistent objects. Second, we show that these 3D object proposals can be aligned with the input images and reliably clustered into semantically meaningful pseudo-classes. Finally, we develop a cross-modal distillation approach that leverages image data partially annotated with the resulting pseudo-classes to train a transformer-based model for image semantic segmentation. We show the generalization capabilities of our method by testing on four different testing datasets (Cityscapes, Dark Zurich, Nighttime Driving and ACDC) without any finetuning, and demonstrate significant improvements compared to the current state of the art on this problem. See project webpage https://vobecant.github.io/DriveAndSegment/ for the code and more.

arXiv information

Authors	Antonin Vobecky, David Hurych, Oriane Siméoni, Spyros Gidaris, Andrei Bursuc, Patrick Pérez, Josef Sivic
Published	2024-02-21 16:25:12+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
