S3PT: Scene Semantics and Structure Guided Clustering to Boost Self-Supervised Pre-Training for Autonomous Driving

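Summary

Recent self-supervised, clustering-based pre-training techniques such as DINO and Cribo have shown strong results on downstream detection and segmentation tasks.
However, real-world applications such as autonomous driving face imbalanced object class and size distributions as well as complex scene geometry.
This paper proposes S3PT, a novel scene semantics and structure guided clustering method that provides more scene-consistent objectives for self-supervised training.
Specifically, the contributions are threefold. First, semantic-distribution-consistent clustering is incorporated to encourage better representation of rare classes such as motorcycles and animals.
Second, object-diversity-consistent spatial clustering is introduced to handle imbalanced and diverse object sizes, from large background regions to small objects such as pedestrians and traffic signs.
Third, depth-guided spatial clustering is proposed to regularize learning based on the scene's geometric information and to further refine region separation at the feature level.
The learned representations significantly improve performance on downstream semantic segmentation and 3D object detection tasks on the nuScenes, nuImages, and Cityscapes datasets, and show promising domain translation properties.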

Summary (original)

Recent self-supervised clustering-based pre-training techniques like DINO and Cribo have shown impressive results for downstream detection and segmentation tasks. However, real-world applications such as autonomous driving face challenges with imbalanced object class and size distributions and complex scene geometries. In this paper, we propose S3PT, a novel scene semantics and structure guided clustering to provide more scene-consistent objectives for self-supervised training. Specifically, our contributions are threefold: First, we incorporate semantic distribution consistent clustering to encourage better representation of rare classes such as motorcycles or animals. Second, we introduce object diversity consistent spatial clustering to handle imbalanced and diverse object sizes, ranging from large background areas to small objects such as pedestrians and traffic signs. Third, we propose a depth-guided spatial clustering to regularize learning based on geometric information of the scene, thus further refining region separation on the feature level. Our learned representations significantly improve performance in downstream semantic segmentation and 3D object detection tasks on the nuScenes, nuImages, and Cityscapes datasets and show promising domain translation properties.
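
The abstract does not say how depth enters the clustering, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes each image patch has an appearance feature vector and a scalar depth, appends the weighted depth to the feature, and runs plain k-means on the result. The function name `depth_guided_kmeans`, the `depth_weight` parameter, and the evenly spaced initialisation are all hypothetical choices made for this example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def depth_guided_kmeans(features, depths, k, depth_weight=1.0, iters=10):
    """Toy k-means over patch features augmented with depth.

    Hypothetical illustration only: the abstract does not specify how
    S3PT combines depth with features.
    """
    # Append the weighted depth as one extra dimension per patch.
    points = [list(f) + [depth_weight * d] for f, d in zip(features, depths)]
    n = len(points)
    # Deterministic initialisation (an assumption): evenly spaced patches.
    centroids = [list(points[(i * n) // k]) for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        # Assignment step: nearest centroid; ties go to the lowest index.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: mean of members; empty clusters keep their centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Four patches with identical appearance but very different depths,
# e.g. a nearby and a distant object of the same class.
features = [[1.0, 0.0]] * 4
depths = [2.0, 2.5, 30.0, 31.0]
print(depth_guided_kmeans(features, depths, k=2))                    # [0, 0, 1, 1]
print(depth_guided_kmeans(features, depths, k=2, depth_weight=0.0))  # [0, 0, 0, 0]
```

In this toy case, patches that are indistinguishable by appearance alone are separated once depth contributes to the distance, which is the kind of geometry-based region separation the abstract refers to.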

arXiv information

Authors	Maciej K. Wozniak, Hariprasath Govindarajan, Marvin Klingner, Camille Maurice, Ravi Kiran, Senthil Yogamani
Published	2024-10-30 15:00:06+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
