Segment Any Point Cloud Sequences by Distilling Vision Foundation Models

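Abstract

Recent advances in vision foundation models (VFMs) have opened up new possibilities for versatile and efficient visual perception.
This work introduces Seal, a new framework that uses VFMs to segment diverse automotive point cloud sequences.
Seal has three appealing properties. i) Scalability: VFMs are distilled directly into point clouds, which removes the need for 2D or 3D annotations during pretraining.
ii) Consistency: spatial and temporal relationships are enforced at both the camera-to-LiDAR and point-to-segment regularization stages, which facilitates cross-modal representation learning.
iii) Generalizability: Seal transfers knowledge in an off-the-shelf manner to downstream tasks on diverse point clouds, including those from real/synthetic, low/high-resolution, large/small-scale, and clean/corrupted datasets.
Extensive experiments on 11 different point cloud datasets demonstrate the effectiveness and superiority of Seal.
Notably, Seal reaches 45.0% mIoU on nuScenes after linear probing, surpassing random initialization by 36.9% mIoU and prior art by 6.1% mIoU.
Seal also shows significant performance gains over existing methods across 20 different few-shot fine-tuning tasks on all 11 tested point cloud datasets.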

Abstract (original)

Recent advancements in vision foundation models (VFMs) have opened up new possibilities for versatile and efficient visual perception. In this work, we introduce Seal, a novel framework that harnesses VFMs for segmenting diverse automotive point cloud sequences. Seal exhibits three appealing properties: i) Scalability: VFMs are directly distilled into point clouds, obviating the need for annotations in either 2D or 3D during pretraining. ii) Consistency: Spatial and temporal relationships are enforced at both the camera-to-LiDAR and point-to-segment regularization stages, facilitating cross-modal representation learning. iii) Generalizability: Seal enables knowledge transfer in an off-the-shelf manner to downstream tasks involving diverse point clouds, including those from real/synthetic, low/high-resolution, large/small-scale, and clean/corrupted datasets. Extensive experiments conducted on eleven different point cloud datasets showcase the effectiveness and superiority of Seal. Notably, Seal achieves a remarkable 45.0% mIoU on nuScenes after linear probing, surpassing random initialization by 36.9% mIoU and outperforming prior arts by 6.1% mIoU. Moreover, Seal demonstrates significant performance gains over existing methods across 20 different few-shot fine-tuning tasks on all eleven tested point cloud datasets.
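
The abstract reports its results in mIoU (mean intersection-over-union), the usual metric for semantic segmentation. As a reading aid, the following is a minimal sketch of how that metric is commonly computed from per-point labels. It is not taken from the Seal codebase: the function name `mean_iou`, the `ignore_index` default of 255, and the choice to skip classes that appear in neither prediction nor ground truth are all assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean intersection-over-union across classes (illustrative sketch).

    pred, target: equal-length sequences of integer class labels, one per point.
    Points whose ground-truth label equals ignore_index are skipped (assumed
    convention, not confirmed by the source).
    """
    intersection = [0] * num_classes   # points where pred == target == c
    pred_count = [0] * num_classes     # points predicted as class c
    target_count = [0] * num_classes   # points labeled as class c

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both pred and target
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else float("nan")


# Toy example: five points, three classes, last point is unlabeled.
score = mean_iou([0, 0, 1, 1, 2], [0, 1, 1, 1, 255], num_classes=3)
```

If the reported margins are read as absolute percentage points, 45.0% mIoU with a 36.9-point gain over random initialization and a 6.1-point gain over prior art would correspond to roughly 8.1% and 38.9% mIoU for those two baselines.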

arXiv information

Authors	Youquan Liu, Lingdong Kong, Jun Cen, Runnan Chen, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu
Published	2023-10-24 09:51:00+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
