FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with Pre-trained Vision-Language Models

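Summary

Few-shot class-incremental learning (FSCIL) aims to mitigate catastrophic forgetting when a model is trained incrementally on limited data.
The Contrastive Vision-Language Pre-Training (CLIP) model is effective for 2D few-shot and zero-shot learning tasks, but applying it directly to 3D FSCIL has limitations.
These limitations stem from feature space misalignment and significant noise in real-world scanned 3D data.
To address these challenges, we introduce two new components: the Redundant Feature Eliminator (RFE) and the Spatial Noise Compensator (SNC).
RFE aligns the feature spaces of input point clouds and their embeddings by performing a unique dimensionality reduction on the feature space of pre-trained models (PTMs), removing redundant information without compromising semantic integrity.
SNC, in turn, is a graph-based 3D model designed to capture robust geometric information within point clouds, compensating for knowledge lost through projection, particularly when processing real-world scanned data.
Given the imbalance in existing 3D datasets, we also propose new evaluation metrics that give a more nuanced assessment of a 3D FSCIL model.
Traditional accuracy metrics have been shown to be biased.
Our metrics therefore focus on how well a model learns new classes while maintaining the balance between old and new classes.
Experimental results on both established 3D FSCIL benchmarks and our own dataset show that our approach significantly outperforms existing state-of-the-art methods.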
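
The point about biased accuracy under class imbalance can be illustrated with a small arithmetic sketch. The counts below are hypothetical, and the harmonic mean of old-class and new-class accuracy is used only as one common balanced summary; it is an assumption for illustration, not necessarily the metric proposed in the paper.

```python
def overall_accuracy(correct_old, total_old, correct_new, total_new):
    """Plain accuracy over all test samples, old and new classes pooled."""
    return (correct_old + correct_new) / (total_old + total_new)


def harmonic_mean(a, b):
    """Harmonic mean of two accuracies; low if either one is low."""
    return 0.0 if a + b == 0 else 2 * a * b / (a + b)


# Hypothetical test set: many old-class samples, few new-class samples.
total_old, correct_old = 900, 810   # old classes: 90% correct
total_new, correct_new = 100, 20    # new classes: 20% correct

acc_old = correct_old / total_old
acc_new = correct_new / total_new
acc_all = overall_accuracy(correct_old, total_old, correct_new, total_new)
balanced = harmonic_mean(acc_old, acc_new)

print(f"old-class accuracy: {acc_old:.3f}")
print(f"new-class accuracy: {acc_new:.3f}")
print(f"overall accuracy:   {acc_all:.3f}")
print(f"harmonic mean:      {balanced:.3f}")
```

With these numbers the pooled accuracy is dominated by the old classes and stays high even though the new classes are mostly misclassified, while the balanced summary drops sharply.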

Abstract (original)

Few-shot class-incremental learning (FSCIL) aims to mitigate the catastrophic forgetting issue when a model is incrementally trained on limited data. While the Contrastive Vision-Language Pre-Training (CLIP) model has been effective in addressing 2D few/zero-shot learning tasks, its direct application to 3D FSCIL faces limitations. These limitations arise from feature space misalignment and significant noise in real-world scanned 3D data. To address these challenges, we introduce two novel components: the Redundant Feature Eliminator (RFE) and the Spatial Noise Compensator (SNC). RFE aligns the feature spaces of input point clouds and their embeddings by performing a unique dimensionality reduction on the feature space of pre-trained models (PTMs), effectively eliminating redundant information without compromising semantic integrity. On the other hand, SNC is a graph-based 3D model designed to capture robust geometric information within point clouds, thereby augmenting the knowledge lost due to projection, particularly when processing real-world scanned data. Considering the imbalance in existing 3D datasets, we also propose new evaluation metrics that offer a more nuanced assessment of a 3D FSCIL model. Traditional accuracy metrics are proved to be biased; thus, our metrics focus on the model’s proficiency in learning new classes while maintaining the balance between old and new classes. Experimental results on both established 3D FSCIL benchmarks and our dataset demonstrate that our approach significantly outperforms existing state-of-the-art methods.

arXiv information

Authors	Wan Xu, Tianyu Huang, Tianyu Qu, Guanglei Yang, Yiwen Guo, Wangmeng Zuo
Published	2023-12-28 14:52:07+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
