Panoptic Vision-Language Feature Fields

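Summary

Methods for 3D open-vocabulary semantic segmentation have recently been proposed.
Such methods can segment a scene into arbitrary classes that are specified at run-time through text descriptions.
In this paper, we propose what is, to our knowledge, the first algorithm for open-vocabulary panoptic segmentation, which performs semantic and instance segmentation simultaneously.
Our algorithm, Panoptic Vision-Language Feature Fields (PVLFF), learns a feature field of the scene, jointly learning vision-language features and hierarchical instance features through a contrastive loss function from 2D instance segment proposals on input frames.
Our method achieves performance comparable to state-of-the-art closed-set 3D panoptic systems on the HyperSim, ScanNet, and Replica datasets, and outperforms current 3D open-vocabulary systems in semantic segmentation.
We additionally ablate our method to demonstrate the effectiveness of our model architecture.
Our code will be available at https://github.com/ethz-asl/autolabel.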
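
To make the idea of run-time, text-driven classes concrete, here is a minimal sketch of open-vocabulary labeling by cosine similarity. It is not taken from the PVLFF code: the function names and the tiny three-dimensional embeddings are hypothetical stand-ins for features that would, in practice, come from a vision-language model and a learned feature field.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def label_points(point_features, text_embeddings):
    """Give each feature the class whose text embedding is most similar.

    point_features: list of feature vectors (hypothetical field outputs).
    text_embeddings: dict mapping a class name to its embedding (hypothetical).
    """
    labels = []
    for feature in point_features:
        best = max(text_embeddings,
                   key=lambda name: cosine(feature, text_embeddings[name]))
        labels.append(best)
    return labels


# Toy data: classes are just dictionary entries, so they can change at run-time.
text_embeddings = {"chair": [1.0, 0.0, 0.0], "table": [0.0, 1.0, 0.0]}
point_features = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]]
labels = label_points(point_features, text_embeddings)
print(labels)
```

Because the class list is only an input to `label_points`, adding a class requires a new text embedding rather than retraining, which is the property the summary refers to as open-vocabulary.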

Abstract (original)

Recently, methods have been proposed for 3D open-vocabulary semantic segmentation. Such methods are able to segment scenes into arbitrary classes given at run-time using their text description. In this paper, we propose to our knowledge the first algorithm for open-vocabulary panoptic segmentation, simultaneously performing both semantic and instance segmentation. Our algorithm, Panoptic Vision-Language Feature Fields (PVLFF) learns a feature field of the scene, jointly learning vision-language features and hierarchical instance features through a contrastive loss function from 2D instance segment proposals on input frames. Our method achieves comparable performance against the state-of-the-art close-set 3D panoptic systems on the HyperSim, ScanNet and Replica dataset and outperforms current 3D open-vocabulary systems in terms of semantic segmentation. We additionally ablate our method to demonstrate the effectiveness of our model architecture. Our code will be available at https://github.com/ethz-asl/autolabel.
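
The abstract mentions a contrastive loss computed from 2D instance segment proposals but does not give its form. The sketch below uses a generic supervised contrastive loss as a stand-in, not the authors' exact formulation: features of pixels from the same proposed segment are pulled together and all others are pushed apart. The function name, the `temperature` value, and the toy data are assumptions, and the hierarchical aspect of the instance features is not modeled.

```python
import math


def contrastive_loss(features, segment_ids, temperature=0.1):
    """Generic supervised contrastive loss (an assumed stand-in).

    features: list of non-zero feature vectors, one per sampled pixel.
    segment_ids: 2D segment proposal id for each pixel, same length.
    """
    n = len(features)
    if n < 2:
        return 0.0

    def normalise(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    feats = [normalise(f) for f in features]
    total, count = 0.0, 0
    for i in range(n):
        # Scaled cosine similarity of pixel i to every pixel.
        sims = [sum(a * b for a, b in zip(feats[i], feats[j])) / temperature
                for j in range(n)]
        # Numerically stable log-sum-exp over all pixels except i.
        others = [sims[j] for j in range(n) if j != i]
        peak = max(others)
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in others))
        for j in range(n):
            if j != i and segment_ids[j] == segment_ids[i]:
                total += log_denom - sims[j]  # -log softmax of the positive pair
                count += 1
    return total / count if count else 0.0


# Toy check: features that agree with the segments give a lower loss.
features = [[1.0, 0.0], [1.0, 0.05], [0.0, 1.0], [0.05, 1.0]]
loss_clustered = contrastive_loss(features, [0, 0, 1, 1])
loss_mixed = contrastive_loss(features, [0, 1, 0, 1])
print(loss_clustered < loss_mixed)
```

The loss is averaged over positive pairs, so a frame whose proposals each cover a single sampled pixel contributes zero rather than raising an error.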

arXiv information

Authors	Haoran Chen, Kenneth Blomqvist, Francesco Milano, Roland Siegwart
Published	2023-09-11 13:41:27+00:00

