ZISVFM: Zero-Shot Object Instance Segmentation in Indoor Robotic Environments with Vision Foundation Models

Summary

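Service robots operating in unstructured environments must recognize and segment unknown objects effectively to extend what they can do.
Traditional supervised-learning segmentation methods require extensive annotated datasets, which is impractical given the diversity of objects encountered in real-world scenarios.
Unseen Object Instance Segmentation (UOIS) methods aim to address this by training models on synthetic data so that they generalize to novel objects, but they often suffer from the simulation-to-reality gap.
This paper proposes a novel approach to UOIS (ZISVFM) that leverages the strong zero-shot capability of the Segment Anything Model (SAM) together with explicit visual representations from a self-supervised Vision Transformer (ViT).
The proposed framework operates in three stages: (1) generating object-agnostic mask proposals from colorized depth images using SAM, (2) refining these proposals using attention-based features from the self-supervised ViT to filter out non-object masks, and (3) applying K-Medoids clustering to generate point prompts that guide SAM toward precise object segmentation.
Experimental validation on two benchmark datasets and one self-collected dataset demonstrates the superior performance of ZISVFM in complex environments, including hierarchical settings such as cabinets, drawers, and handheld objects.
The source code is available at https://github.com/Yinmlmaoliang/zisvfm.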
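
As a rough illustration of stage (2), the sketch below keeps only the mask proposals that overlap strongly with an attention map. It is not the authors' implementation: the function name `filter_masks_by_attention`, the mean-attention score, and the `min_score` threshold are all assumptions made for this example, and the attention map is assumed to be already resized to image resolution and scaled to [0, 1].

```python
import numpy as np


def filter_masks_by_attention(masks, attention, min_score=0.5):
    """Keep masks whose mean attention value is at least min_score.

    masks:     list of boolean arrays of shape (H, W); stand-ins for SAM proposals.
    attention: float array of shape (H, W) in [0, 1]; stand-in for a ViT attention map.
    min_score: hypothetical threshold, chosen here only for illustration.
    """
    kept = []
    for mask in masks:
        if not mask.any():
            continue  # an empty proposal has no pixels to score
        score = float(attention[mask].mean())  # average attention inside the mask
        if score >= min_score:
            kept.append(mask)
    return kept


# Toy data: attention is high in the top-left quadrant only.
attention = np.zeros((4, 4))
attention[:2, :2] = 1.0

object_mask = np.zeros((4, 4), dtype=bool)
object_mask[:2, :2] = True        # lies on the attended region

background_mask = np.zeros((4, 4), dtype=bool)
background_mask[2:, 2:] = True    # lies on the unattended region

kept = filter_masks_by_attention([object_mask, background_mask], attention)
```

In this toy case only `object_mask` survives. A real pipeline would need a scoring rule and threshold tuned to the attention features actually used.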

Abstract (original)

Service robots operating in unstructured environments must effectively recognize and segment unknown objects to enhance their functionality. Traditional supervised learning-based segmentation techniques require extensive annotated datasets, which are impractical for the diversity of objects encountered in real-world scenarios. Unseen Object Instance Segmentation (UOIS) methods aim to address this by training models on synthetic data to generalize to novel objects, but they often suffer from the simulation-to-reality gap. This paper proposes a novel approach (ZISVFM) for solving UOIS by leveraging the powerful zero-shot capability of the segment anything model (SAM) and explicit visual representations from a self-supervised vision transformer (ViT). The proposed framework operates in three stages: (1) generating object-agnostic mask proposals from colorized depth images using SAM, (2) refining these proposals using attention-based features from the self-supervised ViT to filter non-object masks, and (3) applying K-Medoids clustering to generate point prompts that guide SAM towards precise object segmentation. Experimental validation on two benchmark datasets and a self-collected dataset demonstrates the superior performance of ZISVFM in complex environments, including hierarchical settings such as cabinets, drawers, and handheld objects. Our source code is available at https://github.com/Yinmlmaoliang/zisvfm.
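
To make stage (3) more concrete, here is a minimal sketch of how K-Medoids clustering can turn a binary mask into point prompts. It is an assumption-laden illustration, not the code from the repository: the helper names `k_medoids` and `point_prompts_from_mask`, the simple alternating update, the default `k=3`, and the `max_points` subsampling are all hypothetical choices. The useful property is that medoids are actual mask pixels, so every prompt is guaranteed to lie on the mask, which a K-means centroid would not guarantee for a concave shape.

```python
import numpy as np


def k_medoids(points, k, n_iter=50, seed=0):
    """Alternating K-Medoids on an (N, 2) array; returns the k medoid points."""
    rng = np.random.default_rng(seed)
    # Full pairwise Euclidean distance matrix (N, N); only practical for small N.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    medoids = rng.choice(len(points), size=k, replace=False)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size == 0:
                continue
            # Update step: pick the member with the smallest total distance
            # to the other members of its cluster.
            costs = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break  # converged
        medoids = new_medoids
    return points[medoids]


def point_prompts_from_mask(mask, k=3, max_points=500, seed=0):
    """Return up to k prompts in (x, y) pixel order, all located on the mask."""
    ys, xs = np.nonzero(mask)
    coords = np.stack([xs, ys], axis=1).astype(float)
    if len(coords) == 0:
        return np.empty((0, 2))
    if len(coords) > max_points:
        # Subsample so the (N, N) distance matrix stays small (illustrative choice).
        rng = np.random.default_rng(seed)
        coords = coords[rng.choice(len(coords), size=max_points, replace=False)]
    return k_medoids(coords, min(k, len(coords)), seed=seed)


# Toy mask made of two separate 3x3 blobs.
mask = np.zeros((10, 10), dtype=bool)
mask[1:4, 1:4] = True
mask[6:9, 6:9] = True
prompts = point_prompts_from_mask(mask, k=2)
```

The resulting `prompts` array could then be passed, together with positive labels, to whatever point-prompt interface the segmentation model exposes.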

arXiv information

Authors	Ying Zhang, Maoliang Yin, Wenfu Bi, Haibao Yan, Shaohan Bian, Cui-Hua Zhang, Changchun Hua
Published	2025-02-05 15:22:20+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
