Open-vocabulary object 6D pose estimation

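Summary

We introduce a new setting, open-vocabulary object 6D pose estimation, in which the object of interest is specified with a text prompt.
In contrast to existing approaches, in our setting (i) the object of interest is specified solely through the text prompt, (ii) no object model (e.g., CAD or a video sequence) is required at inference time, and (iii) the object is imaged from two RGBD viewpoints of different scenes.
To operate in this setting, we introduce a novel approach that leverages a Vision-Language Model to segment the object of interest from the scenes and to estimate its relative 6D pose.
The key to our approach is a carefully devised strategy that fuses the object-level information provided by the prompt with local image features, yielding a feature space that can generalize to novel concepts.
We validate the approach on a new benchmark based on two popular datasets, REAL275 and Toyota-Light, which together contain 34 object instances appearing in 4,000 image pairs.
The results show that our approach outperforms both a well-established hand-crafted method and a recent deep-learning-based baseline at estimating the relative 6D pose of objects in different scenes.
Code and dataset are available at https://jcorsetti.github.io/oryon.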

Abstract (original)

We introduce the new setting of open-vocabulary object 6D pose estimation, in which a textual prompt is used to specify the object of interest. In contrast to existing approaches, in our setting (i) the object of interest is specified solely through the textual prompt, (ii) no object model (e.g., CAD or video sequence) is required at inference, and (iii) the object is imaged from two RGBD viewpoints of different scenes. To operate in this setting, we introduce a novel approach that leverages a Vision-Language Model to segment the object of interest from the scenes and to estimate its relative 6D pose. The key of our approach is a carefully devised strategy to fuse object-level information provided by the prompt with local image features, resulting in a feature space that can generalize to novel concepts. We validate our approach on a new benchmark based on two popular datasets, REAL275 and Toyota-Light, which collectively encompass 34 object instances appearing in four thousand image pairs. The results demonstrate that our approach outperforms both a well-established hand-crafted method and a recent deep learning-based baseline in estimating the relative 6D pose of objects in different scenes. Code and dataset are available at https://jcorsetti.github.io/oryon.
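
The abstract does not say how the relative 6D pose is represented or computed, so the following is only an illustrative sketch of what a relative pose between two views means. It assumes that the object's pose in each camera frame is known as a rotation matrix and a translation vector; the helper names (`relative_pose`, `transform`, `rot_z`) and the numeric values are hypothetical and are not taken from the paper or its code.

```python
import math

def mat_mul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(m):
    """Transpose a 3x3 matrix (equal to the inverse for a rotation)."""
    return [list(row) for row in zip(*m)]

def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def rot_z(deg):
    """Rotation about the z axis by `deg` degrees (toy example input)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def transform(R, t, p):
    """Apply the rigid transform x = R p + t."""
    Rp = mat_vec(R, p)
    return [Rp[i] + t[i] for i in range(3)]

def relative_pose(R_a, t_a, R_b, t_b):
    """Rigid transform taking object points from view A's camera frame
    to view B's camera frame.

    With x_a = R_a p + t_a and x_b = R_b p + t_b for an object point p:
        R_rel = R_b R_a^T,   t_rel = t_b - R_rel t_a
    """
    R_rel = mat_mul(R_b, transpose(R_a))
    Rt = mat_vec(R_rel, t_a)
    t_rel = [t_b[i] - Rt[i] for i in range(3)]
    return R_rel, t_rel

# Hypothetical object poses in the two camera frames (metres, degrees).
R_a, t_a = rot_z(30.0), [0.1, 0.0, 0.8]
R_b, t_b = rot_z(75.0), [-0.2, 0.1, 1.0]
R_rel, t_rel = relative_pose(R_a, t_a, R_b, t_b)

# A point on the object, seen in both views.
p = [0.05, -0.02, 0.03]
x_a = transform(R_a, t_a, p)
x_b = transform(R_b, t_b, p)
x_b_pred = transform(R_rel, t_rel, x_a)  # should coincide with x_b
```

In this toy case the relative rotation is a 45-degree turn about the z axis, and mapping `x_a` through the relative pose reproduces `x_b` up to floating-point error. A method in the setting described above would have to estimate `R_rel` and `t_rel` from the two RGBD images and the prompt alone, without access to the per-view poses used here.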

arXiv information

Authors	Jaime Corsetti, Davide Boscaini, Changjae Oh, Andrea Cavallaro, Fabio Poiesi
Published	2024-04-05 14:44:27+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
