Monocular Depth Estimation and Segmentation for Transparent Object with Iterative Semantic and Geometric Fusion

Summary

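Perception of transparent objects is indispensable for numerous robotic tasks.
However, accurately segmenting transparent objects and estimating their depth remains challenging because of their complex optical properties.
Existing methods mainly address only one task, relying on extra inputs or specialized sensors. They neglect the valuable interactions between tasks and the subsequent refinement process, which leads to suboptimal, blurry predictions.
To address these issues, we propose a monocular framework that is the first to excel at both segmentation and depth estimation of transparent objects using only a single-image input.
Specifically, we devise a novel semantic and geometric fusion module that effectively integrates multi-scale information between the tasks.
In addition, inspired by how humans perceive objects, we incorporate an iterative strategy that progressively refines the initial features to produce clearer results.
Experiments on two challenging synthetic and real-world datasets show that our model outperforms state-of-the-art monocular, stereo, and multi-view methods by a large margin of about 38.8%-46.2% with only a single RGB input.
Code and models are publicly available at https://github.com/L-J-Yuan/MODEST.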

Summary (original)

Transparent object perception is indispensable for numerous robotic tasks. However, accurately segmenting and estimating the depth of transparent objects remain challenging due to complex optical properties. Existing methods primarily delve into only one task using extra inputs or specialized sensors, neglecting the valuable interactions among tasks and the subsequent refinement process, leading to suboptimal and blurry predictions. To address these issues, we propose a monocular framework, which is the first to excel in both segmentation and depth estimation of transparent objects, with only a single-image input. Specifically, we devise a novel semantic and geometric fusion module, effectively integrating the multi-scale information between tasks. In addition, drawing inspiration from human perception of objects, we further incorporate an iterative strategy, which progressively refines initial features for clearer results. Experiments on two challenging synthetic and real-world datasets demonstrate that our model surpasses state-of-the-art monocular, stereo, and multi-view methods by a large margin of about 38.8%-46.2% with only a single RGB input. Codes and models are publicly available at https://github.com/L-J-Yuan/MODEST.

arXiv information

Authors	Jiangyuan Liu, Hongxuan Ma, Yuxin Guo, Yuhao Zhao, Chi Zhang, Wei Sui, Wei Zou
Published	2025-02-20 14:57:01+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
