Detect Anything 3D in the Wild

Summary

Deep learning has succeeded at closed-set 3D object detection, yet existing approaches still struggle to generalize zero-shot to novel objects and camera configurations.
We introduce DetAny3D, a promptable 3D detection foundation model that can detect novel objects under arbitrary camera configurations using only monocular input.
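Training a foundation model for 3D detection is fundamentally constrained by the limited availability of annotated 3D data. This motivates us to compensate for the scarcity by leveraging the rich prior knowledge embedded in extensively pre-trained 2D foundation models.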
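To transfer 2D knowledge to 3D effectively, DetAny3D incorporates two core modules: the 2D Aggregator, which aligns features from different 2D foundation models, and the 3D Interpreter with Zero-Embedding Mapping, which mitigates catastrophic forgetting during 2D-to-3D knowledge transfer.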
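Experimental results validate the strong generalization of DetAny3D: it achieves state-of-the-art performance on unseen categories and novel camera configurations, and it also surpasses most competitors on in-domain data.
DetAny3D points to the potential of 3D foundation models for real-world applications, such as rare object detection in autonomous driving, and shows promise for further exploration of 3D-centric tasks in open-world settings.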
More visualization results can be found on the DetAny3D project page.

Summary (original)

Despite the success of deep learning in close-set 3D object detection, existing approaches struggle with zero-shot generalization to novel objects and camera configurations. We introduce DetAny3D, a promptable 3D detection foundation model capable of detecting any novel object under arbitrary camera configurations using only monocular inputs. Training a foundation model for 3D detection is fundamentally constrained by the limited availability of annotated 3D data, which motivates DetAny3D to leverage the rich prior knowledge embedded in extensively pre-trained 2D foundation models to compensate for this scarcity. To effectively transfer 2D knowledge to 3D, DetAny3D incorporates two core modules: the 2D Aggregator, which aligns features from different 2D foundation models, and the 3D Interpreter with Zero-Embedding Mapping, which mitigates catastrophic forgetting in 2D-to-3D knowledge transfer. Experimental results validate the strong generalization of our DetAny3D, which not only achieves state-of-the-art performance on unseen categories and novel camera configurations, but also surpasses most competitors on in-domain data. DetAny3D sheds light on the potential of the 3D foundation model for diverse applications in real-world scenarios, e.g., rare object detection in autonomous driving, and demonstrates promise for further exploration of 3D-centric tasks in open-world settings. More visualization results can be found at DetAny3D project page.

arXiv information

Authors	Hanxue Zhang, Haoran Jiang, Qingsong Yao, Yanan Sun, Renrui Zhang, Hao Zhao, Hongyang Li, Hongzi Zhu, Zetong Yang
Published	2025-04-10 17:59:22+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
