Learning to Detect and Segment Mobile Objects from Unlabeled Videos

Summary

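Embodied agents must detect and localize objects of interest, for example traffic participants for self-driving cars.
Supervision for this task in the form of bounding boxes is extremely expensive.
Prior work has therefore looked at unsupervised object segmentation, but without annotated boxes it is unclear how pixels should be grouped into objects and which objects are of interest.
This results in over- or under-segmentation and in irrelevant objects.
Inspired both by the human visual system and by practical applications, we posit that the key missing cue is motion: objects of interest are typically mobile objects.
We propose MOD-UV, a Mobile Object Detector learned from Unlabeled Videos only.
We begin with pseudo-labels derived from motion segmentation, but introduce a novel training paradigm that progressively discovers small objects and static-but-mobile objects that motion segmentation misses.
As a result, although it is learned only from unlabeled videos, MOD-UV can detect and segment mobile objects from a single static image.
Empirically, we achieve state-of-the-art performance in unsupervised mobile object detection on Waymo Open, nuScenes, and KITTI Dataset without using any external data or supervised models.
Code is publicly available at https://github.com/YihongSun/MOD-UV.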
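
The abstract does not say how box pseudo-labels are obtained from motion segmentation, so the following is only a minimal sketch of one generic way to do it, not the MOD-UV pipeline. It assumes a binary motion mask is already available and groups 4-connected moving pixels into boxes; the function name `boxes_from_motion_mask` and the `min_area` parameter are hypothetical and chosen for illustration.

```python
from collections import deque


def boxes_from_motion_mask(mask, min_area=1):
    """Return (x_min, y_min, x_max, y_max) boxes for 4-connected moving regions.

    `mask` is a list of rows of 0/1 values (1 = pixel flagged as moving).
    Hypothetical helper for illustration; not part of the MOD-UV codebase.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill over one connected moving region.
            queue = deque([(x, y)])
            seen[y][x] = True
            x_min = x_max = x
            y_min = y_max = y
            area = 0
            while queue:
                cx, cy = queue.popleft()
                area += 1
                x_min, x_max = min(x_min, cx), max(x_max, cx)
                y_min, y_max = min(y_min, cy), max(y_max, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            # Assumed noise filter: drop regions smaller than `min_area` pixels.
            if area >= min_area:
                boxes.append((x_min, y_min, x_max, y_max))
    return boxes


# Toy mask: one 2x2 moving blob and one isolated moving pixel.
toy_mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1],
]
print(boxes_from_motion_mask(toy_mask, min_area=2))
```

With `min_area=2` the isolated pixel is discarded and only the 2x2 blob yields a box. Stationary objects never appear in such a mask at all, which is one way to picture why labels taken from motion alone would be incomplete.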

Summary (original)

Embodied agents must detect and localize objects of interest, e.g. traffic participants for self-driving cars. Supervision in the form of bounding boxes for this task is extremely expensive. As such, prior work has looked at unsupervised object segmentation, but in the absence of annotated boxes, it is unclear how pixels must be grouped into objects and which objects are of interest. This results in over- / under-segmentation and irrelevant objects. Inspired both by the human visual system and by practical applications, we posit that the key missing cue is motion: objects of interest are typically mobile objects. We propose MOD-UV, a Mobile Object Detector learned from Unlabeled Videos only. We begin with pseudo-labels derived from motion segmentation, but introduce a novel training paradigm to progressively discover small objects and static-but-mobile objects that are missed by motion segmentation. As a result, though only learned from unlabeled videos, MOD-UV can detect and segment mobile objects from a single static image. Empirically, we achieve state-of-the-art performance in unsupervised mobile object detection on Waymo Open, nuScenes, and KITTI Dataset without using any external data or supervised models. Code is publicly available at https://github.com/YihongSun/MOD-UV.

arXiv information

Authors	Yihong Sun, Bharath Hariharan
Published	2024-05-23 17:55:11+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
