BEVControl: Accurately Controlling Street-view Elements with Multi-perspective Consistency via BEV Sketch Layout

Abstract

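Using synthesized images to improve the performance of perception models is a long-standing research challenge in computer vision. The challenge is more pronounced in vision-centric autonomous driving systems with multi-view cameras, because some long-tail scenarios cannot be collected. Existing generative networks guided by BEV segmentation layouts appear to synthesize photo-realistic street-view images when evaluated solely on scene-level metrics. On closer inspection, however, they usually fail to produce accurate foreground and background details such as heading. To address this, we propose BEVControl, a two-stage generative method that can generate accurate foreground and background content. In contrast to segmentation-like input, it also supports sketch-style input, which is more flexible for humans to edit. In addition, we propose a comprehensive multi-level evaluation protocol to fairly compare the quality of the generated scene, foreground objects, and background geometry. Our extensive experiments show that BEVControl surpasses the state-of-the-art method, BEVGen, by a significant margin on foreground segmentation mIoU, improving it from 5.89 to 26.80. Furthermore, using images generated by BEVControl to train a downstream perception model yields an average improvement of 1.29 in NDS score.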

Abstract (original)

Using synthesized images to boost the performance of perception models is a long-standing research challenge in computer vision. It becomes more eminent in visual-centric autonomous driving systems with multi-view cameras as some long-tail scenarios can never be collected. Guided by the BEV segmentation layouts, the existing generative networks seem to synthesize photo-realistic street-view images when evaluated solely on scene-level metrics. However, once zoom-in, they usually fail to produce accurate foreground and background details such as heading. To this end, we propose a two-stage generative method, dubbed BEVControl, that can generate accurate foreground and background contents. In contrast to segmentation-like input, it also supports sketch style input, which is more flexible for humans to edit. In addition, we propose a comprehensive multi-level evaluation protocol to fairly compare the quality of the generated scene, foreground object, and background geometry. Our extensive experiments show that our BEVControl surpasses the state-of-the-art method, BEVGen, by a significant margin, from 5.89 to 26.80 on foreground segmentation mIoU. In addition, we show that using images generated by BEVControl to train the downstream perception model, it achieves on average 1.29 improvement in NDS score.

arXiv information

Authors	Kairui Yang, Enhui Ma, Jibin Peng, Qing Guo, Di Lin, Kaicheng Yu
Published	2023-08-04 03:00:58+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
