Enhancing Large Vision Model in Street Scene Semantic Understanding through Leveraging Posterior Optimization Trajectory

Summary

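To improve the generalization of autonomous driving (AD) perception models, vehicles need to update their models over time based on continuously collected data. As time goes on, the amount of data the AD model fits grows, which substantially improves its generalization. However, this ever-growing data is a double-edged sword for the AD model. Specifically, once the volume of fitted data exceeds the AD model's fitting capacity, the model becomes prone to underfitting. To address this problem, we propose using pretrained Large Vision Models (LVMs) as the backbone, combined with a downstream perception head, to understand the semantic information of AD scenes. Thanks to the strong fitting capacity of LVMs, this design overcomes the underfitting problem above. Thanks to the vast and diverse training data of LVMs, it also strengthens perception generalization. Meanwhile, vehicles must train the perception head while running the LVM backbone. To reduce this computational burden, we introduce a Posterior Optimization Trajectory (POT)-guided optimization scheme (POTGui) that accelerates convergence. Specifically, we propose a POT Generator (POTGen) that generates the posterior optimization direction in advance to guide the current optimization iteration. Extensive experiments demonstrate that, compared with the existing state-of-the-art approach, the proposed method improves performance by over 66.48% and converges more than 6 times faster.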
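The summary describes vehicles training the perception head while running the LVM backbone. The minimal pure-Python sketch below assumes the backbone is kept frozen and only the head is trained. It is an illustration, not the paper's code, and all of its pieces are hypothetical:

- `make_frozen_backbone` returns a fixed random ReLU projection that stands in for a pretrained LVM.
- `train_head` trains a linear regressor that stands in for the perception head.
- `make_toy_dataset` builds a toy target that stands in for perception labels.

```python
import random

def make_frozen_backbone(in_dim, feat_dim, seed=0):
    """Hypothetical stand-in for a pretrained LVM: a fixed random ReLU projection."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(feat_dim)]

    def backbone(x):
        # Forward pass only; nothing in this sketch ever updates `weights`.
        return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]

    return backbone, weights

def make_toy_dataset(n, seed=1):
    """Hypothetical toy regression task standing in for perception labels."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        data.append((x, x[0] - 0.5 * x[1]))
    return data

def train_head(backbone, data, feat_dim, lr=0.01, epochs=50):
    """Train a linear head with SGD on squared error; the backbone stays frozen."""
    # Features are computed once: valid only because the backbone is frozen
    # and this sketch applies no data augmentation.
    feats = [(backbone(x), y) for x, y in data]
    w = [0.0] * feat_dim
    b = 0.0
    epoch_losses = []
    for _ in range(epochs):
        total = 0.0
        for f, y in feats:
            err = sum(wi * fi for wi, fi in zip(w, f)) + b - y
            total += err * err
            # Gradient of 0.5 * err**2 with respect to the head parameters only.
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
        epoch_losses.append(total / len(feats))
    return w, b, epoch_losses

if __name__ == "__main__":
    backbone, frozen_weights = make_frozen_backbone(in_dim=3, feat_dim=8)
    w, b, losses = train_head(backbone, make_toy_dataset(64), feat_dim=8)
    print(f"first-epoch MSE {losses[0]:.4f}, last-epoch MSE {losses[-1]:.4f}")
```

In this sketch the backbone runs forward only, and gradients reach just the head's `feat_dim + 1` parameters. This split is what keeps the training loop cheap here; how the paper itself partitions computation on the vehicle is not detailed in the summary.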

Summary (original)

To improve the generalization of the autonomous driving (AD) perception model, vehicles need to update the model over time based on the continuously collected data. As time progresses, the amount of data fitted by the AD model expands, which helps to improve the AD model generalization substantially. However, such ever-expanding data is a double-edged sword for the AD model. Specifically, as the fitted data volume grows to exceed the AD model’s fitting capacity, the AD model is prone to under-fitting. To address this issue, we propose to use pretrained Large Vision Models (LVMs) as the backbone, coupled with a downstream perception head, to understand AD semantic information. This design can not only surmount the aforementioned under-fitting problem due to LVMs’ powerful fitting capabilities, but also enhance the perception generalization thanks to LVMs’ vast and diverse training data. On the other hand, to mitigate vehicles’ computational burden of training the perception head while running the LVM backbone, we introduce a Posterior Optimization Trajectory (POT)-Guided optimization scheme (POTGui) to accelerate the convergence. Concretely, we propose a POT Generator (POTGen) to generate the posterior (future) optimization direction in advance to guide the current optimization iteration, through which the model can generally converge within 10 epochs. Extensive experiments demonstrate that the proposed method improves the performance by over 66.48% and converges over 6 times faster, compared to the existing state-of-the-art approach.
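The abstract describes POTGen only at a high level: it produces a future ("posterior") optimization direction ahead of time and uses it to steer the current update. The abstract does not specify how POTGen works, so the sketch below is not POTGen. It swaps in the simplest cheap predictor, a linear extrapolation of the last parameter displacement, which makes the update equivalent to heavy-ball (Polyak) momentum. The names `predict_future_direction`, `optimize`, and `guide_weight`, and the toy ill-conditioned quadratic, are all hypothetical. The sketch only illustrates how reusing trajectory information can cut the number of iterations.

```python
def loss(theta, curv):
    """Toy ill-conditioned quadratic: f(theta) = 0.5 * sum(c_i * theta_i**2)."""
    return 0.5 * sum(c * t * t for c, t in zip(curv, theta))

def grad(theta, curv):
    """Gradient of the toy quadratic."""
    return [c * t for c, t in zip(curv, theta)]

def predict_future_direction(history):
    """Hypothetical stand-in for POTGen: extrapolate the last displacement."""
    if len(history) < 2:
        return [0.0] * len(history[-1])
    prev, cur = history[-2], history[-1]
    return [c - p for c, p in zip(cur, prev)]

def optimize(theta0, curv, lr, guide_weight, tol=1e-6, max_steps=10_000):
    """Return the number of steps needed to push the loss below `tol`."""
    theta = list(theta0)
    history = [list(theta)]
    for step in range(1, max_steps + 1):
        g = grad(theta, curv)
        d = predict_future_direction(history)
        # Current gradient step, nudged along the predicted future direction.
        theta = [t - lr * gi + guide_weight * di for t, gi, di in zip(theta, g, d)]
        history = (history + [list(theta)])[-2:]  # the predictor needs only two points
        if loss(theta, curv) < tol:
            return step
    return max_steps

if __name__ == "__main__":
    curv, start = [1.0, 0.01], [1.0, 1.0]
    plain = optimize(start, curv, lr=1.0, guide_weight=0.0)
    guided = optimize(start, curv, lr=1.0, guide_weight=0.8)
    print(f"plain GD: {plain} steps, trajectory-guided: {guided} steps")
```

On this toy problem, plain gradient descent (`guide_weight=0.0`) needs over 400 steps to reach the tolerance, while `guide_weight=0.8` gets there in fewer than 100. In the paper, POTGen plays the role of `predict_future_direction`; how it actually produces its direction is not described in the abstract.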

arXiv information

Authors	Wei-Bin Kou, Qingfeng Lin, Ming Tang, Shuai Wang, Rongguang Ye, Guangxu Zhu, Yik-Chung Wu
Published	2025-01-03 09:10:56+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, DeepL
