Enhancing Large Vision Model in Street Scene Semantic Understanding through Leveraging Posterior Optimization Trajectory

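Summary

To improve the generalization of autonomous driving (AD) perception models, vehicles need to update the model over time using continuously collected data.
As time progresses, the amount of data the AD model is fitted to grows, which helps to improve its generalization substantially.
However, this ever-expanding data is a double-edged sword for the AD model.
Specifically, once the volume of fitted data exceeds the AD model's fitting capacity, the model becomes prone to under-fitting.
To address this issue, we propose using a pretrained Large Vision Model (LVM) as the backbone, coupled with a downstream perception head, to understand AD semantic information.
This design not only overcomes the under-fitting problem, thanks to the LVM's strong fitting capability, but also improves perception generalization, thanks to LVMs' vast and diverse training data.
Meanwhile, to reduce the vehicle's computational burden of training the perception head while running the LVM backbone, we introduce a Posterior Optimization Trajectory (POT)-guided optimization scheme (POTGui) that accelerates convergence.
Concretely, we propose a POT Generator (POTGen) that generates the posterior (future) optimization direction in advance to guide the current optimization iteration, allowing the model to generally converge within 10 epochs.
Extensive experiments show that the proposed method improves performance by over 66.48% and converges more than 6 times faster than the existing state-of-the-art approach.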

Summary (original)

To improve the generalization of the autonomous driving (AD) perception model, vehicles need to update the model over time based on the continuously collected data. As time progresses, the amount of data fitted by the AD model expands, which helps to improve the AD model generalization substantially. However, such ever-expanding data is a double-edged sword for the AD model. Specifically, as the fitted data volume grows to exceed the AD model's fitting capacities, the AD model is prone to under-fitting. To address this issue, we propose to use a pretrained Large Vision Model (LVM) as the backbone, coupled with a downstream perception head, to understand AD semantic information. This design can not only surmount the aforementioned under-fitting problem due to LVMs' powerful fitting capabilities, but also enhance the perception generalization thanks to LVMs' vast and diverse training data. On the other hand, to mitigate vehicles' computational burden of training the perception head while running the LVM backbone, we introduce a Posterior Optimization Trajectory (POT)-Guided optimization scheme (POTGui) to accelerate the convergence. Concretely, we propose a POT Generator (POTGen) to generate the posterior (future) optimization direction in advance to guide the current optimization iteration, through which the model can generally converge within 10 epochs. Extensive experiments demonstrate that the proposed method improves the performance by over 66.48% and converges more than 6 times faster, compared to the existing state-of-the-art approach.
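
The split described in the abstract, a fixed pretrained backbone with only a small downstream head being trained, can be illustrated with a toy sketch. Everything below is an assumption made for illustration: `frozen_backbone` is a hand-written feature map standing in for an LVM, `train_head` is plain SGD on a linear head, and the data are synthetic. It does not implement POTGui or POTGen, whose details are not given here.

```python
import math
import random


def frozen_backbone(x):
    """Hypothetical stand-in for a pretrained backbone: a fixed feature map.

    Nothing in this function is ever updated during training.
    """
    return [math.tanh(x), math.tanh(2.0 * x), 1.0]  # last entry acts as a bias feature


def head_predict(weights, features):
    """Linear head on top of the backbone features (the only trainable part)."""
    return sum(w * f for w, f in zip(weights, features))


def train_head(samples, epochs=10, lr=0.1):
    """Train only the head weights with plain SGD on squared error.

    Returns the final weights and the mean squared error seen in each epoch.
    """
    weights = [0.0, 0.0, 0.0]
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, y in samples:
            feats = frozen_backbone(x)           # backbone is used, not trained
            err = head_predict(weights, feats) - y
            total += err * err
            # Gradient of 0.5 * err**2 with respect to each head weight is err * f.
            weights = [w - lr * err * f for w, f in zip(weights, feats)]
        history.append(total / len(samples))
    return weights, history


# Synthetic data (an assumption): targets that a linear head on these features can fit.
random.seed(0)
inputs = [random.uniform(-2.0, 2.0) for _ in range(50)]
data = [(x, 0.5 * math.tanh(x) - 0.25) for x in inputs]

weights, history = train_head(data)
print("per-epoch mean squared error:", [round(h, 5) for h in history])
```

Because gradients are computed only for the head, each update touches three numbers here; in a real system the same split keeps training cost on the vehicle tied to the head size rather than the backbone size.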

arXiv information

Authors	Wei-Bin Kou, Qingfeng Lin, Ming Tang, Jingreng Lei, Shuai Wang, Rongguang Ye, Guangxu Zhu, Yik-Chung Wu
Published	2025-05-30 23:06:21+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
