A Study on Unsupervised Domain Adaptation for Semantic Segmentation in the Era of Vision-Language Models

要約

深層学習ベースのコンピュータービジョンは最近進歩しているにもかかわらず、ドメインの移行は依然として大きな課題の 1 つです。
自動運転のためのセマンティックセグメンテーションは、幅広いドメインの変化に直面しています。
これは、気象条件の変化、新しい地理位置情報、モデルトレーニングでの合成データの頻繁な使用によって引き起こされます。
新しいターゲットドメインのラベルなしデータのみを使用してモデルをそのドメインに適応させる教師なしドメイン適応 (UDA) 手法が登場しました。
UDA メソッドの種類は多岐にわたりますが、それらはすべて ImageNet の事前トレーニング済みモデルを使用します。
最近、視覚言語モデルは、ドメイン適応を促進する強力な一般化機能を実証しました。
DACS などの既存の UDA メソッドのエンコーダーをビジョン言語の事前トレーニング済みエンコーダーに置き換えるだけで、GTA5 から Cityscapes ドメインへの移行において最大 10.0% mIoU の大幅なパフォーマンス向上が得られることを示します。
目に見えない領域に対する汎化パフォーマンスについては、新しく採用されたビジョン言語の事前トレーニング済みエンコーダーにより、3 つの目に見えないデータセット全体で最大 13.7% mIoU のゲインが得られます。
ただし、すべての UDA メソッドが新しいエンコーダーと簡単に組み合わせられるわけではなく、UDA のパフォーマンスが常に同様に汎化パフォーマンスに反映されるわけではないことがわかりました。
最後に、悪天候条件のドメインシフトに関する実験を実行して、純粋な現実から現実へのドメインシフトに関する発見をさらに検証します。

要約(オリジナル)

Despite the recent progress in deep learning based computer vision, domain shifts are still one of the major challenges. Semantic segmentation for autonomous driving faces a wide range of domain shifts, e.g. caused by changing weather conditions, new geolocations and the frequent use of synthetic data in model training. Unsupervised domain adaptation (UDA) methods have emerged which adapt a model to a new target domain by only using unlabeled data of that domain. The variety of UDA methods is large but all of them use ImageNet pre-trained models. Recently, vision-language models have demonstrated strong generalization capabilities which may facilitate domain adaptation. We show that simply replacing the encoder of existing UDA methods like DACS by a vision-language pre-trained encoder can result in significant performance improvements of up to 10.0% mIoU on the GTA5-to-Cityscapes domain shift. For the generalization performance to unseen domains, the newly employed vision-language pre-trained encoder provides a gain of up to 13.7% mIoU across three unseen datasets. However, we find that not all UDA methods can be easily paired with the new encoder and that the UDA performance does not always likewise transfer into generalization performance. Finally, we perform our experiments on an adverse weather condition domain shift to further verify our findings on a pure real-to-real domain shift.

arxiv情報

著者	Manuel Schwonberg,Claus Werner,Hanno Gottschalk,Carsten Meyer
発行日	2024-11-25 14:12:24+00:00
arxivサイト	arxiv_id(pdf)

提供元, 利用サービス

arxiv.jp, Google

A Study on Unsupervised Domain Adaptation for Semantic Segmentation in the Era of Vision-Language Models

要約

要約(オリジナル)

arxiv情報

提供元, 利用サービス

最近の投稿

最近のコメント

アーカイブ

カテゴリー