Using Language to Extend to Unseen Domains

要約

展開時にビジョンモデルが遭遇する可能性のあるすべてのドメインのトレーニングデータを収集するには、コストがかかります。
代わりに、トレーニングドメイン (「鳥の写真」など) と、拡張したいがデータがないドメイン (「鳥の絵」など) を簡単に言語化することで、堅牢性を向上できることを検討します。
共同画像と言語埋め込み空間を持つマルチモーダルモデルを使用して、LADS メソッドは、タスク関連情報を保持しながら、トレーニングドメインから各目に見えないテストドメインへの画像埋め込みの変換を学習します。
目に見えないテストドメインからの画像を使用せずに、トレーニングドメインと目に見えないテストドメインの両方を含む拡張ドメインで、ドメイン適応とデータセットバイアスを対象とする 4 つのベンチマークのスイートで、LADS が標準の微調整とアンサンブルアプローチよりも優れていることを示します。

要約(オリジナル)

It is expensive to collect training data for every possible domain that a vision model may encounter when deployed. We instead consider how simply verbalizing the training domain (e.g. ‘photos of birds’) as well as domains we want to extend to but do not have data for (e.g. ‘paintings of birds’) can improve robustness. Using a multimodal model with a joint image and language embedding space, our method LADS learns a transformation of the image embeddings from the training domain to each unseen test domain, while preserving task relevant information. Without using any images from the unseen test domain, we show that over the extended domain containing both training and unseen test domains, LADS outperforms standard fine-tuning and ensemble approaches over a suite of four benchmarks targeting domain adaptation and dataset bias

arxiv情報

著者	Lisa Dunlap,Clara Mohri,Devin Guillory,Han Zhang,Trevor Darrell,Joseph E. Gonzalez,Aditi Raghunathan,Anja Rohrbach
発行日	2022-10-20 17:32:48+00:00
arxivサイト	arxiv_id(pdf)

提供元, 利用サービス

arxiv.jp, Google

Using Language to Extend to Unseen Domains

要約

要約(オリジナル)

arxiv情報

提供元, 利用サービス

最近の投稿

最近のコメント

アーカイブ

カテゴリー