Diversify Your Vision Datasets with Automatic Diffusion-Based Augmentation

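Summary

Many fine-grained classification tasks, such as rare animal identification, have limited training data, so classifiers trained on these datasets often fail to generalize to domain shifts such as changes in weather or location.
The authors therefore explore how natural language descriptions of the domains present in the training data can be combined with large vision models, trained on diverse pretraining datasets, to generate useful variations of that data.
They introduce ALIA (Automated Language-guided Image Augmentation), a method that uses large vision and language models to automatically generate natural language descriptions of a dataset's domains and then augments the training data through language-guided image editing.
To maintain data integrity, a model trained on the original dataset filters out edits that barely change the image and edits that corrupt class-relevant information.
The resulting dataset is visually consistent with the original training data and considerably more diverse.
On fine-grained and cluttered datasets for classification and detection, ALIA outperforms traditional data augmentation and text-to-image generated data by up to 15%, and often outperforms adding an equivalent amount of real data.
Code is available at https://github.com/lisadunlap/ALIA.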
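
The filtering step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `EditedSample` type, the `classify_edit` and `filter_edits` helpers, and the 0.9 confidence threshold are all assumptions made for this example. The assumed rule is that a model trained on the original data flags an edit as minimal when it still predicts the source label with high confidence, and as corrupted when it confidently predicts a different class.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class EditedSample:
    """Hypothetical container for one edited image and its model scores."""
    name: str
    label: int              # class label inherited from the source image
    probs: Sequence[float]  # class probabilities from a model trained on the original data


def classify_edit(sample: EditedSample, threshold: float = 0.9) -> str:
    """Return 'keep', 'minimal_edit', or 'corrupted' under the assumed rule."""
    predicted = max(range(len(sample.probs)), key=lambda i: sample.probs[i])
    if sample.probs[predicted] < threshold:
        # The model is unsure: the edit changed the image without
        # clearly flipping the class, so it is kept.
        return "keep"
    if predicted == sample.label:
        # Confident and correct: the edit looks like the training data.
        return "minimal_edit"
    # Confident and wrong: class-relevant information was likely destroyed.
    return "corrupted"


def filter_edits(samples: List[EditedSample], threshold: float = 0.9) -> List[EditedSample]:
    """Keep only the edits that pass the assumed confidence rule."""
    return [s for s in samples if classify_edit(s, threshold) == "keep"]


if __name__ == "__main__":
    demo = [
        EditedSample("barely_changed", 0, [0.97, 0.02, 0.01]),
        EditedSample("class_destroyed", 0, [0.03, 0.95, 0.02]),
        EditedSample("useful_variation", 0, [0.55, 0.30, 0.15]),
    ]
    for s in demo:
        print(s.name, classify_edit(s))
```

The threshold trades diversity against label safety: a lower value discards more edits, a higher value lets more near-duplicates and mislabeled images through.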

Abstract (original)

Many fine-grained classification tasks, like rare animal identification, have limited training data and consequently classifiers trained on these datasets often fail to generalize to variations in the domain like changes in weather or location. As such, we explore how natural language descriptions of the domains seen in training data can be used with large vision models trained on diverse pretraining datasets to generate useful variations of the training data. We introduce ALIA (Automated Language-guided Image Augmentation), a method which utilizes large vision and language models to automatically generate natural language descriptions of a dataset’s domains and augment the training data via language-guided image editing. To maintain data integrity, a model trained on the original dataset filters out minimal image edits and those which corrupt class-relevant information. The resulting dataset is visually consistent with the original training data and offers significantly enhanced diversity. On fine-grained and cluttered datasets for classification and detection, ALIA surpasses traditional data augmentation and text-to-image generated data by up to 15%, often even outperforming equivalent additions of real data. Code is available at https://github.com/lisadunlap/ALIA.

arXiv information

Authors	Lisa Dunlap, Alyssa Umino, Han Zhang, Jiezhi Yang, Joseph E. Gonzalez, Trevor Darrell
Published	2023-05-25 17:43:05+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
