IIITD-20K: Dense captioning for Text-Image ReID

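Summary

[Title] IIITD-20K: Dense captioning for Text-Image ReID

[Summary]
- Text-to-Image (T2I) ReID has attracted considerable attention recently; CUHK-PEDES, RSTPReid, and ICFG-PEDES are the three benchmarks available for evaluating T2I ReID methods.
- RSTPReid and ICFG-PEDES draw their identities from MSMT17, but the number of unique persons is limited, so their diversity is limited.
- CUHK-PEDES, in contrast, contains 13,003 identities, but its text descriptions are relatively short on average.
- All three datasets were captured in restricted environments with a limited number of cameras.
- To further diversify the identities and provide dense captions, the authors propose a new dataset called IIITD-20K.
- IIITD-20K contains 20,000 unique identities captured in the wild and provides a rich dataset for text-to-image ReID.
- Each image is densely captioned, with every description at least 26 words long.
- The authors additionally generate synthetic images and fine-grained captions using Stable-diffusion and BLIP models trained on the dataset.
- They run detailed experiments with state-of-the-art text-to-image ReID models and vision-language pre-trained models, and present a comprehensive analysis of the dataset.
- The experiments show that the synthetically generated data yields a substantial performance improvement in both same-dataset and cross-dataset settings.
- The dataset is available at https://bit.ly/3pkA3Rj.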
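
The 26-word minimum stated above is easy to check on any caption file. The sketch below is illustrative only: the function names, the image ids, and the example captions are hypothetical, and counting whitespace-separated tokens is an assumption, since the summary does not say how the authors count words.

```python
from typing import Dict, List


def caption_word_count(caption: str) -> int:
    """Count whitespace-separated tokens (assumed definition of a 'word')."""
    return len(caption.split())


def captions_below_minimum(captions: Dict[str, str], min_words: int = 26) -> List[str]:
    """Return the ids of images whose caption has fewer than min_words words."""
    return [
        image_id
        for image_id, text in captions.items()
        if caption_word_count(text) < min_words
    ]


# Hypothetical captions invented for illustration; not taken from IIITD-20K.
example = {
    "img_0001": (
        "A young man with short black hair is wearing a dark blue jacket "
        "over a white shirt, grey trousers and black shoes, and he carries "
        "a brown backpack."
    ),
    "img_0002": "A woman in a red coat.",
}

print(captions_below_minimum(example))  # ['img_0002']
```

A dataset that meets the stated minimum would return an empty list under this counting rule.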

Summary (original)

Text-to-Image (T2I) ReID has attracted a lot of attention in the recent past. CUHK-PEDES, RSTPReid and ICFG-PEDES are the three available benchmarks to evaluate T2I ReID methods. RSTPReid and ICFG-PEDES comprise identities from MSMT17, but due to the limited number of unique persons, the diversity is limited. On the other hand, CUHK-PEDES comprises 13,003 identities but has relatively shorter text descriptions on average. Further, these datasets are captured in a restricted environment with a limited number of cameras. In order to further diversify the identities and provide dense captions, we propose a novel dataset called IIITD-20K. IIITD-20K comprises 20,000 unique identities captured in the wild and provides a rich dataset for text-to-image ReID. With a minimum of 26 words for a description, each image is densely captioned. We further synthetically generate images and fine-grained captions using Stable-diffusion and BLIP models trained on our dataset. We perform elaborate experiments using state-of-the-art text-to-image ReID models and vision-language pre-trained models and present a comprehensive analysis of the dataset. Our experiments also reveal that synthetically generated data leads to a substantial performance improvement in both same-dataset and cross-dataset settings. Our dataset is available at https://bit.ly/3pkA3Rj.

arXiv information

Authors	A V Subramanyam, Niranjan Sundararajan, Vibhu Dubey, Brejesh Lall
Published	2023-05-08 06:46:56+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, OpenAI
