Learning Robust Visual-Semantic Embedding for Generalizable Person Re-identification

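Summary

Title:
Learning Robust Visual-Semantic Embedding for Generalizable Person Re-identification

Summary:
Person re-identification plays a major role in applications such as public safety and video surveillance, which makes generalizable person re-identification (Re-ID) one of the most actively studied topics in machine learning and computer vision. However, previous methods focus mainly on visual representation learning and neglect the potential of semantic features during training, which often leads to poor generalization when adapting to a new domain. This paper proposes MMET (Multi-Modal Equivalent Transformer), which learns more robust visual-semantic embeddings across three tasks: visual, textual, and visual-textual. To further improve robust feature learning in the Transformer context, the authors introduce a masking mechanism, the Masked Multimodal Modeling (MMM) strategy, which masks both image patches and text tokens. It works jointly on multimodal or unimodal data and significantly improves generalizable person Re-ID performance. Extensive experiments on benchmark datasets show that the method performs competitively against previous approaches. The authors hope the method will advance research on visual-semantic representation learning. The source code is publicly available at https://github.com/JeremyXSC/MMET.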
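
The abstract says only that MMM masks both image patches and text tokens and can operate on multimodal or unimodal input. The sketch below illustrates that general idea, not the authors' implementation. The function names, the `[MASK]` placeholder, and the masking ratios (0.5 for patches, 0.15 for tokens) are all assumptions made for illustration. For the actual method, see the repository linked above.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder, not taken from the paper


def mask_sequence(items, ratio, rng):
    """Replace a random fraction `ratio` of items with MASK.

    Returns the masked copy and the sorted list of masked positions.
    """
    n_mask = int(len(items) * ratio)
    masked_idx = sorted(rng.sample(range(len(items)), n_mask))
    masked = list(items)
    for i in masked_idx:
        masked[i] = MASK
    return masked, masked_idx


def mask_multimodal(patches=None, tokens=None,
                    patch_ratio=0.5, token_ratio=0.15, seed=None):
    """Mask whichever modalities are provided (both, or only one).

    The ratios are assumed values; the abstract does not state them.
    """
    rng = random.Random(seed)
    out = {}
    if patches is not None:
        out["patches"] = mask_sequence(patches, patch_ratio, rng)
    if tokens is not None:
        out["tokens"] = mask_sequence(tokens, token_ratio, rng)
    return out


# Usage: 16 stand-in image patches and a short caption.
patches = [f"patch_{i}" for i in range(16)]
tokens = "a man wearing a red jacket and blue jeans".split()
both = mask_multimodal(patches, tokens, seed=0)
text_only = mask_multimodal(tokens=tokens, seed=0)
```

Calling the function without a fixed `seed` draws a fresh mask on every call, which is one simple reading of "dynamic" masking. Whether the paper uses this scheme is not stated in the abstract.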

Abstract (original)

Generalizable person re-identification (Re-ID) is a very hot research topic in machine learning and computer vision, which plays a significant role in realistic scenarios due to its various applications in public security and video surveillance. However, previous methods mainly focus on the visual representation learning, while neglect to explore the potential of semantic features during training, which easily leads to poor generalization capability when adapted to the new domain. In this paper, we propose a Multi-Modal Equivalent Transformer called MMET for more robust visual-semantic embedding learning on visual, textual and visual-textual tasks respectively. To further enhance the robust feature learning in the context of transformer, a dynamic masking mechanism called Masked Multimodal Modeling strategy (MMM) is introduced to mask both the image patches and the text tokens, which can jointly works on multimodal or unimodal data and significantly boost the performance of generalizable person Re-ID. Extensive experiments on benchmark datasets demonstrate the competitive performance of our method over previous approaches. We hope this method could advance the research towards visual-semantic representation learning. Our source code is also publicly available at https://github.com/JeremyXSC/MMET.

arXiv information

Authors	Suncheng Xiang, Jingsheng Gao, Mengyuan Guan, Jiacheng Ruan, Chengfeng Zhou, Ting Liu, Dahong Qian, Yuzhuo Fu
Published	2023-04-19 08:37:25+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, OpenAI
