End-to-end Semantic Object Detection with Cross-Modal Alignment

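Abstract

Traditional semantic image search methods aim to retrieve images that match the meaning of a text query.
However, these methods typically match the query against the image as a whole, without localizing the objects within it.
This paper describes an extension of existing object detection models for semantic image search that focuses on finding objects within images and takes into account the semantic alignment between object proposals and the text query.
The proposed model uses a single feature extractor, a pre-trained convolutional neural network, together with a transformer encoder that encodes the text query.
Proposal-text alignment is performed with contrastive learning, which produces a score for each proposal reflecting its semantic alignment with the text query.
A Region Proposal Network (RPN) generates the object proposals, and end-to-end training yields an efficient and effective solution for semantic image search.
Trained end-to-end, the proposed model offers a promising solution for semantic image search: it retrieves images that match the meaning of the text query and generates semantically relevant object proposals.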
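
To make the proposal-text scoring step more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each object proposal and the text query have already been encoded as vectors of the same length, that the alignment score is cosine similarity, and that the contrastive objective is an InfoNCE-style softmax cross-entropy with a temperature. The abstract specifies none of these details, and all function names and toy embeddings below are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def alignment_scores(proposal_embeddings, text_embedding):
    """Return one score per proposal: its similarity to the encoded query."""
    return [cosine_similarity(p, text_embedding) for p in proposal_embeddings]


def contrastive_loss(scores, positive_index, temperature=0.1):
    """InfoNCE-style loss (assumed form, not taken from the paper).

    Treats the proposal at positive_index as the one matching the query
    and all other proposals as negatives.
    """
    logits = [s / temperature for s in scores]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(
        sum(math.exp(l - max_logit) for l in logits)
    )
    return log_sum_exp - logits[positive_index]


# Toy, made-up embeddings: three proposals and one text query.
proposals = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
query = [1.0, 0.0]

scores = alignment_scores(proposals, query)
best = max(range(len(scores)), key=scores.__getitem__)
print("scores:", scores)
print("best proposal index:", best)
print("loss if proposal 0 is the match:", contrastive_loss(scores, 0))
```

In this sketch the loss is small when the matching proposal already has the highest score and large otherwise, which is the behavior a contrastive objective is meant to encourage during training.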

Abstract (original)

Traditional semantic image search methods aim to retrieve images that match the meaning of the text query. However, these methods typically search for objects on the whole image, without considering the localization of objects within the image. This paper presents an extension of existing object detection models for semantic image search that considers the semantic alignment between object proposals and text queries, with a focus on searching for objects within images. The proposed model uses a single feature extractor, a pre-trained Convolutional Neural Network, and a transformer encoder to encode the text query. Proposal-text alignment is performed using contrastive learning, producing a score for each proposal that reflects its semantic alignment with the text query. The Region Proposal Network (RPN) is used to generate object proposals, and the end-to-end training process allows for an efficient and effective solution for semantic image search. The proposed model was trained end-to-end, providing a promising solution for semantic image search that retrieves images that match the meaning of the text query and generates semantically relevant object proposals.

arXiv information

Authors	Silvan Ferreira, Allan Martins, Ivanovitch Silva
Published	2023-02-10 12:06:18+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
