CounTR: Transformer-based Generalised Visual Counting

Summary

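In this paper, we consider the problem of generalised visual object counting, with the goal of developing a computational model that counts objects from arbitrary semantic categories using an arbitrary number of ‘exemplars’, i.e. zero-shot or few-shot counting.

To this end, we make the following four contributions: (1) we introduce a novel transformer-based architecture for generalised visual object counting, termed the Counting Transformer (CounTR), which uses the attention mechanism to explicitly capture the similarity between image patches, or between image patches and the given ‘exemplars’; (2) we adopt a two-stage training regime that first pre-trains the model with self-supervised learning and then applies supervised fine-tuning; (3) we propose a simple, scalable pipeline for synthesizing training images that contain a large number of instances, or instances from different semantic categories, explicitly forcing the model to make use of the given ‘exemplars’; (4) we conduct thorough ablation studies on the large-scale counting benchmark FSC-147 and demonstrate state-of-the-art performance in both the zero-shot and few-shot settings.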
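Contribution (1) relies on attention to relate image patches to exemplars. The following is a minimal NumPy sketch of that idea, not the authors' implementation. It assumes single-head scaled dot-product cross-attention in which patch features are the queries and exemplar features are the keys and values, and it assumes the model regresses a density map whose sum is the count. The function names `exemplar_cross_attention` and `count_from_density`, the projection matrices, and all sizes are hypothetical. The sketch covers the few-shot case only; how zero-shot inputs are handled is not specified in this summary.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def exemplar_cross_attention(patches, exemplars, w_q, w_k, w_v):
    """Hypothetical single-head cross-attention: image patches attend to exemplars.

    patches:   (n_patches, d) image patch features, used as queries
    exemplars: (k, d) exemplar features, used as keys and values; any k >= 1 works
    w_q, w_k:  (d, d_h) query/key projections (random here, learned in practice)
    w_v:       (d, d) value projection, so the output can be added back to `patches`
    Returns (patches + attended exemplar values, raw similarity scores (n_patches, k)).
    """
    q = patches @ w_q
    k = exemplars @ w_k
    v = exemplars @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product patch-exemplar similarity
    attn = softmax(scores, axis=-1)          # weights over the k exemplars sum to 1 per patch
    # Residual connection: with k = 1 every weight is exactly 1, so the patch features
    # themselves (and the raw scores) carry the patch-specific information.
    return patches + attn @ v, scores


def count_from_density(density: np.ndarray) -> float:
    """Hypothetical readout: the count is the sum of a predicted per-pixel density map."""
    return float(density.sum())


rng = np.random.default_rng(0)
d, d_h, n_patches = 16, 8, 196  # 196 = 14 x 14 patch grid (assumed size)
w_q = rng.normal(size=(d, d_h))
w_k = rng.normal(size=(d, d_h))
w_v = rng.normal(size=(d, d))
patches = rng.normal(size=(n_patches, d))

# The same weights handle one-shot and three-shot inputs; only k changes.
for k in (1, 3):
    exemplars = rng.normal(size=(k, d))
    feats, scores = exemplar_cross_attention(patches, exemplars, w_q, w_k, w_v)
    print(k, feats.shape, scores.shape)

print(count_from_density(np.full((4, 4), 0.25)))  # 16 pixels x 0.25 = 4.0
```

Because the softmax runs over however many exemplar rows are supplied, one set of weights accepts one, three, or more exemplars, giving `(196, 16)` features and `(196, k)` scores for each k. With a single exemplar the softmax weight is trivially 1, which is why the sketch keeps a residual connection and returns the raw scores as the similarity signal.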

Summary (original)

In this paper, we consider the problem of generalised visual object counting, with the goal of developing a computational model for counting the number of objects from arbitrary semantic categories, using an arbitrary number of ‘exemplars’, i.e. zero-shot or few-shot counting. To this end, we make the following four contributions: (1) We introduce a novel transformer-based architecture for generalised visual object counting, termed the Counting Transformer (CounTR), which explicitly captures the similarity between image patches, or with the given ‘exemplars’, using the attention mechanism; (2) We adopt a two-stage training regime that first pre-trains the model with self-supervised learning, followed by supervised fine-tuning; (3) We propose a simple, scalable pipeline for synthesizing training images with a large number of instances or from different semantic categories, explicitly forcing the model to make use of the given ‘exemplars’; (4) We conduct thorough ablation studies on the large-scale counting benchmark FSC-147, and demonstrate state-of-the-art performance in both the zero-shot and few-shot settings.
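Contribution (3) describes synthesizing training images with many instances or with instances from different categories. One way to picture this is a 2x2 mosaic, sketched below in NumPy under assumptions the abstract does not state: four equal-sized tiles, density maps tiled the same way as the image, and, for mixed-category mosaics, zeroing the density of tiles that do not match the exemplars' category so that only exemplar-matching objects count toward the target. `make_mosaic`, `_tile_2x2`, the tile size, and the `keep` rule are hypothetical, not the authors' pipeline.

```python
import numpy as np


def _tile_2x2(parts):
    """Place four equal-sized arrays in a 2x2 grid (top row, then bottom row)."""
    top = np.concatenate(parts[:2], axis=1)
    bottom = np.concatenate(parts[2:], axis=1)
    return np.concatenate([top, bottom], axis=0)


def make_mosaic(tiles, densities, keep):
    """Build a hypothetical 2x2 mosaic image and the matching density map.

    tiles:     four arrays of shape (h, w, 3)
    densities: four arrays of shape (h, w); each sums to that tile's object count
    keep:      four bools; False zeroes a tile's density, e.g. when the tile shows a
               category other than the one depicted by the exemplars (assumed rule)
    """
    if not (len(tiles) == len(densities) == len(keep) == 4):
        raise ValueError("expected exactly four tiles, densities and keep flags")
    masked = [den if k else np.zeros_like(den) for den, k in zip(densities, keep)]
    return _tile_2x2(tiles), _tile_2x2(masked)


rng = np.random.default_rng(0)
h = w = 32  # assumed tile size
tiles = [rng.random((h, w, 3)) for _ in range(4)]
densities = []
for n in (3, 5, 4, 6):  # objects per tile
    den = np.zeros((h, w))
    den[0, :n] = 1.0  # one unit "dot" per object
    densities.append(den)

# Same-category mosaic: every tile's objects count, 3 + 5 + 4 + 6 = 18.
image, density = make_mosaic(tiles, densities, keep=[True] * 4)
print(image.shape, density.sum())  # (64, 64, 3) 18.0

# Mixed-category mosaic: tiles at indices 1 and 3 are treated as another category.
_, density = make_mosaic(tiles, densities, keep=[True, False, True, False])
print(density.sum())  # 7.0
```

Tiling raises the number of instances per training image, while the `keep` mask makes the target depend on which category the exemplars show, which is one reading of how such a pipeline could force the model to use the given exemplars.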

arXiv information

Authors	Chang Liu, Yujie Zhong, Andrew Zisserman, Weidi Xie
Published	2022-08-29 17:02:45+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
