Learning Generalized Zero-Shot Learners for Open-Domain Image Geolocalization

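Abstract

Image geolocalization is the challenging task of predicting the geographic coordinates at which a given photo was taken.
It is an unsolved problem that depends on combining visual clues with general knowledge about the world to make accurate predictions across geographies.
We present $\href{https://huggingface.co/geolocal/StreetCLIP}{\text{StreetCLIP}}$, a robust, publicly available foundation model that achieves state-of-the-art performance on multiple open-domain image geolocalization benchmarks.
It does so in a zero-shot setting, outperforming supervised models trained on more than 4 million images.
Our method introduces a meta-learning approach for generalized zero-shot learning: CLIP is pretrained from synthetic captions, which grounds it in a domain of choice.
We show that the method effectively transfers CLIP's generalized zero-shot capabilities to the domain of image geolocalization, improving in-domain generalized zero-shot performance without fine-tuning StreetCLIP on a fixed set of classes.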
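
To make the zero-shot setting concrete, the sketch below shows how a CLIP-style model can pick a label without being fine-tuned on a fixed class set: the image embedding is compared with one text-prompt embedding per candidate label, and the most similar prompt wins. The three-dimensional vectors, the country labels, and the temperature value are toy assumptions standing in for real encoder outputs; this is an illustration of the general technique, not the authors' implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    scaled = [s / temperature for s in scores]
    highest = max(scaled)
    exps = [math.exp(s - highest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, prompt_embeddings, temperature=0.01):
    """Return a probability per label from image-to-prompt similarities.

    prompt_embeddings maps each candidate label to the embedding of its
    text prompt. The label set can be changed freely at inference time.
    """
    labels = list(prompt_embeddings)
    similarities = [
        cosine_similarity(image_embedding, prompt_embeddings[label])
        for label in labels
    ]
    return dict(zip(labels, softmax(similarities, temperature)))


# Toy embeddings (assumption): real ones would come from the model's encoders.
toy_prompt_embeddings = {
    "Japan": [0.9, 0.1, 0.0],
    "Brazil": [0.1, 0.9, 0.1],
    "Norway": [0.0, 0.2, 0.9],
}
toy_image_embedding = [0.8, 0.2, 0.1]

probabilities = zero_shot_classify(toy_image_embedding, toy_prompt_embeddings)
predicted_country = max(probabilities, key=probabilities.get)
print(predicted_country)
```

Because the candidate labels enter only through their prompts, adding or removing a label means editing the dictionary, with no retraining.
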

Abstract (original)

Image geolocalization is the challenging task of predicting the geographic coordinates of origin for a given photo. It is an unsolved problem relying on the ability to combine visual clues with general knowledge about the world to make accurate predictions across geographies. We present $\href{https://huggingface.co/geolocal/StreetCLIP}{\text{StreetCLIP}}$, a robust, publicly available foundation model not only achieving state-of-the-art performance on multiple open-domain image geolocalization benchmarks but also doing so in a zero-shot setting, outperforming supervised models trained on more than 4 million images. Our method introduces a meta-learning approach for generalized zero-shot learning by pretraining CLIP from synthetic captions, grounding CLIP in a domain of choice. We show that our method effectively transfers CLIP’s generalized zero-shot capabilities to the domain of image geolocalization, improving in-domain generalized zero-shot performance without finetuning StreetCLIP on a fixed set of classes.
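
The abstract mentions pretraining CLIP from synthetic captions. As a rough sketch of what generating such captions could look like, the snippet below builds a caption from location metadata attached to an image. The `LocationMetadata` fields and the caption wording are hypothetical choices made for illustration; the abstract does not specify the template the authors used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocationMetadata:
    """Hypothetical location labels attached to a training image."""
    country: str
    region: Optional[str] = None
    city: Optional[str] = None


def build_synthetic_caption(meta: LocationMetadata) -> str:
    """Compose a caption from whatever location fields are available.

    The wording is an assumed template, not the one from the paper.
    """
    parts = ["A street-level photo"]
    if meta.city:
        parts.append(f"close to the town of {meta.city}")
    if meta.region:
        parts.append(f"in the region of {meta.region}")
    parts.append(f"in {meta.country}")
    return " ".join(parts) + "."


full_caption = build_synthetic_caption(
    LocationMetadata(country="Japan", region="Kansai", city="Kyoto")
)
country_only_caption = build_synthetic_caption(LocationMetadata(country="Japan"))
print(full_caption)
print(country_only_caption)
```

Pairs of images and captions produced this way could then feed a standard CLIP-style contrastive training loop, which is outside the scope of this sketch.
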

arXiv information

Authors	Lukas Haas, Silas Alberti, Michal Skreta
Published	2023-02-01 06:44:07+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
