Img2Loc: Revisiting Image Geolocalization using Multi-modality Foundation Models and Image-based Retrieval-Augmented Generation

Summary

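Determining the precise location where an image was taken is a challenging problem in computer vision and information retrieval. Traditional methods typically use either classification, which divides the Earth's surface into grid cells and classifies images accordingly, or retrieval, which identifies locations by matching images against a database of image-location pairs.
However, classification-based approaches are limited by the cell size and cannot yield precise predictions, while retrieval-based systems usually suffer from poor search quality and inadequate coverage of the global landscape at varied scales and aggregation levels.
To overcome these drawbacks, the authors present Img2Loc, a novel system that redefines image geolocalization as a text generation task.
This is achieved using cutting-edge large multi-modality models (LMMs) such as GPT4V or LLaVA with retrieval-augmented generation.
Img2Loc first uses CLIP-based representations to build an image-based coordinate query database.
It then combines the query results with the image itself to form elaborate prompts customized for LMMs.
When tested on benchmark datasets such as Im2GPS3k and YFCC4k, Img2Loc not only surpasses the performance of previous state-of-the-art models but does so without any model training.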
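The retrieve-then-prompt flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 3-dimensional vectors stand in for CLIP embeddings, the `DATABASE` contents, the function names `retrieve_neighbors` and `build_prompt`, and the prompt wording are all hypothetical, and the call to an actual multi-modality model is left out.

```python
import math

# Hypothetical toy database: (embedding, (latitude, longitude)) pairs.
# Real entries would hold CLIP image embeddings; these 3-d vectors are made up.
DATABASE = [
    ([0.9, 0.1, 0.0], (48.8584, 2.2945)),
    ([0.7, 0.3, 0.2], (48.8606, 2.3376)),
    ([0.0, 0.9, 0.4], (35.6586, 139.7454)),
    ([0.1, 0.1, 0.9], (-33.8568, 151.2153)),
]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_neighbors(query_embedding, database, k=2):
    """Return coordinates of the k database entries most similar to the query."""
    ranked = sorted(
        database,
        key=lambda entry: cosine_similarity(query_embedding, entry[0]),
        reverse=True,
    )
    return [coords for _, coords in ranked[:k]]


def build_prompt(neighbor_coords):
    """Format retrieved coordinates as text context for a multi-modality model."""
    lines = [f"- ({lat:.4f}, {lon:.4f})" for lat, lon in neighbor_coords]
    return (
        "Estimate the latitude and longitude where the attached image was taken.\n"
        "Visually similar reference images were taken at:\n"
        + "\n".join(lines)
        + "\nAnswer with a single (latitude, longitude) pair."
    )


query = [0.85, 0.15, 0.05]  # stand-in for the query image's embedding
neighbors = retrieve_neighbors(query, DATABASE, k=2)
prompt = build_prompt(neighbors)
print(prompt)
```

In a full system, the resulting prompt text would be sent to the model together with the query image. That step depends on the specific model API and is omitted here. A brute-force sort is used for clarity; a large database would more plausibly use an approximate nearest-neighbor index.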

Abstract (original)

Geolocating precise locations from images presents a challenging problem in computer vision and information retrieval. Traditional methods typically employ either classification, which divides the Earth's surface into grid cells and classifies images accordingly, or retrieval, which identifies locations by matching images with a database of image-location pairs. However, classification-based approaches are limited by the cell size and cannot yield precise predictions, while retrieval-based systems usually suffer from poor search quality and inadequate coverage of the global landscape at varied scales and aggregation levels. To overcome these drawbacks, we present Img2Loc, a novel system that redefines image geolocalization as a text generation task. This is achieved using cutting-edge large multi-modality models (LMMs) like GPT4V or LLaVA with retrieval-augmented generation. Img2Loc first employs CLIP-based representations to generate an image-based coordinate query database. It then uniquely combines query results with the image itself, forming elaborate prompts customized for LMMs. When tested on benchmark datasets such as Im2GPS3k and YFCC4k, Img2Loc not only surpasses the performance of previous state-of-the-art models but does so without any model training.

arXiv information

Authors	Zhongliang Zhou, Jielu Zhang, Zihan Guan, Mengxuan Hu, Ni Lao, Lan Mu, Sheng Li, Gengchen Mai
Published	2024-03-28 17:07:02+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
