Granular Privacy Control for Geolocation with Vision Language Models

Summary

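Vision language models (VLMs) are rapidly improving at answering information-seeking questions.
Because these models are widely deployed in consumer applications, emergent abilities such as identifying people in photos or geolocating images could create new privacy risks.
As we demonstrate, somewhat surprisingly, current open-source and proprietary VLMs are very capable image geolocators, which makes widespread geolocation with VLMs an immediate privacy risk rather than a merely theoretical future concern.
As a first step toward addressing this challenge, we develop a new benchmark, GPTGeoChat, to test the ability of VLMs to moderate geolocation dialogues with users.
We collect 1,000 image geolocation conversations between in-house annotators and GPT-4v, each annotated with the granularity of location information revealed at every turn.
Using this new dataset, we evaluate how well various VLMs can moderate GPT-4v geolocation conversations by determining when too much location information has been revealed.
We find that custom fine-tuned models perform on par with prompted API-based models when identifying leaked location information at the country or city level.
However, fine-tuning on supervised data appears to be needed to accurately moderate finer granularities, such as the name of a restaurant or building.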

Abstract (original)

Vision Language Models (VLMs) are rapidly advancing in their capability to answer information-seeking questions. As these models are widely deployed in consumer applications, they could lead to new privacy risks due to emergent abilities to identify people in photos, geolocate images, etc. As we demonstrate, somewhat surprisingly, current open-source and proprietary VLMs are very capable image geolocators, making widespread geolocation with VLMs an immediate privacy risk, rather than merely a theoretical future concern. As a first step to address this challenge, we develop a new benchmark, GPTGeoChat, to test the ability of VLMs to moderate geolocation dialogues with users. We collect a set of 1,000 image geolocation conversations between in-house annotators and GPT-4v, which are annotated with the granularity of location information revealed at each turn. Using this new dataset, we evaluate the ability of various VLMs to moderate GPT-4v geolocation conversations by determining when too much location information has been revealed. We find that custom fine-tuned models perform on par with prompted API-based models when identifying leaked location information at the country or city level; however, fine-tuning on supervised data appears to be needed to accurately moderate finer granularities, such as the name of a restaurant or building.
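
The moderation task described above (flagging the turn at which a conversation reveals more location detail than permitted) can be illustrated with a minimal sketch. The `Granularity` levels, their ordering, and the `first_violation` helper are assumptions made for illustration, based only on the levels named in the abstract (country, city, and the name of a restaurant or building). They are not the benchmark's actual label set or the authors' implementation, and the per-turn labels are assumed to be given rather than predicted by a model.

```python
from enum import IntEnum
from typing import List, Optional


class Granularity(IntEnum):
    """Hypothetical ordering of location granularity, from coarse to fine."""
    NONE = 0        # no location information revealed
    COUNTRY = 1
    CITY = 2
    PLACE_NAME = 3  # e.g. the name of a restaurant or building


def first_violation(revealed_per_turn: List[Granularity],
                    finest_allowed: Granularity) -> Optional[int]:
    """Return the index of the first turn that reveals location detail
    finer than `finest_allowed`, or None if no turn does."""
    for turn, revealed in enumerate(revealed_per_turn):
        if revealed > finest_allowed:
            return turn
    return None


# Illustrative per-turn annotations for one conversation (made-up data).
conversation = [
    Granularity.NONE,
    Granularity.COUNTRY,
    Granularity.CITY,
    Granularity.PLACE_NAME,
]

print(first_violation(conversation, Granularity.CITY))        # 3
print(first_violation(conversation, Granularity.PLACE_NAME))  # None
```

Under this framing, a moderator that allows city-level disclosure lets the first three turns through and flags the fourth, while one that allows only country-level disclosure would flag the third.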

arXiv information

Authors	Ethan Mendes, Yang Chen, James Hays, Sauvik Das, Wei Xu, Alan Ritter
Published	2024-10-17 14:58:53+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
