Leveraging ChatGPT’s Multimodal Vision Capabilities to Rank Satellite Images by Poverty Level: Advancing Tools for Social Science Research

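Summary

This paper investigates a novel application of large language models (LLMs) with vision capabilities: analyzing satellite imagery to predict poverty at the village level.
Although LLMs were originally designed for natural language understanding, their adaptability to multimodal tasks, including geospatial analysis, has opened new frontiers in data-driven research.
Building on advances in vision-enabled LLMs, the authors assess how well these models can provide interpretable, scalable, and reliable insights into human poverty from satellite images.
Using a pairwise comparison approach, they demonstrate that ChatGPT can rank satellite images by poverty level with accuracy comparable to that of domain experts.
These findings highlight both the promise and the limitations of LLMs in socioeconomic research and provide a foundation for integrating them into poverty assessment workflows.
The study contributes to the ongoing exploration of unconventional data sources for welfare analysis and opens pathways to cost-effective, large-scale poverty monitoring.

Summary (original)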

This paper investigates the novel application of Large Language Models (LLMs) with vision capabilities to analyze satellite imagery for village-level poverty prediction. Although LLMs were originally designed for natural language understanding, their adaptability to multimodal tasks, including geospatial analysis, has opened new frontiers in data-driven research. By leveraging advancements in vision-enabled LLMs, we assess their ability to provide interpretable, scalable, and reliable insights into human poverty from satellite images. Using a pairwise comparison approach, we demonstrate that ChatGPT can rank satellite images based on poverty levels with accuracy comparable to domain experts. These findings highlight both the promise and the limitations of LLMs in socioeconomic research, providing a foundation for their integration into poverty assessment workflows. This study contributes to the ongoing exploration of unconventional data sources for welfare analysis and opens pathways for cost-effective, large-scale poverty monitoring.
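The abstract does not say how the pairwise judgments are turned into a ranking, so the following is only a minimal sketch of one common way to do it: count how often each image is judged the poorer of a pair and sort by that count. The function `rank_by_pairwise_wins`, the judge callback `is_poorer`, and the toy wealth scores are all hypothetical names and values introduced for illustration; in the paper's setting the judge would be a vision-enabled LLM rather than a plain function, and the authors' actual aggregation method may differ.

```python
from itertools import combinations
from typing import Callable, Hashable, Sequence


def rank_by_pairwise_wins(
    items: Sequence[Hashable],
    is_poorer: Callable[[Hashable, Hashable], bool],
) -> list:
    """Order items from poorest to least poor by counting pairwise 'wins'.

    is_poorer(a, b) is a hypothetical judge that returns True when item `a`
    is judged poorer than item `b`. Every unordered pair is compared once.
    """
    wins = {item: 0 for item in items}
    for a, b in combinations(items, 2):
        if is_poorer(a, b):
            wins[a] += 1
        else:
            wins[b] += 1
    # sorted() is stable, so items with equal win counts keep their input order.
    return sorted(items, key=lambda item: wins[item], reverse=True)


# Toy stand-in for the judge: made-up wealth scores, lower means poorer.
toy_wealth = {"village_a": 0.9, "village_b": 0.2, "village_c": 0.5}
ranking = rank_by_pairwise_wins(
    list(toy_wealth), lambda a, b: toy_wealth[a] < toy_wealth[b]
)
print(ranking)  # ['village_b', 'village_c', 'village_a']
```

Comparing every pair costs n(n-1)/2 judge calls for n images, so a real pipeline would likely sample pairs or use a statistical model of comparisons instead of the exhaustive loop shown here.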

arXiv information

Authors	Hamid Sarmadi, Ola Hall, Thorsteinn Rögnvaldsson, Mattias Ohlsson
Published	2025-01-24 14:49:00+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
