Remote Sensing Temporal Vision-Language Models: A Comprehensive Survey

Abstract

Temporal image analysis in remote sensing has traditionally centered on change detection, which identifies regions of change between images captured at different times. However, change detection remains limited by its focus on visual-level interpretation, often lacking contextual or descriptive information. The rise of Vision-Language Models (VLMs) has introduced a new dimension to remote sensing temporal image analysis by integrating visual information with natural language, creating an avenue for advanced interpretation of temporal image changes. Remote Sensing Temporal VLMs (RSTVLMs) allow for dynamic interactions, generating descriptive captions, answering questions, and providing a richer semantic understanding of temporal images. This temporal vision-language capability is particularly valuable for complex remote sensing applications, where higher-level insights are crucial. This paper comprehensively reviews the progress of RSTVLM research, with a focus on the latest VLM applications for temporal image analysis. We categorize and discuss core methodologies, datasets, and metrics, highlight recent advances in temporal vision-language tasks, and outline key challenges and future directions for research in this emerging field. This survey fills a critical gap in the literature by providing an integrated overview of RSTVLM, offering a foundation for further advancements in remote sensing temporal image understanding. We will keep tracing related works at https://github.com/Chen-Yang-Liu/Awesome-RS-Temporal-VLM.

arXiv information

Authors	Chenyang Liu, Jiafan Zhang, Keyan Chen, Man Wang, Zhengxia Zou, Zhenwei Shi
Published	2024-12-03 16:56:10+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
