IFShip: Interpretable Fine-grained Ship Classification with Domain Knowledge-Enhanced Vision-Language Models

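Summary

End-to-end interpretation currently dominates the remote sensing fine-grained ship classification (RS-FGSC) task.
However, the inference process of these models remains uninterpretable, which has led to criticism of them as "black box" systems.
To address this issue, we propose a domain knowledge-enhanced Chain-of-Thought (CoT) prompt generation mechanism, which is used to semi-automatically construct TITANIC-FGS, a task-specific instruction-following dataset.
By training on TITANIC-FGS, we adapt general-domain vision-language models (VLMs) to the FGSC task, yielding a model named IFShip.
Building on IFShip, we develop an FGSC visual chatbot that recasts the FGSC problem as a step-by-step reasoning task and conveys the reasoning process in natural language.
Experimental results show that IFShip outperforms state-of-the-art FGSC algorithms in both interpretability and classification accuracy.
Furthermore, IFShip shows superior performance on the FGSC task compared with VLMs such as LLaVA and MiniGPT-4.
When the fine-grained ship type is recognizable to the human eye, it provides an accurate chain of reasoning; when it is not, it offers an interpretable explanation.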

Summary (original)

End-to-end interpretation currently dominates the remote sensing fine-grained ship classification (RS-FGSC) task. However, the inference process remains uninterpretable, leading to criticisms of these models as ‘black box’ systems. To address this issue, we propose a domain knowledge-enhanced Chain-of-Thought (CoT) prompt generation mechanism, which is used to semi-automatically construct a task-specific instruction-following dataset, TITANIC-FGS. By training on TITANIC-FGS, we adapt general-domain vision-language models (VLMs) to the FGSC task, resulting in a model named IFShip. Building upon IFShip, we develop an FGSC visual chatbot that redefines the FGSC problem as a step-by-step reasoning task and conveys the reasoning process in natural language. Experimental results show that IFShip outperforms state-of-the-art FGSC algorithms in both interpretability and classification accuracy. Furthermore, compared to VLMs such as LLaVA and MiniGPT-4, IFShip demonstrates superior performance on the FGSC task. It provides an accurate chain of reasoning when fine-grained ship types are recognizable to the human eye and offers interpretable explanations when they are not.
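
The abstract does not spell out how the CoT prompts are generated, so the following is only a minimal sketch of what a domain knowledge-enhanced, step-by-step prompt could look like. The function `build_cot_prompt`, the `SHIP_KNOWLEDGE` table, and the cue descriptions are all hypothetical illustrations, not the authors' mechanism and not taken from TITANIC-FGS.

```python
from typing import Dict, List

# Hypothetical domain knowledge: ship type -> distinguishing visual cues.
# Illustrative placeholders only; not drawn from the TITANIC-FGS dataset.
SHIP_KNOWLEDGE: Dict[str, List[str]] = {
    "aircraft carrier": [
        "long flat flight deck",
        "island superstructure offset to one side",
    ],
    "container ship": [
        "stacked rectangular containers on deck",
        "bridge near the stern",
    ],
}


def build_cot_prompt(candidate_types: List[str],
                     knowledge: Dict[str, List[str]]) -> str:
    """Assemble a step-by-step prompt that lists known cues per candidate type."""
    lines = [
        "You are analysing a remote sensing image of a ship.",
        "Reason in stages before naming the ship type.",
        "Step 1: Describe the visible hull shape, deck layout, and superstructure.",
        "Step 2: Compare the description with these known cues:",
    ]
    for ship_type in candidate_types:
        cues = knowledge.get(ship_type, [])
        # Fall back to an explicit marker when no cues are recorded for a type.
        cue_text = "; ".join(cues) if cues else "no cues available"
        lines.append(f"  - {ship_type}: {cue_text}")
    lines.append(
        "Step 3: State the most likely type, or explain why the image "
        "is not sufficient to decide."
    )
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_cot_prompt(["aircraft carrier", "container ship"], SHIP_KNOWLEDGE))
```

The final step mirrors the behaviour described in the abstract: give a chain of reasoning when the type is recognizable, and an explanation when it is not.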

arXiv information

Authors	Mingning Guo, Mengwei Wu, Yuxiang Shen, Haifeng Li, Chao Tao
Published	2025-03-11 12:02:01+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
