Lightweight Multimodal Artificial Intelligence Framework for Maritime Multi-Scene Recognition

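Summary

Maritime multi-scene recognition is important for strengthening intelligent marine robotics, particularly in applications such as marine conservation, environmental monitoring, and disaster response.
The task is difficult for two reasons: environmental interference, in which marine conditions degrade image quality, and the complexity of maritime scenes, which demand deeper reasoning for accurate recognition.
Pure vision models alone are not sufficient to address these problems.
To overcome these limitations, the authors propose a new multimodal artificial intelligence (AI) framework that integrates image data, textual descriptions, and classification vectors generated by a Multimodal Large Language Model (MLLM), providing richer semantic understanding and improving recognition accuracy.
The framework uses an efficient multimodal fusion mechanism to further improve the model's robustness and adaptability in complex maritime environments.
Experimental results show that the model reaches 98% accuracy, exceeding previous SOTA models by 3.5%.
To optimize deployment on resource-constrained platforms, the authors adopt activation-aware weight quantization (AWQ) as a lightweight technique, reducing the model size to 68.75MB with an accuracy drop of only 0.5% while significantly lowering computational overhead.
This work provides a high-performance solution for real-time maritime scene recognition, enabling Autonomous Surface Vehicles (ASVs) to support environmental monitoring and disaster response in resource-limited settings.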
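
The summary does not describe the fusion mechanism in detail, so the following is only a minimal sketch of one generic option (concatenating the three inputs and applying a linear classifier with softmax), not the authors' method. The function name `fuse_and_classify`, the feature dimensions, and the weights are hypothetical and chosen purely for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(
    image_feat: Sequence[float],
    text_feat: Sequence[float],
    class_vec: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Late fusion by concatenation (an assumption, not the paper's mechanism).

    image_feat: features from a vision encoder (hypothetical)
    text_feat:  features from the MLLM's textual description (hypothetical)
    class_vec:  classification vector produced by the MLLM (hypothetical)
    weights:    one row per scene class, one column per fused feature
    """
    fused = list(image_feat) + list(text_feat) + list(class_vec)
    if any(len(row) != len(fused) for row in weights):
        raise ValueError("each weight row must match the fused feature length")
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)
```

Calling `fuse_and_classify` with a two-row weight matrix returns two class probabilities that sum to 1. A learned fusion such as attention or gating would replace the plain concatenation in a real system.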

Abstract (original)

Maritime Multi-Scene Recognition is crucial for enhancing the capabilities of intelligent marine robotics, particularly in applications such as marine conservation, environmental monitoring, and disaster response. However, this task presents significant challenges due to environmental interference, where marine conditions degrade image quality, and the complexity of maritime scenes, which requires deeper reasoning for accurate recognition. Pure vision models alone are insufficient to address these issues. To overcome these limitations, we propose a novel multimodal Artificial Intelligence (AI) framework that integrates image data, textual descriptions and classification vectors generated by a Multimodal Large Language Model (MLLM), to provide richer semantic understanding and improve recognition accuracy. Our framework employs an efficient multimodal fusion mechanism to further enhance model robustness and adaptability in complex maritime environments. Experimental results show that our model achieves 98% accuracy, surpassing previous SOTA models by 3.5%. To optimize deployment on resource-constrained platforms, we adopt activation-aware weight quantization (AWQ) as a lightweight technique, reducing the model size to 68.75MB with only a 0.5% accuracy drop while significantly lowering computational overhead. This work provides a high-performance solution for real-time maritime scene recognition, enabling Autonomous Surface Vehicles (ASVs) to support environmental monitoring and disaster response in resource-limited settings.
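
To make the idea behind activation-aware weight quantization more concrete, here is a simplified sketch under stated assumptions. It scales each input channel of a weight matrix by a factor derived from calibration activations, quantizes the scaled weights with symmetric round-to-nearest, and divides the inputs by the same factor at inference time. All function names, the fixed exponent `alpha`, and the per-row quantization are illustrative choices. The published AWQ method additionally searches for the scaling exponent and quantizes weights in groups, and the abstract does not say how the authors configured it.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def quantize_row(row: Sequence[float], n_bits: int = 4) -> List[float]:
    """Symmetric round-to-nearest quantization of one row, returned dequantized."""
    qmax = 2 ** (n_bits - 1) - 1  # 7 for 4-bit signed integers
    max_abs = max(abs(w) for w in row)
    if max_abs == 0.0:
        return [0.0 for _ in row]
    step = max_abs / qmax
    return [max(-qmax - 1, min(qmax, round(w / step))) * step for w in row]


def channel_scales(activations: Matrix, alpha: float) -> List[float]:
    """Per-input-channel scale s_j = (mean |x_j|) ** alpha from calibration data."""
    n_samples = len(activations)
    n_in = len(activations[0])
    scales = []
    for j in range(n_in):
        mean_abs = sum(abs(x[j]) for x in activations) / n_samples
        scales.append(max(mean_abs, 1e-8) ** alpha)
    return scales


def awq_style_quantize(
    weights: Matrix, activations: Matrix, alpha: float = 0.5, n_bits: int = 4
) -> Tuple[Matrix, List[float]]:
    """Scale weight columns by activation statistics, then quantize each row."""
    s = channel_scales(activations, alpha)
    scaled = [[w * sj for w, sj in zip(row, s)] for row in weights]
    return [quantize_row(row, n_bits) for row in scaled], s


def linear(
    weights: Matrix, x: Sequence[float], scales: Optional[Sequence[float]] = None
) -> List[float]:
    """Compute W @ x; if scales are given, divide x by them to undo the weight scaling."""
    if scales is not None:
        x = [xj / sj for xj, sj in zip(x, scales)]
    return [sum(w * xj for w, xj in zip(row, x)) for row in weights]
```

With `alpha=0.0` every scale is 1 and the sketch reduces to plain round-to-nearest quantization. Larger values of `alpha` give channels with large activations a finer effective resolution, which is the intuition behind protecting salient weights.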

arXiv information

Authors	Xinyu Xi, Hua Yang, Shentai Zhang, Yijie Liu, Sijin Sun, Xiuju Fu
Published	2025-03-10 06:47:38+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
