AgroGPT: Efficient Agricultural Vision-Language Model with Expert Tuning

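Summary

Large multimodal conversational models (LMMs) have advanced considerably by drawing on the vast repositories of image-text data available online.
Despite this progress, these models often encounter substantial domain gaps that hinder their ability to hold complex conversations in new domains.
Recent efforts aim to mitigate this problem, but they rely on domain-specific image-text data to curate instruction-tuning data.
However, many domains, such as agriculture, lack such vision-language data.
In this work, we propose an approach for constructing instruction-tuning data that uses vision-only data from the agriculture domain.
We use diverse agricultural datasets spanning multiple domains, curate class-specific information, and employ large language models (LLMs) to construct an expert-tuning set, resulting in a 70k expert-tuning dataset called AgroInstruct.
We then expert-tune and create AgroGPT, an efficient LMM that can hold complex agriculture-related conversations and provide useful insights.
We also develop AgroEvals for evaluation and compare AgroGPT's performance with large open- and closed-source models.
AgroGPT excels at identifying fine-grained agricultural concepts, can act as an agriculture expert, and provides helpful information for multimodal agriculture questions.
The code, datasets, and models are available at https://github.com/awaisrauf/agroGPT.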

Summary (original)

Significant progress has been made in advancing large multimodal conversational models (LMMs), capitalizing on vast repositories of image-text data available online. Despite this progress, these models often encounter substantial domain gaps, hindering their ability to engage in complex conversations across new domains. Recent efforts have aimed to mitigate this issue, albeit relying on domain-specific image-text data to curate instruction-tuning data. However, many domains, such as agriculture, lack such vision-language data. In this work, we propose an approach to construct instruction-tuning data that harnesses vision-only data for the agriculture domain. We utilize diverse agricultural datasets spanning multiple domains, curate class-specific information, and employ large language models (LLMs) to construct an expert-tuning set, resulting in a 70k expert-tuning dataset called AgroInstruct. Subsequently, we expert-tuned and created AgroGPT, an efficient LMM that can hold complex agriculture-related conversations and provide useful insights. We also develop AgroEvals for evaluation and compare AgroGPT's performance with large open and closed-source models. AgroGPT excels at identifying fine-grained agricultural concepts, can act as an agriculture expert, and provides helpful information for multimodal agriculture questions. The code, datasets, and models are available at https://github.com/awaisrauf/agroGPT.
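
The data-construction idea described in the abstract (vision-only images, curated class-specific information, and an LLM that writes the conversations) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names ImageRecord, CLASS_INFO, build_llm_prompt, and to_instruction_sample, the prompt wording, and the example class notes are all hypothetical, and the actual LLM call is deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ImageRecord:
    """One vision-only sample: an image path plus its class label."""
    image_path: str
    class_label: str


# Hypothetical curated notes, keyed by class label (illustrative content only).
CLASS_INFO: Dict[str, str] = {
    "tomato_early_blight": "Fungal leaf disease; brown spots with concentric rings.",
}


def build_llm_prompt(record: ImageRecord, class_info: Dict[str, str]) -> str:
    """Compose a text-only prompt asking an LLM to write a conversation about the image."""
    info = class_info.get(record.class_label, "No curated notes available.")
    return (
        "You are an agriculture expert looking at an image.\n"
        f"Image class: {record.class_label}\n"
        f"Class notes: {info}\n"
        "Write a multi-turn question-answer conversation about this image."
    )


def to_instruction_sample(record: ImageRecord, conversation: str) -> dict:
    """Pair the image with LLM-written text to form one instruction-tuning sample."""
    return {"image": record.image_path, "conversation": conversation}
```

In this sketch the LLM only ever sees text (the label and the notes), which is one way that a dataset without captions could still yield image-grounded conversations; the string returned by the LLM would be passed to to_instruction_sample together with the original record.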

arXiv information

Authors	Muhammad Awais, Ali Husain Salem Abdulla Alharthi, Amandeep Kumar, Hisham Cholakkal, Rao Muhammad Anwer
Published	2025-01-09 18:43:18+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
