VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

Summary

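We present VLMEvalKit, an open-source, PyTorch-based toolkit for evaluating large multi-modality models.
The toolkit aims to give researchers and developers an easy-to-use, comprehensive framework for evaluating existing multi-modality models and publishing reproducible evaluation results.
VLMEvalKit implements more than 70 large multi-modality models, covering both proprietary APIs and open-source models, and more than 20 multi-modal benchmarks.
A new model can be added by implementing a single interface; the toolkit then automatically handles the remaining workload, including data preparation, distributed inference, prediction post-processing, and metric calculation.
The toolkit is currently used mainly to evaluate large vision-language models, but its design is compatible with future updates that add further modalities such as audio and video.
Based on evaluation results obtained with the toolkit, we host the OpenVLM Leaderboard, a comprehensive leaderboard for tracking progress in multi-modality learning research.
The toolkit is released at https://github.com/open-compass/VLMEvalKit and is actively maintained.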
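
The single-interface idea described above can be illustrated with a minimal sketch. Every name here (`MultiModalModel`, `generate`, `Sample`, `postprocess`, `evaluate`, `AlwaysCat`) is hypothetical and chosen for illustration; this is not VLMEvalKit's actual API, and distributed inference is left out. The sketch only shows how a harness might own post-processing and metric calculation while a model supplies one method.

```python
from dataclasses import dataclass
from typing import Protocol


class MultiModalModel(Protocol):
    """Hypothetical single interface a model implements (not VLMEvalKit's real API)."""

    def generate(self, image_path: str, prompt: str) -> str: ...


@dataclass
class Sample:
    """One benchmark item: an image, a question about it, and the reference answer."""

    image_path: str
    question: str
    answer: str


def postprocess(raw: str) -> str:
    """Normalize a raw prediction: trim whitespace, drop trailing periods, lowercase."""
    return raw.strip().rstrip(".").lower()


def evaluate(model: MultiModalModel, samples: list[Sample]) -> float:
    """Run inference on every sample and return exact-match accuracy."""
    if not samples:
        return 0.0
    correct = 0
    for s in samples:
        pred = postprocess(model.generate(s.image_path, s.question))
        correct += pred == s.answer.lower()
    return correct / len(samples)


class AlwaysCat:
    """Toy model used only to exercise the harness; it ignores its inputs."""

    def generate(self, image_path: str, prompt: str) -> str:
        return " Cat. "


samples = [
    Sample("img0.jpg", "What animal is shown?", "cat"),
    Sample("img1.jpg", "What animal is shown?", "dog"),
]
accuracy = evaluate(AlwaysCat(), samples)
print(accuracy)
```

Because the harness depends only on the `generate` method, swapping in another model under this assumed design means writing one class; the evaluation loop stays unchanged.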

Summary (original)

We present VLMEvalKit: an open-source toolkit for evaluating large multi-modality models based on PyTorch. The toolkit aims to provide a user-friendly and comprehensive framework for researchers and developers to evaluate existing multi-modality models and publish reproducible evaluation results. In VLMEvalKit, we implement over 70 different large multi-modality models, including both proprietary APIs and open-source models, as well as more than 20 different multi-modal benchmarks. By implementing a single interface, new models can be easily added to the toolkit, while the toolkit automatically handles the remaining workloads, including data preparation, distributed inference, prediction post-processing, and metric calculation. Although the toolkit is currently mainly used for evaluating large vision-language models, its design is compatible with future updates that incorporate additional modalities, such as audio and video. Based on the evaluation results obtained with the toolkit, we host OpenVLM Leaderboard, a comprehensive leaderboard to track the progress of multi-modality learning research. The toolkit is released at https://github.com/open-compass/VLMEvalKit and is actively maintained.

arXiv information

Authors	Haodong Duan, Junming Yang, Yuxuan Qiao, Xinyu Fang, Lin Chen, Yuan Liu, Amit Agarwal, Zhe Chen, Mo Li, Yubo Ma, Hailong Sun, Xiangyu Zhao, Junbo Cui, Xiaoyi Dong, Yuhang Zang, Pan Zhang, Jiaqi Wang, Dahua Lin, Kai Chen
Published	2024-09-11 17:10:36+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
