ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning

Abstract

Charts play a vital role in data visualization, understanding data patterns, and informed decision-making. However, their unique combination of graphical elements (e.g., bars, lines) and textual components (e.g., labels, legends) poses challenges for general-purpose multimodal models. While vision-language models trained on chart data excel in comprehension, they struggle with generalization and require task-specific fine-tuning. To address these challenges, we propose ChartAssistant, a chart-based vision-language model for universal chart comprehension and reasoning. ChartAssistant leverages ChartSFT, a comprehensive dataset covering diverse chart-related tasks with basic and specialized chart types. It undergoes a two-stage training process, starting with pre-training on chart-to-table parsing to align chart and text, followed by multitask instruction-following fine-tuning. This approach enables ChartAssistant to achieve competitive performance across various chart tasks without task-specific fine-tuning. Experimental results demonstrate significant performance gains over the state-of-the-art UniChart method, outperforming OpenAI’s GPT-4V(ision) on real-world chart data. The code and data are available at https://github.com/OpenGVLab/ChartAst.
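
The two-stage recipe described in the abstract can be illustrated on the text side with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not specify the table serialization, the prompt wording, or any data structures, so `Sample`, `linearize_table`, `chart_to_table_sample`, `instruction_sample`, the ` | ` separator, and the example prompts are hypothetical and are not taken from the ChartAst code or the ChartSFT dataset. The chart image input is omitted entirely.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One text-side training example: a prompt and the target text."""
    prompt: str
    target: str


def linearize_table(header, rows, col_sep=" | ", row_sep="\n"):
    """Flatten a table into a single string.

    The separator format is a hypothetical choice, not the paper's.
    """
    lines = [col_sep.join(str(cell) for cell in header)]
    for row in rows:
        if len(row) != len(header):
            raise ValueError("row length does not match header length")
        lines.append(col_sep.join(str(cell) for cell in row))
    return row_sep.join(lines)


def chart_to_table_sample(header, rows):
    """Stage 1 (pre-training): the target is the chart's underlying data table."""
    return Sample(
        prompt="Convert the chart to a table.",  # assumed prompt wording
        target=linearize_table(header, rows),
    )


def instruction_sample(instruction, answer):
    """Stage 2 (multitask instruction tuning): the task is stated as an instruction."""
    return Sample(prompt=instruction, target=answer)


# Toy data standing in for the table behind a two-bar chart.
header = ["Year", "Sales"]
rows = [[2021, 10], [2022, 15]]

stage1 = chart_to_table_sample(header, rows)
stage2 = [
    instruction_sample("What were sales in 2022?", "15"),
    instruction_sample("Summarize the chart.",
                       "Sales rose from 10 in 2021 to 15 in 2022."),
]

print(stage1.target)
```

In this sketch, stage 1 teaches the model to read values off a chart by reproducing its table, and stage 2 reuses the same prompt/target shape for different tasks, which is one plausible way a single model could handle several chart tasks without task-specific fine-tuning.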

arXiv information

Authors	Fanqing Meng, Wenqi Shao, Quanfeng Lu, Peng Gao, Kaipeng Zhang, Yu Qiao, Ping Luo
Published	2024-01-10 16:27:57+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
