DongbaMIE: A Multimodal Information Extraction Dataset for Evaluating Semantic Understanding of Dongba Pictograms

Summary

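Dongba pictographs are the only pictographs still in use in the world.
They have pictorial ideographic features, and their symbols carry rich cultural and contextual information.
Because relevant datasets are lacking, existing research has struggled to advance the study of semantic understanding of Dongba pictographs.
To this end, we propose DongbaMIE, the first multimodal dataset for semantic understanding and extraction of Dongba pictographs.
The dataset consists of Dongba pictograph images and their corresponding Chinese semantic annotations.
It contains 23,530 sentence-level and 2,539 paragraph-level images, covering four semantic dimensions: objects, actions, relations, and attributes.
We systematically evaluate the GPT-4o, Gemini-2.0, and Qwen2-VL models.
Experimental results show that the F1 scores of GPT-4o and Gemini in their best object extraction results are only 3.16 and 3.11, respectively.
After supervised fine-tuning, Qwen2-VL reaches an F1 score of only 11.49.
These results suggest that current large multimodal models still face significant challenges in accurately recognizing the diverse semantic information in Dongba pictographs.
The dataset can be obtained from this URL.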

Abstract (original)

Dongba pictographs are the only pictographs still in use in the world. They have pictorial ideographic features, and their symbols carry rich cultural and contextual information. Due to the lack of relevant datasets, existing research has difficulty in advancing the study of semantic understanding of Dongba pictographs. To this end, we propose DongbaMIE, the first multimodal dataset for semantic understanding and extraction of Dongba pictographs. The dataset consists of Dongba pictograph images and their corresponding Chinese semantic annotations. It contains 23,530 sentence-level and 2,539 paragraph-level images, covering four semantic dimensions: objects, actions, relations, and attributes. We systematically evaluate the GPT-4o, Gemini-2.0, and Qwen2-VL models. Experimental results show that the F1 scores of GPT-4o and Gemini in the best object extraction are only 3.16 and 3.11 respectively. The F1 score of Qwen2-VL after supervised fine-tuning is only 11.49. These results suggest that current large multimodal models still face significant challenges in accurately recognizing the diverse semantic information in Dongba pictographs. The dataset can be obtained from this URL.

arXiv information

Authors	Xiaojun Bi, Shuo Li, Ziyue Wang, Fuwen Luo, Weizheng Qiao, Lu Han, Ziwei Sun, Peng Li, Yang Liu
Published	2025-03-05 16:20:53+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
