Manga109Dialog: A Large-scale Dialogue Dataset for Comics Speaker Detection

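Summary

The growth of the e-comics market has increased interest in developing automated methods for analyzing comics.
To understand comics more deeply, an automated approach is needed that links each piece of text in a comic to the character who speaks it.
Research on comics speaker detection has practical applications, such as automatic character assignment for audiobooks, automatic translation that reflects each character's personality, and inference of character relationships and stories.
To address the shortage of speaker-to-text annotations, the authors created a new annotation dataset, Manga109Dialog, based on Manga109.
Manga109Dialog is the world's largest comics speaker annotation dataset, containing 132,692 speaker-to-text pairs.
To evaluate speaker detection methods more appropriately, the dataset is further divided into levels according to prediction difficulty.
Unlike existing methods, which are mainly based on distances, the authors propose a deep learning-based method that uses scene graph generation models.
Because of the unique characteristics of comics, the performance of the proposed model is improved by taking the frame reading order into account.
Experiments were conducted using Manga109Dialog and other datasets.
The experimental results show that the scene-graph-based approach outperforms existing methods, achieving a prediction accuracy of over 75%.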
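
The summary contrasts the proposed approach with existing methods that rely mainly on distances. As a rough illustration of what such a distance-based baseline might look like (this is not the authors' implementation, nor a method taken from the paper), the sketch below assigns each text box to the character whose bounding-box center is nearest. The `Box` class, its fields, and the function names are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Box:
    """Axis-aligned bounding box in page pixel coordinates (hypothetical format)."""
    label: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def center(self):
        # Midpoint of the box, used as its representative position.
        return ((self.xmin + self.xmax) / 2, (self.ymin + self.ymax) / 2)


def nearest_speaker(text, characters):
    """Return the character whose box center is closest to the text box center."""
    if not characters:
        return None
    tx, ty = text.center()
    return min(
        characters,
        key=lambda c: hypot(c.center()[0] - tx, c.center()[1] - ty),
    )


def assign_speakers(texts, characters):
    """Map each text label to the label of its nearest character (or None)."""
    result = {}
    for t in texts:
        speaker = nearest_speaker(t, characters)
        result[t.label] = speaker.label if speaker else None
    return result


# Toy page with two characters and two speech texts (made-up coordinates).
characters = [Box("A", 0, 0, 100, 100), Box("B", 300, 0, 400, 100)]
texts = [Box("t1", 90, 0, 130, 40), Box("t2", 280, 10, 320, 50)]
print(assign_speakers(texts, characters))
```

A baseline like this ignores frame boundaries and reading order, which is one reason a purely distance-based rule can pick the wrong speaker when a text box sits close to a character in a neighboring frame.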

Summary (original)

The expanding market for e-comics has spurred interest in the development of automated methods to analyze comics. For further understanding of comics, an automated approach is needed to link text in comics to characters speaking the words. Comics speaker detection research has practical applications, such as automatic character assignment for audiobooks, automatic translation according to characters’ personalities, and inference of character relationships and stories. To deal with the problem of insufficient speaker-to-text annotations, we created a new annotation dataset Manga109Dialog based on Manga109. Manga109Dialog is the world’s largest comics speaker annotation dataset, containing 132,692 speaker-to-text pairs. We further divided our dataset into different levels by prediction difficulties to evaluate speaker detection methods more appropriately. Unlike existing methods mainly based on distances, we propose a deep learning-based method using scene graph generation models. Due to the unique features of comics, we enhance the performance of our proposed model by considering the frame reading order. We conducted experiments using Manga109Dialog and other datasets. Experimental results demonstrate that our scene-graph-based approach outperforms existing methods, achieving a prediction accuracy of over 75%.

arXiv information

Authors	Yingxuan Li, Kiyoharu Aizawa, Yusuke Matsui
Published	2023-06-30 08:34:08+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
