PTVD: A Large-Scale Plot-Oriented Multimodal Dataset Based on Television Dramas

Abstract

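Art forms such as movies and television (TV) dramas reflect the real world and have recently attracted considerable attention from the multimodal learning community.
However, existing corpora in this domain share three limitations: (1) they are annotated in a scene-oriented fashion and therefore ignore coherence within plots;
(2) their text lacks empathy and seldom mentions situational context;
(3) their video clips are too short to cover long-form relationships.
To address these fundamental issues, we constructed PTVD, the first plot-oriented multimodal dataset in the TV domain, from 1,106 TV drama episodes and 24,875 informative plot-focused sentences written by professionals, with the help of 449 human annotators.
It is also the first non-English dataset of its kind.
In addition, PTVD contains more than 26 million bullet screen comments (BSCs), which enable large-scale pre-training.
Next, aiming to open-source a strong baseline for follow-up work, we developed a multimodal algorithm that tackles different cinema/TV modelling problems with a unified architecture.
Extensive experiments on three cognitively inspired tasks yielded a number of novel observations, some of them quite counter-intuitive, further validating the value of PTVD in promoting multimodal research.
The dataset and code are released at \url{https://ptvd.github.io/}.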

Abstract (original)

Art forms such as movies and television (TV) dramas are reflections of the real world, which have attracted much attention from the multimodal learning community recently. However, existing corpora in this domain share three limitations: (1) annotated in a scene-oriented fashion, they ignore the coherence within plots; (2) their text lacks empathy and seldom mentions situational context; (3) their video clips fail to cover long-form relationship due to short duration. To address these fundamental issues, using 1,106 TV drama episodes and 24,875 informative plot-focused sentences written by professionals, with the help of 449 human annotators, we constructed PTVD, the first plot-oriented multimodal dataset in the TV domain. It is also the first non-English dataset of its kind. Additionally, PTVD contains more than 26 million bullet screen comments (BSCs), powering large-scale pre-training. Next, aiming to open-source a strong baseline for follow-up works, we developed the multimodal algorithm that attacks different cinema/TV modelling problems with a unified architecture. Extensive experiments on three cognitive-inspired tasks yielded a number of novel observations (some of them being quite counter-intuition), further validating the value of PTVD in promoting multimodal research. The dataset and codes are released at \url{https://ptvd.github.io/}.

arXiv information

Authors	Chen Li, Xutan Peng, Teng Wang, Yixiao Ge, Mengyang Liu, Xuyuan Xu, Yexin Wang, Ying Shan
Published	2023-06-26 12:30:20+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
