Personalised Language Modelling of Screen Characters Using Rich Metadata Annotations


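In this work, we release and describe a novel set of manual annotations for 863 speakers from the popular Cornell Movie Dialog Corpus, including features such as characteristic quotes and character descriptions, together with a set of six automatically extracted metadata for over 95% of the featured films.
We perform extensive experiments on two corpora and show that such annotations can be used effectively to personalise language models, reducing perplexity by up to 8.5%.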


Personalisation of language models for dialogue sensitises them to better capture the speaking patterns of people of specific characteristics, and/or in specific environments. However, rich character annotations are difficult to come by and to successfully leverage. In this work, we release and describe a novel set of manual annotations for 863 speakers from the popular Cornell Movie Dialog Corpus, including features like characteristic quotes and character descriptions, and a set of six automatically extracted metadata for over 95% of the featured films. We perform extensive experiments on two corpora and show that such annotations can be effectively used to personalise language models, reducing perplexity by up to 8.5%. Our method can be applied even zero-shot for speakers for whom no prior training data is available, by relying on combinations of characters’ demographic characteristics. Since collecting such metadata is costly, we also contribute a cost-benefit analysis to highlight which annotations were most cost-effective relative to the reduction in perplexity.
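
To make the headline metric concrete, the following is a minimal sketch of how perplexity and a relative perplexity reduction are commonly computed from per-token log-probabilities. It is not the authors' evaluation code: the function names `perplexity` and `relative_reduction` are hypothetical, and the probabilities are made-up illustrative values rather than results from the paper.

```python
import math


def perplexity(token_log_probs):
    """Perplexity: exp of the negative mean natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def relative_reduction(baseline_ppl, personalised_ppl):
    """Perplexity reduction expressed as a percentage of the baseline."""
    return 100.0 * (baseline_ppl - personalised_ppl) / baseline_ppl


# Illustrative (invented) per-token probabilities for the same utterance,
# scored by a baseline model and by a hypothetical personalised model.
baseline_lp = [math.log(p) for p in (0.10, 0.20, 0.05)]
personalised_lp = [math.log(p) for p in (0.12, 0.22, 0.06)]

base = perplexity(baseline_lp)
pers = perplexity(personalised_lp)
print(f"baseline={base:.2f} personalised={pers:.2f} "
      f"reduction={relative_reduction(base, pers):.1f}%")
```

Under this definition, a reduction of "up to 8.5%" would mean the personalised model's perplexity is at most 8.5% lower than the baseline's on the same evaluation text, assuming the paper reports the reduction relative to the baseline.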


Authors: Sebastian Vincent, Rowanne Sumner, Alice Dowek, Charlotte Blundell, Emily Preston, Chris Bayliss, Chris Oakley, Carolina Scarton
Published: 2023-03-29 12:19:23+00:00

Categories: cs.AI, cs.CL, cs.LG