A Grounded Typology of Word Classes




We propose a grounded approach to meaning in language typology. We treat data from perceptual modalities, such as images, as a language-agnostic representation of meaning. Hence, we can quantify the function–form relationship between images and captions across languages. Inspired by information theory, we define ‘groundedness’, an empirical measure of contextual semantic contentfulness (formulated as a difference in surprisal) which can be computed with multilingual multimodal language models. As a proof of concept, we apply this measure to the typology of word classes. Our measure captures the contentfulness asymmetry between functional (grammatical) and lexical (content) classes across languages, but contradicts the view that functional classes do not convey content. Moreover, we find universal trends in the hierarchy of groundedness (e.g., nouns > adjectives > verbs), and show that our measure partly correlates with psycholinguistic concreteness norms in English. We release a dataset of groundedness scores for 30 languages. Our results suggest that the grounded typology approach can provide quantitative evidence about semantic function in language.
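The "difference in surprisal" formulation can be illustrated with a minimal sketch. It assumes groundedness is the surprisal of a token under a text-only model minus its surprisal when the model also sees the image; the abstract does not spell out the sign convention, so that direction is an assumption here. The function names and the toy probabilities are hypothetical and stand in for the outputs of a multilingual multimodal language model.

```python
import math


def surprisal(p: float) -> float:
    """Surprisal (in bits) of an event with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p)


def groundedness(p_text_only: float, p_with_image: float) -> float:
    """Assumed definition: how many bits of surprisal the image removes.

    p_text_only  -- probability of the token given the textual context only
    p_with_image -- probability of the same token given context plus image
    """
    return surprisal(p_text_only) - surprisal(p_with_image)


# Hypothetical probabilities, invented for illustration (not from the paper):
# token -> (p given text only, p given text and image)
toy_tokens = {
    "dog": (0.01, 0.40),  # content word: the image helps a lot
    "the": (0.30, 0.32),  # function word: the image helps only slightly
}

scores = {tok: groundedness(pt, pi) for tok, (pt, pi) in toy_tokens.items()}

for tok, score in scores.items():
    print(f"{tok}: {score:.2f} bits")
```

With these toy numbers the content word scores higher than the function word, while the function word still scores above zero, which mirrors the asymmetry described in the abstract without reproducing any reported result.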


Authors: Coleman Haley, Sharon Goldwater, Edoardo Ponti
Published: 2024-12-13 18:58:48+00:00
Categories: cs.CL, cs.CV