Assessing the Role of Lexical Semantics in Cross-lingual Transfer through Controlled Manipulations


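We show that properties such as script or word order have only a limited impact on alignment quality, whereas the degree of lexical matching between the two languages, defined using a measure of translation entropy, greatly affects it.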


While cross-linguistic model transfer is effective in many settings, there is still limited understanding of the conditions under which it works. In this paper, we assess the role of lexical semantics in cross-lingual transfer and compare its impact to that of other language properties. Examining each language property individually, we systematically analyze how differences between English and a target language influence the capacity to align the target language with an English pretrained representation space. We do so by artificially manipulating English sentences in ways that mimic specific characteristics of the target language, and reporting the effect of each manipulation on the quality of alignment with the representation space. We show that while properties such as script or word order have only a limited impact on alignment quality, the degree of lexical matching between the two languages, which we define using a measure of translation entropy, greatly affects it.
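
The abstract does not spell out how translation entropy is computed. As a rough illustration only, the sketch below assumes it is the Shannon entropy of the empirical distribution of target-language translations observed for each source word. The function name `translation_entropy`, the input format (a list of word-aligned pairs), and the toy English-Spanish pairs are all assumptions made for this example, not details taken from the paper.

```python
import math
from collections import Counter, defaultdict


def translation_entropy(aligned_pairs):
    """Return {source_word: entropy in bits} over its observed translations.

    Assumption: `aligned_pairs` is an iterable of (source_word, target_word)
    tuples, e.g. produced by some word aligner. This is an illustrative
    definition, not necessarily the one used by the authors.
    """
    # Count how often each source word is aligned to each target word.
    counts = defaultdict(Counter)
    for src, tgt in aligned_pairs:
        counts[src][tgt] += 1

    entropies = {}
    for src, tgt_counts in counts.items():
        total = sum(tgt_counts.values())
        # H = sum_t p(t|s) * log2(1 / p(t|s)), with p estimated from counts.
        entropies[src] = sum(
            (c / total) * math.log2(total / c) for c in tgt_counts.values()
        )
    return entropies


# Toy, hand-made pairs (hypothetical data for illustration only).
toy_pairs = [
    ("bank", "banco"),
    ("bank", "orilla"),
    ("dog", "perro"),
    ("dog", "perro"),
]
toy_entropies = translation_entropy(toy_pairs)
```

Under this assumed definition, a word that always maps to the same translation has entropy 0, while a word split evenly between two translations has entropy 1 bit. Lower average entropy would then correspond to a higher degree of lexical matching between the two languages.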


Authors: Roy Ilani, Taelin Karidi, Omri Abend
Published: 2024-08-14 14:59:20+00:00

Category: cs.CL