Two-stage Pipeline for Multilingual Dialect Detection

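Abstract

Dialect identification is a crucial task for localizing various large language models.
This paper outlines our approach to the VarDial 2023 shared task.
The task requires identifying three dialects (Track-1) or two dialects (Track-2) for each of three languages, which yields a 9-way classification for Track-1 and a 6-way classification for Track-2.
Our proposed approach is a two-stage system that outperforms other participants' systems and previous work in this domain.
We achieve a score of 58.54% on Track-1 and 85.61% on Track-2.
The codebase is publicly available (https://github.com/ankit-vaidya19/EACL_VarDial2023).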

Abstract (original)

Dialect Identification is a crucial task for localizing various Large Language Models. This paper outlines our approach to the VarDial 2023 shared task. Here we have to identify three or two dialects from three languages each which results in a 9-way classification for Track-1 and 6-way classification for Track-2 respectively. Our proposed approach consists of a two-stage system and outperforms other participants’ systems and previous works in this domain. We achieve a score of 58.54% for Track-1 and 85.61% for Track-2. Our codebase is available publicly (https://github.com/ankit-vaidya19/EACL_VarDial2023).
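The abstract states only that the system has two stages and that the label spaces have 9 and 6 classes. As a rough illustration, the sketch below assumes the first stage picks the language and the second stage applies a per-language dialect classifier; that split is an assumption, not something the abstract confirms. All language and dialect names, and the toy classifiers, are hypothetical placeholders rather than the authors' models or the shared task's labels.

```python
from typing import Callable, Dict, List, Tuple

Classifier = Callable[[str], str]

# Placeholder label layouts (hypothetical names, not the shared task's labels):
# three languages with three dialects each, or with two dialects each.
TRACK1: Dict[str, List[str]] = {
    "lang_a": ["a1", "a2", "a3"],
    "lang_b": ["b1", "b2", "b3"],
    "lang_c": ["c1", "c2", "c3"],
}
TRACK2: Dict[str, List[str]] = {
    "lang_a": ["a1", "a2"],
    "lang_b": ["b1", "b2"],
    "lang_c": ["c1", "c2"],
}


def flat_label_count(dialects: Dict[str, List[str]]) -> int:
    """Number of classes a single flat classifier would have to separate."""
    return sum(len(labels) for labels in dialects.values())


def make_pipeline(
    language_clf: Classifier,
    dialect_clfs: Dict[str, Classifier],
) -> Callable[[str], Tuple[str, str]]:
    """Chain a language classifier with one dialect classifier per language."""

    def predict(text: str) -> Tuple[str, str]:
        language = language_clf(text)            # stage 1: which language?
        dialect = dialect_clfs[language](text)   # stage 2: which dialect of it?
        return language, dialect

    return predict


def toy_language_clf(text: str) -> str:
    # Stand-in for a trained model: reads a "lang_x:" prefix from the input.
    return text.split(":", 1)[0]


def make_toy_dialect_clf(labels: List[str]) -> Classifier:
    # Stand-in for a trained model: picks a label from the text length.
    def clf(text: str) -> str:
        return labels[len(text) % len(labels)]

    return clf


predict_track1 = make_pipeline(
    toy_language_clf,
    {lang: make_toy_dialect_clf(labels) for lang, labels in TRACK1.items()},
)
```

Under this assumed split, each second-stage classifier only has to separate the two or three dialects of a single language instead of all 9 or 6 classes at once.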

arXiv information

Authors	Ankit Vaidya, Aditya Kane
Published	2023-03-06 20:35:51+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
