Assessing Phrase Break of ESL Speech with Pre-trained Language Models and Large Language Models

Abstract

This work introduces approaches to assessing phrase breaks in ESL learners’ speech using pre-trained language models (PLMs) and large language models (LLMs). There are two tasks: an overall assessment of phrase breaks for a speech clip, and a fine-grained assessment of every possible phrase break position. To leverage NLP models, the speech input is first force-aligned with its text and then pre-processed into a token sequence that includes both words and phrase break information. To utilize PLMs, we propose a pre-training and fine-tuning pipeline over the processed tokens: pre-training uses a replaced break token detection module, and fine-tuning uses text classification and sequence labeling. To employ LLMs, we design prompts for ChatGPT. The experiments show that with PLMs, the dependence on labeled training data is greatly reduced and performance improves. Meanwhile, we verify that ChatGPT, a renowned LLM, has potential for further advancement in this area.
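The abstract does not spell out the pre-processing or the replaced break token detection module, so the following is only a minimal sketch of how such a pipeline might look. The break token names (`<b0>`, `<b1>`, `<b2>`), the pause thresholds, the `AlignedWord` type, and the random replacement strategy are all assumptions made for illustration and are not taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass
class AlignedWord:
    """One word from a forced alignment (hypothetical structure)."""
    text: str
    start: float  # start time in seconds
    end: float    # end time in seconds


# Assumed break tokens and pause thresholds in seconds; not from the paper.
BREAK_TOKENS = ["<b0>", "<b1>", "<b2>"]
THRESHOLDS = [0.05, 0.30]


def pause_to_break(pause: float) -> str:
    """Map a pause duration to a discrete break token."""
    if pause < THRESHOLDS[0]:
        return BREAK_TOKENS[0]
    if pause < THRESHOLDS[1]:
        return BREAK_TOKENS[1]
    return BREAK_TOKENS[2]


def to_token_sequence(words: list[AlignedWord]) -> list[str]:
    """Interleave words with a break token for each gap between words."""
    tokens: list[str] = []
    for cur, nxt in zip(words, words[1:]):
        tokens.append(cur.text)
        tokens.append(pause_to_break(max(0.0, nxt.start - cur.end)))
    if words:
        tokens.append(words[-1].text)
    return tokens


def corrupt_breaks(tokens: list[str], rate: float, rng: random.Random):
    """Randomly swap break tokens and return (corrupted, labels).

    labels[i] is 1 where a break token was replaced, else 0. A detector
    could be pre-trained to predict these labels; random swapping is an
    assumption here, since the paper may use a different scheme.
    """
    corrupted, labels = [], []
    for tok in tokens:
        if tok in BREAK_TOKENS and rng.random() < rate:
            alternatives = [b for b in BREAK_TOKENS if b != tok]
            corrupted.append(rng.choice(alternatives))
            labels.append(1)
        else:
            corrupted.append(tok)
            labels.append(0)
    return corrupted, labels


words = [
    AlignedWord("I", 0.00, 0.20),
    AlignedWord("like", 0.22, 0.50),
    AlignedWord("tea", 1.00, 1.30),
]
tokens = to_token_sequence(words)
corrupted, labels = corrupt_breaks(tokens, rate=0.5, rng=random.Random(0))
```

Here `tokens` interleaves the three words with one break token per gap, and `labels` marks which break positions were replaced. A sequence-labeling head over the same positions would correspond to the fine-grained task, while a single classification over the whole sequence would correspond to the overall task.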

arXiv information

Authors	Zhiyi Wang, Shaoguang Mao, Wenshan Wu, Yan Xia, Yan Deng, Jonathan Tien
Published	2023-06-08 07:10:39+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
