A BERT-based Unsupervised Grammatical Error Correction Framework

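Abstract

Grammatical error correction (GEC) is a challenging task in natural language processing.
Most work so far targets widely used languages such as English and Chinese; relatively little has been done for low-resource languages, because large annotated corpora are lacking.
For low-resource languages, current unsupervised GEC based on language model scoring performs well.
However, pre-trained language models have not yet been explored in this setting.
This study proposes a BERT-based unsupervised GEC framework in which GEC is treated as a multi-class classification task.
The framework contains three modules: a data flow construction module, a sentence perplexity scoring module, and an error detection and correction module.
We propose a novel pseudo-perplexity scoring method to evaluate a sentence's probable correctness, and we construct a Tagalog corpus for Tagalog GEC research.
The framework achieves competitive performance on the Tagalog corpus we constructed and on an open-source Indonesian corpus, showing that it is complementary to the baseline method for low-resource GEC.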
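
The abstract does not spell out the paper's novel scoring method, so the sketch below only illustrates the conventional notion of pseudo-perplexity that such a module builds on: mask each token in turn, take the masked language model's probability for the original token, and exponentiate the negative mean log-probability. The names `pseudo_perplexity`, `pick_best`, and `toy_masked_prob` are hypothetical, and the toy probability function is an assumption standing in for a real BERT model; it is not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical stand-in for a masked language model such as BERT:
# given the tokens and a position, return P(token at position | other tokens).
MaskedProb = Callable[[Sequence[str], int], float]


def pseudo_perplexity(tokens: Sequence[str], masked_prob: MaskedProb) -> float:
    """Conventional pseudo-perplexity: exp(-mean masked log-probability)."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    log_sum = 0.0
    for i in range(len(tokens)):
        # Score each token as if it were masked and predicted from the rest.
        log_sum += math.log(masked_prob(tokens, i))
    return math.exp(-log_sum / len(tokens))


def pick_best(candidates: List[List[str]], masked_prob: MaskedProb) -> List[str]:
    """Return the candidate sentence with the lowest pseudo-perplexity."""
    return min(candidates, key=lambda c: pseudo_perplexity(c, masked_prob))


def toy_masked_prob(tokens: Sequence[str], i: int) -> float:
    """Toy model (assumption, for illustration only): it only knows that
    'an' precedes vowel-initial words and 'a' precedes the others."""
    word = tokens[i]
    if word in ("a", "an") and i + 1 < len(tokens):
        next_starts_with_vowel = tokens[i + 1][0] in "aeiou"
        fits = (word == "an") == next_starts_with_vowel
        return 0.9 if fits else 0.1
    return 0.5


candidates = [
    ["she", "ate", "a", "apple"],
    ["she", "ate", "an", "apple"],
]
best = pick_best(candidates, toy_masked_prob)
print(" ".join(best))
```

Treating correction as choosing among candidate replacements at a position, and keeping the candidate with the lowest score, is one plausible reading of "GEC as multi-class classification"; the paper itself should be consulted for the actual procedure.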

Abstract (original)

Grammatical error correction (GEC) is a challenging task of natural language processing techniques. While more attempts are being made in this approach for universal languages like English or Chinese, relatively little work has been done for low-resource languages for the lack of large annotated corpora. In low-resource languages, the current unsupervised GEC based on language model scoring performs well. However, the pre-trained language model is still to be explored in this context. This study proposes a BERT-based unsupervised GEC framework, where GEC is viewed as multi-class classification task. The framework contains three modules: data flow construction module, sentence perplexity scoring module, and error detecting and correcting module. We propose a novel scoring method for pseudo-perplexity to evaluate a sentence’s probable correctness and construct a Tagalog corpus for Tagalog GEC research. It obtains competitive performance on the Tagalog corpus we construct and open-source Indonesian corpus and it demonstrates that our framework is complementary to baseline method for low-resource GEC task.

arXiv information

Authors	Nankai Lin, Hongbin Zhang, Menglan Shen, Yu Wang, Shengyi Jiang, Aimin Yang
Published	2023-03-30 13:29:49+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
