Data-Driven Approach for Formality-Sensitive Machine Translation: Language-Specific Handling and Synthetic Data Generation

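Abstract

This paper introduces a data-driven approach to Formality-Sensitive Machine Translation (FSMT) that addresses the distinct linguistic characteristics of four target languages.
The methodology centers on two core strategies: 1) language-specific data handling, and 2) synthetic data generation using large-scale language models and empirical prompt engineering.
The approach shows a considerable improvement over the baseline, underscoring the effectiveness of data-centric techniques.
The prompt engineering strategy further improves performance by producing superior synthetic translation examples.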

Abstract (original)

In this paper, we introduce a data-driven approach for Formality-Sensitive Machine Translation (FSMT) that caters to the unique linguistic properties of four target languages. Our methodology centers on two core strategies: 1) language-specific data handling, and 2) synthetic data generation using large-scale language models and empirical prompt engineering. This approach demonstrates a considerable improvement over the baseline, highlighting the effectiveness of data-centric techniques. Our prompt engineering strategy further improves performance by producing superior synthetic translation examples.

arXiv information

Authors	Seungjun Lee, Hyeonseok Moon, Chanjun Park, Heuiseok Lim
Published	2023-06-26 08:45:47+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
