Efficient CTC Regularization via Coarse Labels for End-to-End Speech Translation

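Summary

For end-to-end speech translation, regularizing the encoder with a Connectionist Temporal Classification (CTC) objective, using either the source transcript or the target translation as labels, substantially improves quality metrics.
However, CTC requires an additional prediction layer over the vocabulary space, which adds non-negligible model parameters and computational overhead even though the layer is usually not used at inference.
This paper re-examines whether CTC needs genuine vocabulary labels for regularization and explores strategies for shrinking the CTC label space, with the goal of improving efficiency without degrading quality.
We propose coarse labeling for CTC (CoLaCTC), which merges vocabulary labels through simple heuristic rules such as truncation, division, or modulo (MOD) operations.
Despite its simplicity, experiments on 4 source and 8 target languages show that CoLaCTC, particularly with MOD, can aggressively compress the label space to 256 labels or fewer and improve training efficiency (a 1.18x to 1.77x speedup, depending on the original vocabulary size) while still matching or exceeding the performance of the CTC baseline.
We also show that CoLaCTC generalizes to CTC regularization regardless of whether transcripts or translations are used for labeling.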

Summary (original)

For end-to-end speech translation, regularizing the encoder with the Connectionist Temporal Classification (CTC) objective using the source transcript or target translation as labels can greatly improve quality metrics. However, CTC demands an extra prediction layer over the vocabulary space, bringing in nonnegligible model parameters and computational overheads, although this layer is typically not used for inference. In this paper, we re-examine the need for genuine vocabulary labels for CTC for regularization and explore strategies to reduce the CTC label space, targeting improved efficiency without quality degradation. We propose coarse labeling for CTC (CoLaCTC), which merges vocabulary labels via simple heuristic rules, such as using truncation, division or modulo (MOD) operations. Despite its simplicity, our experiments on 4 source and 8 target languages show that CoLaCTC with MOD particularly can compress the label space aggressively to 256 and even further, gaining training efficiency (1.18x ~ 1.77x speedup depending on the original vocabulary size) yet still delivering comparable or better performance than the CTC baseline. We also show that CoLaCTC successfully generalizes to CTC regularization regardless of using transcript or translation for labeling.
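The three merging rules named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so the definitions below are assumptions about what truncation, division, and modulo could look like when applied to integer vocabulary IDs; the function names and the example vocabulary size of 10,000 are hypothetical, and handling of the CTC blank label is left out.

```python
import math


def coarse_truncate(token_id: int, num_labels: int) -> int:
    """Assumed truncation rule: keep small IDs, collapse the rest into the last label."""
    return min(token_id, num_labels - 1)


def coarse_divide(token_id: int, vocab_size: int, num_labels: int) -> int:
    """Assumed division rule: contiguous blocks of IDs share one coarse label."""
    block = math.ceil(vocab_size / num_labels)  # IDs per coarse label
    return token_id // block


def coarse_mod(token_id: int, num_labels: int) -> int:
    """Assumed modulo (MOD) rule: IDs that are congruent modulo num_labels are merged."""
    return token_id % num_labels


def coarsen(token_ids, vocab_size, num_labels, rule="mod"):
    """Map a sequence of vocabulary IDs to coarse labels in [0, num_labels)."""
    if rule == "truncate":
        return [coarse_truncate(t, num_labels) for t in token_ids]
    if rule == "divide":
        return [coarse_divide(t, vocab_size, num_labels) for t in token_ids]
    if rule == "mod":
        return [coarse_mod(t, num_labels) for t in token_ids]
    raise ValueError(f"unknown rule: {rule}")


if __name__ == "__main__":
    ids = [5, 300, 9999]  # hypothetical subword IDs from a 10,000-entry vocabulary
    for rule in ("truncate", "divide", "mod"):
        print(rule, coarsen(ids, vocab_size=10000, num_labels=256, rule=rule))
```

Under these assumed rules every coarse label falls in the range 0 to 255, so the CTC prediction layer would need 256 output units rather than one per vocabulary entry, which is where the parameter and compute savings described in the abstract would come from.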

arXiv information

Authors	Biao Zhang, Barry Haddow, Rico Sennrich
Published	2023-02-21 18:38:41+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
