Low-Rank and Sparse Model Merging for Multi-Lingual Speech Recognition and Translation

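Summary

Language diversity poses a major challenge for speech-to-text (S2T) tasks such as automatic speech recognition and translation.
Conventional multi-task training approaches aim to address this by jointly optimizing multiple speech recognition and translation tasks across many languages.
Models such as Whisper, which are built on these strategies, show strong performance but still suffer from high computational cost, language interference, suboptimal training configurations, and limited extensibility.
To overcome these challenges, we introduce LoRS-Merging (low-rank and sparse model merging), a new technique designed to efficiently integrate models trained on different languages or tasks while maintaining performance and reducing computational overhead.
LoRS-Merging combines low-rank and sparse pruning to retain essential structures while eliminating redundant parameters, mitigating language and task interference, and improving extensibility.
Experimental results across a range of languages show that LoRS-Merging significantly outperforms conventional multi-lingual multi-task training baselines.
Our findings suggest that model merging, and LoRS-Merging in particular, is a scalable and effective complement to conventional multi-lingual training strategies for S2T applications.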

Abstract (original)

Language diversity presents a significant challenge in speech-to-text (S2T) tasks, such as automatic speech recognition and translation. Traditional multi-task training approaches aim to address this by jointly optimizing multiple speech recognition and translation tasks across various languages. While models like Whisper, built on these strategies, demonstrate strong performance, they still face issues of high computational cost, language interference, suboptimal training configurations, and limited extensibility. To overcome these challenges, we introduce LoRS-Merging (low-rank and sparse model merging), a novel technique designed to efficiently integrate models trained on different languages or tasks while preserving performance and reducing computational overhead. LoRS-Merging combines low-rank and sparse pruning to retain essential structures while eliminating redundant parameters, mitigating language and task interference, and enhancing extensibility. Experimental results across a range of languages demonstrate that LoRS-Merging significantly outperforms conventional multi-lingual multi-task training baselines. Our findings suggest that model merging, particularly LoRS-Merging, is a scalable and effective complement to traditional multi-lingual training strategies for S2T applications.
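
The abstract does not spell out how the low-rank and sparse steps are applied, so the following is only a minimal NumPy sketch of the general idea, not the authors' implementation. It assumes each fine-tuned model is represented by its weight difference from a shared base model, that this difference is first truncated to a low rank with an SVD and then magnitude-pruned, and that the compressed differences are summed onto the base. The function names, the order of the two steps, and the `rank`, `keep_ratio`, and `scale` parameters are all hypothetical choices made for illustration.

```python
import numpy as np


def low_rank_approx(delta, rank):
    """Keep only the top-`rank` singular components of a 2-D weight delta."""
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


def magnitude_prune(delta, keep_ratio):
    """Zero all but the largest-magnitude `keep_ratio` fraction of entries."""
    k = max(1, int(round(keep_ratio * delta.size)))
    threshold = np.sort(np.abs(delta), axis=None)[-k]
    return np.where(np.abs(delta) >= threshold, delta, 0.0)


def merge_low_rank_sparse(base, finetuned, rank, keep_ratio, scale=1.0):
    """Add compressed per-model weight deltas onto a copy of `base`.

    Assumption: low-rank truncation followed by magnitude pruning, then a
    scaled sum. The paper may combine these steps differently.
    """
    merged = base.copy()
    for weights in finetuned:
        delta = weights - base                      # difference from the base model
        delta = low_rank_approx(delta, rank)        # low-rank step
        delta = magnitude_prune(delta, keep_ratio)  # sparse step
        merged += scale * delta
    return merged


# Toy usage: two "fine-tuned" matrices that differ slightly from the base.
rng = np.random.default_rng(0)
base = rng.normal(size=(8, 8))
model_a = base + 0.1 * rng.normal(size=(8, 8))
model_b = base + 0.1 * rng.normal(size=(8, 8))
merged = merge_low_rank_sparse(base, [model_a, model_b], rank=2, keep_ratio=0.5)
```

In a real speech model the same routine would be applied per weight matrix, and `rank`, `keep_ratio`, and `scale` would need tuning on held-out data; the toy matrices here only show the shapes and the data flow.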

arXiv information

Authors	Qiuming Zhao, Guangzhi Sun, Chao Zhang, Mingxing Xu, Thomas Fang Zheng
Published	2025-02-24 18:06:57+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
