Crosslingual Reasoning through Test-Time Scaling

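Summary

The reasoning capabilities of large language models are studied primarily for English, even when the pretrained models are multilingual.
In this work, we investigate the extent to which English reasoning finetuning with long chain-of-thoughts (CoTs) generalizes across languages.
First, we find that scaling up inference compute for English-centric reasoning language models (RLMs) improves multilingual mathematical reasoning across many languages, including low-resource ones, to the point where they outperform models twice their size.
Second, we reveal that although the CoTs of English-centric RLMs are naturally predominantly English, they consistently follow a quote-and-think pattern to reason about quoted non-English inputs.
Third, we discover an effective strategy for controlling the language of long CoT reasoning, and we observe that models reason better and more efficiently in high-resource languages.
Finally, we observe poor out-of-domain reasoning generalization, in particular from STEM to cultural commonsense knowledge, even for English.
Overall, we demonstrate the potential, study the mechanisms, and outline the limitations of crosslingual generalization of English reasoning test-time scaling.
We conclude that practitioners should let English-centric RLMs reason in high-resource languages, while further work is needed to improve reasoning in low-resource languages and out-of-domain contexts.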

Abstract (original)

Reasoning capabilities of large language models are primarily studied for English, even when pretrained models are multilingual. In this work, we investigate to what extent English reasoning finetuning with long chain-of-thoughts (CoTs) can generalize across languages. First, we find that scaling up inference compute for English-centric reasoning language models (RLMs) improves multilingual mathematical reasoning across many languages including low-resource languages, to an extent where they outperform models twice their size. Second, we reveal that while English-centric RLM’s CoTs are naturally predominantly English, they consistently follow a quote-and-think pattern to reason about quoted non-English inputs. Third, we discover an effective strategy to control the language of long CoT reasoning, and we observe that models reason better and more efficiently in high-resource languages. Finally, we observe poor out-of-domain reasoning generalization, in particular from STEM to cultural commonsense knowledge, even for English. Overall, we demonstrate the potentials, study the mechanisms and outline the limitations of crosslingual generalization of English reasoning test-time scaling. We conclude that practitioners should let English-centric RLMs reason in high-resource languages, while further work is needed to improve reasoning in low-resource languages and out-of-domain contexts.

arXiv information

Authors	Zheng-Xin Yong, M. Farid Adilazuarda, Jonibek Mansurov, Ruochen Zhang, Niklas Muennighoff, Carsten Eickhoff, Genta Indra Winata, Julia Kreutzer, Stephen H. Bach, Alham Fikri Aji
Published	2025-05-08 16:50:06+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
