Training with Pseudo-Code for Instruction Following

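Summary

Despite rapid progress in the capabilities of large language models (LLMs), they still have difficulty following relatively simple, unambiguous instructions, especially when compositions are involved.
This paper takes inspiration from recent work suggesting that models may follow instructions better when the instructions are expressed in pseudo-code.
However, writing pseudo-code programs can be tedious, and using few-shot demonstrations to craft code representations for use at inference time can be unnatural for non-expert users of LLMs.
To overcome these limitations, the authors propose fine-tuning LLMs on instruction-tuning data that additionally includes the instruction re-expressed in pseudo-code along with the final response.
Models trained with this method are evaluated on 11 publicly available benchmarks comprising tasks related to instruction following, mathematics, and common-sense reasoning.
Rigorous experiments with 5 different models show that models trained with pseudo-code not only follow instructions better but also retain their capabilities on the other tasks related to mathematical and common-sense reasoning.
Specifically, the authors observe a relative gain of 3–19% on instruction-following benchmarks and an average gain of up to 14% across all tasks.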

Summary (original)

Despite the rapid progress in the capabilities of Large Language Models (LLMs), they continue to have difficulty following relatively simple, unambiguous instructions, especially when compositions are involved. In this paper, we take inspiration from recent work that suggests that models may follow instructions better when they are expressed in pseudo-code. However, writing pseudo-code programs can be tedious and using few-shot demonstrations to craft code representations for use in inference can be unnatural for non-expert users of LLMs. To overcome these limitations, we propose fine-tuning LLMs with instruction-tuning data that additionally includes instructions re-expressed in pseudo-code along with the final response. We evaluate models trained using our method on $11$ publicly available benchmarks comprising tasks related to instruction-following, mathematics, and common-sense reasoning. We conduct rigorous experiments with $5$ different models and find that not only do models follow instructions better when trained with pseudo-code, they also retain their capabilities on the other tasks related to mathematical and common-sense reasoning. Specifically, we observe a relative gain of $3$–$19$% on instruction-following benchmarks, and an average gain of up to 14% across all tasks.
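
To make the proposed data format concrete, here is a minimal Python sketch of how one such training record might be assembled: the prompt is the natural-language instruction, and the target contains the instruction re-expressed in pseudo-code followed by the final response. The `Example` class, the `build_training_record` function, the `[PSEUDOCODE]` and `[RESPONSE]` markers, and the sample instruction are all hypothetical and chosen only for illustration; the abstract does not specify the authors' actual template or pseudo-code style.

```python
from dataclasses import dataclass


@dataclass
class Example:
    instruction: str  # natural-language instruction shown to the model
    pseudo_code: str  # the same instruction re-expressed as pseudo-code
    response: str     # final response that satisfies the instruction


def build_training_record(ex: Example) -> dict:
    """Build a prompt/target pair whose target is pseudo-code, then the response.

    The section markers are assumptions made for this sketch, not the
    format used in the paper.
    """
    target = "\n".join(
        [
            "[PSEUDOCODE]",
            ex.pseudo_code.strip(),
            "[RESPONSE]",
            ex.response.strip(),
        ]
    )
    return {"prompt": ex.instruction.strip(), "target": target}


# Hypothetical sample with two composed constraints (case and length).
example = Example(
    instruction="Write a greeting in all lowercase, using at most 5 words.",
    pseudo_code=(
        "def respond():\n"
        "    text = write_greeting()\n"
        "    assert text == text.lower()\n"
        "    assert len(text.split()) <= 5\n"
        "    return text"
    ),
    response="hello there, good morning",
)

record = build_training_record(example)
print(record["target"])
```

Under this layout the user only ever supplies the plain instruction; a model fine-tuned on such records would be expected to produce the pseudo-code itself before answering, which is how the approach avoids asking non-expert users to write pseudo-code or few-shot demonstrations.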

arXiv information

Authors	Prince Kumar, Rudra Murthy, Riyaz Bhat, Danish Contractor
Published	2025-05-23 15:14:29+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
