Agent Skill Acquisition for Large Language Models via CycleQD

Summary

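Training large language models to acquire specific skills remains difficult.
Conventional training approaches often suffer from imbalanced data distributions and from objective functions that align poorly with task-specific performance.
To address these challenges, we introduce CycleQD, a new approach that applies the Quality Diversity framework through a cyclic adaptation of the algorithm, together with a model-merging-based crossover and an SVD-based mutation.
In CycleQD, each task's performance metric takes its turn as the quality measure, while the remaining metrics serve as behavioral characteristics.
Cycling the focus across individual tasks in this way concentrates effort on one task at a time, removes the need to tune data ratios, and simplifies the design of the objective function.
Empirical results on AgentBench show that applying CycleQD to LLAMA3-8B-INSTRUCT based models not only surpasses traditional fine-tuning methods on coding, operating system, and database tasks, but also achieves performance on par with GPT-3.5-TURBO, which potentially has many more parameters, across these domains.
Crucially, this improved performance is achieved while retaining robust language capabilities, as evidenced by results on widely adopted language benchmark tasks.
We highlight the key design choices in CycleQD and detail how they contribute to its effectiveness.
Furthermore, the method is general and can also be applied to image segmentation models, which underscores its applicability across different domains.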
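
The cyclic role-switching described above can be illustrated with a toy sketch. This is not the authors' implementation: the grid resolution `N_BINS`, the use of one archive per task, the helper names (`cell_of`, `try_insert`, `run_cycle`), and the random scores standing in for model evaluation are all assumptions made for illustration. Only the three task names and the idea of rotating which task acts as the quality measure come from the abstract.

```python
import numpy as np

TASKS = ["coding", "os", "db"]  # task names taken from the abstract
N_BINS = 5                      # hypothetical grid resolution


def cell_of(scores, quality_idx):
    """Discretize the non-quality task scores (in [0, 1]) into a grid cell."""
    bcs = [s for i, s in enumerate(scores) if i != quality_idx]
    return tuple(min(int(s * N_BINS), N_BINS - 1) for s in bcs)


def try_insert(archive, scores, quality_idx):
    """Keep the candidate if its cell is empty or it has higher quality."""
    cell = cell_of(scores, quality_idx)
    incumbent = archive.get(cell)
    if incumbent is None or scores[quality_idx] > incumbent[quality_idx]:
        archive[cell] = scores
        return True
    return False


def run_cycle(n_generations, rng):
    """Rotate the quality task every generation (assumed schedule)."""
    # Assumption: one archive per task; archive k uses task k as quality.
    archives = [dict() for _ in TASKS]
    schedule = []
    for gen in range(n_generations):
        quality_idx = gen % len(TASKS)  # this task is the quality measure now
        schedule.append(TASKS[quality_idx])
        # Stand-in for "create and evaluate a new model": random task scores.
        candidate = tuple(float(x) for x in rng.uniform(0.0, 1.0, len(TASKS)))
        try_insert(archives[quality_idx], candidate, quality_idx)
    return archives, schedule
```

Calling `run_cycle(6, np.random.default_rng(0))` visits the tasks in the order coding, os, db, coding, os, db. In each generation the two tasks that are not the quality measure define the grid cell, so no data ratio between tasks ever has to be chosen.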

Abstract (original)

Training large language models to acquire specific skills remains a challenging endeavor. Conventional training approaches often struggle with data distribution imbalances and inadequacies in objective functions that do not align well with task-specific performance. To address these challenges, we introduce CycleQD, a novel approach that leverages the Quality Diversity framework through a cyclic adaptation of the algorithm, along with a model merging based crossover and an SVD-based mutation. In CycleQD, each task’s performance metric is alternated as the quality measure while the others serve as the behavioral characteristics. This cyclic focus on individual tasks allows for concentrated effort on one task at a time, eliminating the need for data ratio tuning and simplifying the design of the objective function. Empirical results from AgentBench indicate that applying CycleQD to LLAMA3-8B-INSTRUCT based models not only enables them to surpass traditional fine-tuning methods in coding, operating systems, and database tasks, but also achieves performance on par with GPT-3.5-TURBO, which potentially contains much more parameters, across these domains. Crucially, this enhanced performance is achieved while retaining robust language capabilities, as evidenced by its performance on widely adopted language benchmark tasks. We highlight the key design choices in CycleQD, detailing how these contribute to its effectiveness. Furthermore, our method is general and can be applied to image segmentation models, highlighting its applicability across different domains.
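
The abstract names two variation operators, a model-merging-based crossover and an SVD-based mutation, without giving their formulas. The sketch below shows one plausible reading on a single weight matrix, and every detail in it is an assumption: the crossover is taken to be a linear merge of the parents' differences from a shared base model, and the mutation is taken to be a random rescaling of the singular values of the child's difference from that base. The function names and the `alpha` and `scale` parameters are hypothetical.

```python
import numpy as np


def merge_crossover(base, parent_a, parent_b, alpha):
    """Assumed crossover: linear merge of the parents' deltas from the base."""
    delta_a = parent_a - base
    delta_b = parent_b - base
    return base + alpha * delta_a + (1.0 - alpha) * delta_b


def svd_mutation(base, child, scale, rng):
    """Assumed mutation: rescale singular values of the child's delta."""
    delta = child - base
    # Thin SVD: delta == (u * s) @ vt
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    # Multiplicative Gaussian noise on each singular value.
    noise = 1.0 + scale * rng.standard_normal(s.shape)
    return base + (u * (s * noise)) @ vt
```

With `scale=0.0` the mutation returns the child unchanged, and with `alpha=1.0` the crossover returns `parent_a`, which makes both operators easy to sanity-check before applying them to real weights.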

arXiv information

Authors	So Kuroki, Taishi Nakamura, Takuya Akiba, Yujin Tang
Published	2024-11-27 16:38:33+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
