FlowerTune: A Cross-Domain Benchmark for Federated Fine-Tuning of Large Language Models

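Summary

Large language models (LLMs) achieve state-of-the-art results across many domains, but their development still depends on enormous amounts of publicly available data, which raises concerns about data scarcity and about the lack of access to domain-specific, sensitive information. Federated Learning (FL) offers an attractive framework for addressing these challenges because it enables decentralized fine-tuning of pre-trained LLMs without sharing raw data. However, the compatibility and performance of pre-trained LLMs in FL settings have barely been studied. We introduce the FlowerTune LLM Leaderboard, the first benchmark suite designed to evaluate federated fine-tuning of LLMs across four diverse domains: general NLP, finance, medical, and coding. Each domain includes a federated instruction-tuning dataset and domain-specific evaluation metrics. Our results, obtained through a collaborative, open-source, community-driven approach, provide the first comprehensive comparison of 26 pre-trained LLMs with different aggregation and fine-tuning strategies under federated settings, and offer practical insights into model performance, resource constraints, and domain adaptation. This work lays the groundwork for developing privacy-preserving, domain-specialized LLMs for real-world applications.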

Summary (original)

Large Language Models (LLMs) have achieved state-of-the-art results across diverse domains, yet their development remains reliant on vast amounts of publicly available data, raising concerns about data scarcity and the lack of access to domain-specific, sensitive information. Federated Learning (FL) presents a compelling framework to address these challenges by enabling decentralized fine-tuning on pre-trained LLMs without sharing raw data. However, the compatibility and performance of pre-trained LLMs in FL settings remain largely underexplored. We introduce the FlowerTune LLM Leaderboard, a first-of-its-kind benchmarking suite designed to evaluate federated fine-tuning of LLMs across four diverse domains: general NLP, finance, medical, and coding. Each domain includes federated instruction-tuning datasets and domain-specific evaluation metrics. Our results, obtained through a collaborative, open-source and community-driven approach, provide the first comprehensive comparison across 26 pre-trained LLMs with different aggregation and fine-tuning strategies under federated settings, offering actionable insights into model performance, resource constraints, and domain adaptation. This work lays the foundation for developing privacy-preserving, domain-specialized LLMs for real-world applications.
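The abstract refers to comparing "different aggregation strategies" in federated fine-tuning without naming them. For orientation only, here is a minimal sketch of FedAvg, one widely used aggregation rule, assuming each client sends back its locally fine-tuned parameters (for example, LoRA adapter weights) as a dict of flattened float lists and that the server weights each client by its number of local training examples. This illustrates the general technique and is not FlowerTune's implementation; the function name `fedavg` and the `Params` layout are hypothetical.

```python
from typing import Dict, List

# Hypothetical layout: tensor name -> flattened parameter values.
Params = Dict[str, List[float]]


def fedavg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Aggregate client parameters by a data-size-weighted average (FedAvg).

    Assumes every client returns the same tensor names with the same lengths.
    Raw training data never reaches the server; only parameters are averaged.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of training examples must be positive")

    # Start from zero tensors shaped like the first client's parameters.
    aggregated: Params = {
        name: [0.0] * len(values) for name, values in client_params[0].items()
    }
    for params, n_examples in zip(client_params, client_sizes):
        weight = n_examples / total  # clients with more data count more
        for name, values in params.items():
            acc = aggregated[name]
            for i, value in enumerate(values):
                acc[i] += weight * value
    return aggregated


if __name__ == "__main__":
    # Two hypothetical clients holding 100 and 300 local examples.
    clients = [{"lora_A": [1.0, 2.0]}, {"lora_A": [3.0, 6.0]}]
    print(fedavg(clients, [100, 300]))  # {'lora_A': [2.5, 5.0]}
```

Weighting by local dataset size means a client with three times as much data pulls the global adapter three times as hard; other aggregation rules change exactly this step.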

arXiv information

Authors	Yan Gao, Massimo Roberto Scamarcia, Javier Fernandez-Marques, Mohammad Naseri, Chong Shen Ng, Dimitris Stripelis, Zexi Li, Tao Shen, Jiamu Bai, Daoyuan Chen, Zikai Zhang, Rui Hu, InSeo Song, Lee KangYoon, Hong Jia, Ting Dang, Junyan Wang, Zheyuan Liu, Daniel Janes Beutel, Lingjuan Lyu, Nicholas D. Lane
Published	2025-06-03 14:54:12+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL