FAST: Federated Active Learning with Foundation Models for Communication-efficient Sampling and Training

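Summary

Federated Active Learning (FAL) has emerged as a promising framework for exploiting large amounts of unlabeled data spread across distributed clients while preserving data privacy.
In practice, however, deployment is still held back by high annotation costs and communication-heavy sampling, especially in cross-silo settings where each client holds a substantial local dataset.
The paper asks: what is the best practice for reducing communication costs in human-in-the-loop learning while keeping annotator effort to a minimum?
Existing FAL methods typically rely on an iterative annotation process that separates active sampling from federated updates, which leads to multiple rounds of expensive communication and annotation.
In response, the authors introduce FAST, a two-pass FAL framework that uses foundation models for weak labeling in a preliminary pass, followed by a refinement pass that focuses only on the most uncertain samples.
By drawing on the representation knowledge of foundation models and folding the refinement step into a streamlined workflow, FAST substantially reduces the overhead of iterative active sampling.
Extensive experiments on diverse medical and natural image benchmarks show that, under a limited 5% labeling budget, FAST outperforms existing FAL methods by 4.36% on average while cutting communication rounds eightfold.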

Abstract (original)

Federated Active Learning (FAL) has emerged as a promising framework to leverage large quantities of unlabeled data across distributed clients while preserving data privacy. However, real-world deployments remain limited by high annotation costs and communication-intensive sampling processes, particularly in a cross-silo setting, when clients possess substantial local datasets. This paper addresses the crucial question: What is the best practice to reduce communication costs in human-in-the-loop learning with minimal annotator effort? Existing FAL methods typically rely on iterative annotation processes that separate active sampling from federated updates, leading to multiple rounds of expensive communication and annotation. In response, we introduce FAST, a two-pass FAL framework that harnesses foundation models for weak labeling in a preliminary pass, followed by a refinement pass focused exclusively on the most uncertain samples. By leveraging representation knowledge from foundation models and integrating refinement steps into a streamlined workflow, FAST substantially reduces the overhead incurred by iterative active sampling. Extensive experiments on diverse medical and natural image benchmarks demonstrate that FAST outperforms existing FAL methods by an average of 4.36% while reducing communication rounds eightfold under a limited 5% labeling budget.
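The two-pass idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how uncertainty is measured, so Shannon entropy is assumed here, and the names `two_pass_labels`, `weak_probs`, and `oracle` are hypothetical. The sketch covers only label selection on a single client and leaves out federated training and communication entirely.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def two_pass_labels(weak_probs, oracle, budget_fraction=0.05):
    """Illustrative two-pass labeling on one client (assumed design).

    weak_probs      -- list of per-sample class-probability lists; stands in
                       for the output of a foundation model used as a weak
                       labeler (hypothetical input format)
    oracle          -- callable mapping a sample index to its true label;
                       stands in for the human annotator
    budget_fraction -- share of samples the annotator is allowed to label
    """
    n = len(weak_probs)
    budget = int(n * budget_fraction)  # rounds down, so the budget is never exceeded

    # Pass 1: accept the most probable class as a weak label for every sample.
    labels = [max(range(len(p)), key=p.__getitem__) for p in weak_probs]

    # Pass 2: rank samples by entropy (assumed uncertainty measure) and
    # ask the annotator only about the `budget` most uncertain ones.
    ranked = sorted(range(n), key=lambda i: entropy(weak_probs[i]), reverse=True)
    queried = ranked[:budget]
    for i in queried:
        labels[i] = oracle(i)

    return labels, sorted(queried)
```

Calling `two_pass_labels(weak_probs, oracle, budget_fraction=0.05)` returns the final label list together with the indices that were sent to the annotator, so the annotation cost is a single batch of at most 5% of the samples rather than one batch per sampling round.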

arXiv information

Authors	Haoyuan Li,Mathias Funk,Jindong Wang,Aaqib Saeed
Published	2025-04-10 14:42:57+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
