Federated Document Visual Question Answering: A Pilot Study

Summary

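An important handicap of document analysis research is that documents tend to be copyrighted or to contain personal information, so they cannot be published openly or collected into centralized, large-scale document datasets.
Instead, documents are scattered across private data silos, which makes extensive training on heterogeneous data a tedious task.
In this work, we explore the use of a federated learning (FL) scheme as a way to train a shared model on decentralized private document data.
We focus on Document VQA (DocVQA), a task particularly suited to this approach because the type of reasoning a model needs can differ substantially across domains.
Enabling training on heterogeneous document datasets can therefore substantially strengthen DocVQA models.
We assemble existing DocVQA datasets from diverse domains to reflect the data heterogeneity found in real-world applications.
We explore a self-pretraining technique in this multimodal setting, in which the same data is used for both pretraining and fine-tuning, which makes it relevant to privacy preservation.
We further propose combining self-pretraining with a federated DocVQA training method that uses centralized adaptive optimization and outperforms the FedAvg baseline.
Through extensive experiments, we also present a multi-faceted analysis of training DocVQA models with FL, providing insights for future research on this task.
We show that our pretraining strategies can learn effectively and scale up under federated training with diverse DocVQA datasets, and that hyperparameter tuning is essential for practical document tasks under federation.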

Summary (original)

An important handicap of document analysis research is that documents tend to be copyrighted or contain private information, which prohibits their open publication and the creation of centralised, large-scale document datasets. Instead, documents are scattered in private data silos, making extensive training over heterogeneous data a tedious task. In this work, we explore the use of a federated learning (FL) scheme as a way to train a shared model on decentralised private document data. We focus on the problem of Document VQA, a task particularly suited to this approach, as the type of reasoning capabilities required from the model can be quite different in diverse domains. Enabling training over heterogeneous document datasets can thus substantially enrich DocVQA models. We assemble existing DocVQA datasets from diverse domains to reflect the data heterogeneity in real-world applications. We explore the self-pretraining technique in this multi-modal setting, where the same data is used for both pretraining and finetuning, making it relevant for privacy preservation. We further propose combining self-pretraining with a Federated DocVQA training method using centralized adaptive optimization that outperforms the FedAvg baseline. With extensive experiments, we also present a multi-faceted analysis on training DocVQA models with FL, which provides insights for future research on this task. We show that our pretraining strategies can effectively learn and scale up under federated training with diverse DocVQA datasets and tuning hyperparameters is essential for practical document tasks under federation.
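The abstract contrasts the FedAvg baseline with "centralized adaptive optimization" on the server. As a hedged illustration only, the sketch below puts plain FedAvg next to one well-known instance of server-side adaptive optimization, FedAdam (from "Adaptive Federated Optimization" by Reddi et al.). The abstract does not say which adaptive optimizer or hyperparameters the authors use, so `FedAdamServer`, `fedavg_round`, `weighted_average`, the default values of `lr`, `beta1`, `beta2`, `tau`, and the toy list-of-floats model representation are all assumptions made for this example; a real DocVQA model would use framework tensors.

```python
import math
from typing import List, Optional, Sequence

# Toy flat parameter vector (assumption: real models would use tensors).
Vector = List[float]


def weighted_average(vectors: Sequence[Vector], num_examples: Sequence[int]) -> Vector:
    """Average vectors element-wise, weighting each client by its example count."""
    total = float(sum(num_examples))
    dim = len(vectors[0])
    return [
        sum(n * vec[i] for vec, n in zip(vectors, num_examples)) / total
        for i in range(dim)
    ]


def fedavg_round(client_models: Sequence[Vector], num_examples: Sequence[int]) -> Vector:
    """FedAvg with server learning rate 1: the new global model is the
    data-weighted average of the locally trained client models."""
    return weighted_average(client_models, num_examples)


class FedAdamServer:
    """Hypothetical FedAdam-style server: Adam applied to the averaged client update.

    Defaults are illustrative assumptions, not the paper's settings.
    """

    def __init__(self, lr: float = 0.01, beta1: float = 0.9,
                 beta2: float = 0.99, tau: float = 1e-3) -> None:
        self.lr, self.beta1, self.beta2, self.tau = lr, beta1, beta2, tau
        self.m: Optional[Vector] = None  # first-moment estimate
        self.v: Optional[Vector] = None  # second-moment estimate

    def step(self, global_model: Vector, client_models: Sequence[Vector],
             num_examples: Sequence[int]) -> Vector:
        # Each client's update relative to the current global model.
        deltas = [[c - g for c, g in zip(cm, global_model)] for cm in client_models]
        # Data-weighted mean update, used as a pseudo-gradient (ascent direction).
        delta = weighted_average(deltas, num_examples)
        if self.m is None or self.v is None:
            self.m = [0.0] * len(delta)
            self.v = [0.0] * len(delta)
        self.m = [self.beta1 * m + (1 - self.beta1) * d for m, d in zip(self.m, delta)]
        self.v = [self.beta2 * v + (1 - self.beta2) * d * d for v, d in zip(self.v, delta)]
        # Per-parameter adaptive step; tau bounds the scaling when v is tiny.
        return [
            w + self.lr * m / (math.sqrt(v) + self.tau)
            for w, m, v in zip(global_model, self.m, self.v)
        ]


if __name__ == "__main__":
    clients = [[1.0, 2.0], [3.0, 6.0]]  # two clients' locally trained weights
    sizes = [1, 3]                       # number of training examples per client
    print(fedavg_round(clients, sizes))  # [2.5, 5.0]
    server = FedAdamServer(lr=0.1)
    print(server.step([0.0, 0.0], clients, sizes))
```

The difference the sketch is meant to show: FedAvg applies the averaged client update directly, while the adaptive server rescales it per parameter using running moment estimates. Whether this property is what lets the authors' method outperform FedAvg on heterogeneous DocVQA clients is not stated in the abstract.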

arXiv information

Authors	Khanh Nguyen, Dimosthenis Karatzas
Published	2024-05-10 17:53:05+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
