Diverse Weight Averaging for Out-of-Distribution Generalization

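Summary

Standard neural networks struggle to generalize under distribution shifts in computer vision.
Fortunately, combining multiple networks consistently improves out-of-distribution generalization.
In particular, weight averaging (WA) strategies were shown to perform best on the competitive DomainBed benchmark.
These strategies directly average the weights of multiple networks despite their nonlinearities.
This paper proposes Diverse Weight Averaging (DiWA), a new WA strategy whose main motivation is to increase the functional diversity across the averaged models.
To this end, DiWA averages weights obtained from several independent training runs. Because hyperparameters and training procedures differ between runs, models from different runs are more diverse than models collected along a single run.
Exploiting similarities between WA and standard functional ensembling, the authors motivate the need for diversity with a new bias-variance-covariance-locality decomposition of the expected error.
Moreover, this decomposition highlights that WA succeeds when the variance term dominates, which the authors show occurs when the marginal distribution changes at test time.
Experimentally, DiWA consistently improves the state of the art on DomainBed without inference overhead.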
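
The core operation described above, averaging the weights of models from several independent runs, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the `Weights` representation, the `average_weights` helper, and the toy values are all assumptions made for this example. It also assumes every run uses the same architecture, so parameter names and sizes match.

```python
from typing import Dict, List

# Hypothetical flat representation: parameter name -> list of values.
Weights = Dict[str, List[float]]


def average_weights(runs: List[Weights]) -> Weights:
    """Uniformly average parameters across runs that share one architecture."""
    if not runs:
        raise ValueError("need at least one set of weights")
    names = runs[0].keys()
    for w in runs:
        if w.keys() != names:
            raise ValueError("all runs must share the same parameter names")
    averaged: Weights = {}
    for name in names:
        # Group the i-th value of this parameter from every run, then average.
        columns = zip(*(w[name] for w in runs))
        averaged[name] = [sum(col) / len(runs) for col in columns]
    return averaged


# Toy weights standing in for three independent training runs.
run_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
run_b = {"layer.weight": [3.0, 4.0], "layer.bias": [3.0]}
run_c = {"layer.weight": [2.0, 0.0], "layer.bias": [0.0]}

diwa_like = average_weights([run_a, run_b, run_c])
print(diwa_like)
```

The result is a single set of weights, so inference costs the same as for one network, unlike an ensemble that evaluates every member.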

Abstract (original)

Standard neural networks struggle to generalize under distribution shifts in computer vision. Fortunately, combining multiple networks can consistently improve out-of-distribution generalization. In particular, weight averaging (WA) strategies were shown to perform best on the competitive DomainBed benchmark; they directly average the weights of multiple networks despite their nonlinearities. In this paper, we propose Diverse Weight Averaging (DiWA), a new WA strategy whose main motivation is to increase the functional diversity across averaged models. To this end, DiWA averages weights obtained from several independent training runs: indeed, models obtained from different runs are more diverse than those collected along a single run thanks to differences in hyperparameters and training procedures. We motivate the need for diversity by a new bias-variance-covariance-locality decomposition of the expected error, exploiting similarities between WA and standard functional ensembling. Moreover, this decomposition highlights that WA succeeds when the variance term dominates, which we show occurs when the marginal distribution changes at test time. Experimentally, DiWA consistently improves the state of the art on DomainBed without inference overhead.
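
For orientation, the classical bias-variance-covariance decomposition for functional ensembling, which the abstract says the paper builds on, can be written as below. This is the standard ensemble result, not the paper's formula: the abstract does not state the paper's decomposition, and its additional locality term is not reproduced here. The notation is an assumption for this sketch: M identically distributed members f_1, ..., f_M, a fixed target y, and expectations taken over the randomness of training.

```latex
\[
\bar{f}(x) = \frac{1}{M} \sum_{m=1}^{M} f_m(x),
\qquad
\mathbb{E}\!\left[\big(\bar{f}(x) - y\big)^2\right]
= \mathrm{bias}^2(x, y)
+ \frac{1}{M}\,\mathrm{var}(x)
+ \frac{M-1}{M}\,\mathrm{cov}(x),
\]
\[
\mathrm{bias}(x, y) = \mathbb{E}[f_m(x)] - y,
\quad
\mathrm{var}(x) = \mathbb{E}\!\left[\big(f_m(x) - \mathbb{E}[f_m(x)]\big)^2\right],
\quad
\mathrm{cov}(x) = \mathbb{E}\!\left[\big(f_m(x) - \mathbb{E}[f_m(x)]\big)\big(f_{m'}(x) - \mathbb{E}[f_{m'}(x)]\big)\right],
\; m \neq m'.
\]
```

In this form the variance term shrinks as 1/M, while the covariance term does not shrink and is small only when members are diverse. This is consistent with the abstract's emphasis on diversity across the averaged models.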

arXiv information

Authors	Alexandre Ramé, Matthieu Kirchmeyer, Thibaud Rahier, Alain Rakotomamonjy, Patrick Gallinari, Matthieu Cord
Published	2023-01-27 14:21:58+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
