Does equivariance matter at scale?

Summary

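Given large datasets and sufficient compute, is it beneficial to design neural architectures around the structure and symmetries of each problem?
Or is it more efficient to learn them from data?
We empirically study how equivariant and non-equivariant networks scale with compute and training samples.
Focusing on a benchmark problem of rigid-body interactions and on general-purpose transformer architectures, we run a series of experiments that vary the model size, the number of training steps, and the dataset size.
We find evidence for three conclusions.
First, equivariance improves data efficiency, but training non-equivariant models with data augmentation can close this gap given sufficient epochs.
Second, scaling with compute follows a power law, and equivariant models outperform non-equivariant ones at each tested compute budget.
Finally, the optimal allocation of a compute budget between model size and training duration differs between equivariant and non-equivariant models.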
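To make the data-augmentation baseline in the first conclusion concrete, here is a minimal sketch of rotation augmentation for a non-equivariant model. It assumes the inputs and targets are arrays of 3D positions and that rotations are the only symmetry of interest. The summary does not specify the symmetry group or the data layout, so both are assumptions, and the names `random_rotation` and `augment` are hypothetical.

```python
import numpy as np


def random_rotation(rng):
    """Sample a 3x3 rotation matrix via QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    # Fix the column signs so the decomposition is unique.
    q = q * np.sign(np.diag(r))
    # Flip one axis if needed so that det(q) = +1 (a proper rotation).
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q


def augment(inputs, targets, rng):
    """Apply the same random rotation to inputs and targets of shape (n, 3).

    Assumption: the targets transform like the inputs (for example, future
    positions predicted from current positions).
    """
    rot = random_rotation(rng)
    return inputs @ rot.T, targets @ rot.T


rng = np.random.default_rng(0)
positions = rng.normal(size=(5, 3))
future_positions = rng.normal(size=(5, 3))
aug_inputs, aug_targets = augment(positions, future_positions, rng)
```

Drawing a fresh rotation for every training batch exposes the model to the symmetry through the data rather than through the architecture, which is the trade-off the first conclusion describes.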

Summary (original)

Given large data sets and sufficient compute, is it beneficial to design neural architectures for the structure and symmetries of each problem? Or is it more efficient to learn them from data? We study empirically how equivariant and non-equivariant networks scale with compute and training samples. Focusing on a benchmark problem of rigid-body interactions and on general-purpose transformer architectures, we perform a series of experiments, varying the model size, training steps, and dataset size. We find evidence for three conclusions. First, equivariance improves data efficiency, but training non-equivariant models with data augmentation can close this gap given sufficient epochs. Second, scaling with compute follows a power law, with equivariant models outperforming non-equivariant ones at each tested compute budget. Finally, the optimal allocation of a compute budget onto model size and training duration differs between equivariant and non-equivariant models.
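As an illustration of what "scaling with compute follows a power law" means in practice, the sketch below fits a curve of the form loss ≈ a · C^(−b) by linear regression in log-log space. The abstract states only that a power law holds, so the exact functional form (no constant offset) is an assumption, the function name `fit_power_law` is hypothetical, and the data points are synthetic rather than results from the paper.

```python
import numpy as np


def fit_power_law(compute, loss):
    """Fit loss ≈ a * compute**(-b) by least squares in log-log space.

    Returns the prefactor a and the exponent b.
    """
    slope, intercept = np.polyfit(np.log(compute), np.log(loss), deg=1)
    return float(np.exp(intercept)), float(-slope)


# Synthetic, noise-free points for illustration only (not data from the paper).
compute = np.array([1e15, 1e16, 1e17, 1e18])
loss = 3.0 * compute ** -0.25

a, b = fit_power_law(compute, loss)
print(f"a = {a:.3f}, b = {b:.3f}")
```

Fitting one such curve per model family gives a simple way to compare equivariant and non-equivariant models at a fixed compute budget: on a log-log plot each family appears as a straight line, and the lower line is the better model at that budget.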

arXiv information

Authors	Johann Brehmer, Sönke Behrends, Pim de Haan, Taco Cohen
Published	2024-10-30 16:36:59+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
