Enhancing Transformer Training Efficiency with Dynamic Dropout

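Summary

Dynamic Dropout is a new regularization technique designed to improve the training efficiency of Transformer models by dynamically adjusting the dropout rate based on the training epoch or on improvements in validation loss.
The approach addresses the challenge of balancing regularization against model capacity, which is important for achieving fast convergence and high performance.
The method modifies the GPT model to accept a variable dropout rate and updates the dropout layers during training using schedules such as linear decay, exponential decay, and validation loss-based adjustment.
Extensive experiments on the Shakespeare_char dataset show that, compared with a baseline model using a fixed dropout rate, Dynamic Dropout significantly accelerates training and improves inference efficiency.
The validation loss-based adjustment schedule gave the best overall performance, highlighting the potential of Dynamic Dropout as a valuable technique for training large-scale Transformer models.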

Abstract (original)

We introduce Dynamic Dropout, a novel regularization technique designed to enhance the training efficiency of Transformer models by dynamically adjusting the dropout rate based on training epochs or validation loss improvements. This approach addresses the challenge of balancing regularization and model capacity, which is crucial for achieving fast convergence and high performance. Our method involves modifying the GPT model to accept a variable dropout rate and updating dropout layers during training using schedules such as linear decay, exponential decay, and validation loss-based adjustments. Extensive experiments on the Shakespeare_char dataset demonstrate that Dynamic Dropout significantly accelerates training and improves inference efficiency compared to a baseline model with a fixed dropout rate. The validation loss-based adjustment schedule provided the best overall performance, highlighting the potential of Dynamic Dropout as a valuable technique for training large-scale Transformer models.
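
The three schedule families named in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract gives no formulas or hyperparameters, so every function name, default value (`p_start`, `p_end`, `gamma`, `factor`, `p_min`), and the rule "lower the rate whenever validation loss improves" is an assumption made here for illustration.

```python
import math


def linear_decay(step, total_steps, p_start=0.2, p_end=0.0):
    """Linearly interpolate the dropout rate from p_start to p_end (assumed form)."""
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return p_start + (p_end - p_start) * frac


def exponential_decay(step, total_steps, p_start=0.2, gamma=0.05):
    """Decay multiplicatively so the rate equals p_start * gamma at the final step."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return p_start * gamma ** frac


class ValLossDropout:
    """Hypothetical validation-loss-based schedule.

    Assumption: each time the validation loss reaches a new best value,
    the dropout rate is multiplied by `factor`; otherwise it is unchanged.
    """

    def __init__(self, p_start=0.2, factor=0.9, p_min=0.0):
        self.p = p_start
        self.factor = factor
        self.p_min = p_min
        self.best = math.inf

    def update(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.p = max(self.p_min, self.p * self.factor)
        return self.p


def set_dropout(modules, p):
    """Write rate p into every object exposing an attribute `p`.

    torch.nn.Dropout stores its rate in `p`, so in a PyTorch model one could
    pass the model's dropout layers here; this helper itself needs no PyTorch.
    """
    for m in modules:
        if hasattr(m, "p"):
            m.p = p
```

In a training loop, one of these schedules would be evaluated once per epoch (or per evaluation interval) and the resulting rate pushed into the model's dropout layers with something like `set_dropout`. Whether the rate should fall or rise in response to validation loss is a design choice the abstract does not settle.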

arXiv information

Authors	Hanrui Yan, Dan Shao
Published	2024-11-05 16:36:07+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
