Sparsely-gated MoE Layers for CNN Interpretability

Summary

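Sparsely-gated Mixture of Experts (MoE) layers have recently been applied successfully to scale large transformers, particularly for language modeling tasks.
An interesting side effect of sparse MoE layers is that natural expert specialization gives the model inherent interpretability.
This work applies sparse MoE layers to CNNs for computer vision tasks and analyzes the resulting effect on model interpretability.
To stabilize MoE training, the authors present both soft and hard constraint-based approaches.
Hard constraints allow the weights of certain experts to become zero, while soft constraints balance the experts' contributions through an additional auxiliary loss.
As a result, soft constraints handle expert utilization better and support expert specialization, whereas hard constraints keep the experts more general and improve overall model performance.
The findings show that experts can implicitly focus on individual sub-domains of the input space.
For example, experts trained for CIFAR-100 image classification specialize in recognizing different domains, such as flowers or animals, without any prior data clustering.
Experiments with RetinaNet and the COCO dataset further indicate that object detection experts can specialize in detecting objects of distinct sizes.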

Abstract (original)

Sparsely-gated Mixture of Expert (MoE) layers have been recently successfully applied for scaling large transformers, especially for language modeling tasks. An intriguing side effect of sparse MoE layers is that they convey inherent interpretability to a model via natural expert specialization. In this work, we apply sparse MoE layers to CNNs for computer vision tasks and analyze the resulting effect on model interpretability. To stabilize MoE training, we present both soft and hard constraint-based approaches. With hard constraints, the weights of certain experts are allowed to become zero, while soft constraints balance the contribution of experts with an additional auxiliary loss. As a result, soft constraints handle expert utilization better and support the expert specialization process, while hard constraints maintain more generalized experts and increase overall model performance. Our findings demonstrate that experts can implicitly focus on individual sub-domains of the input space. For example, experts trained for CIFAR-100 image classification specialize in recognizing different domains such as flowers or animals without previous data clustering. Experiments with RetinaNet and the COCO dataset further indicate that object detection experts can also specialize in detecting objects of distinct sizes.
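
The abstract mentions sparse gating and a soft constraint that balances expert contributions through an auxiliary loss, without giving formulas. The sketch below illustrates one common way these ideas are realized in the sparsely-gated MoE literature: top-k gating followed by a squared-coefficient-of-variation penalty on per-expert importance. It is an assumption for illustration, not necessarily the formulation used by the authors; the function names `top_k_gate` and `importance_loss` and the choice of k are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(logits, k):
    """Sparse gate: softmax over the k largest logits, zero weight elsewhere."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    probs = softmax([logits[i] for i in top])
    weights = [0.0] * len(logits)
    for i, p in zip(top, probs):
        weights[i] = p
    return weights


def importance_loss(batch_weights):
    """Assumed soft constraint: squared coefficient of variation of the
    per-expert importance (gate weights summed over the batch).
    It is 0 when all experts are used equally and grows with imbalance."""
    n_experts = len(batch_weights[0])
    importance = [sum(w[e] for w in batch_weights) for e in range(n_experts)]
    mean = sum(importance) / n_experts
    var = sum((x - mean) ** 2 for x in importance) / n_experts
    return var / (mean ** 2 + 1e-10)


if __name__ == "__main__":
    # Hypothetical gate logits for a batch of three inputs and four experts.
    batch_logits = [
        [2.0, 1.0, 0.1, -1.0],
        [0.3, 2.5, 0.2, 0.1],
        [1.5, 1.4, -0.5, 0.0],
    ]
    batch_weights = [top_k_gate(row, k=2) for row in batch_logits]
    for row in batch_weights:
        print([round(w, 3) for w in row])
    print("auxiliary balance loss:", round(importance_loss(batch_weights), 3))
```

In training, a term like this would typically be scaled by a small coefficient and added to the task loss, so that the gate is nudged toward using all experts rather than collapsing onto a few.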

arXiv information

Authors	Svetlana Pavlitskaya, Christian Hubschneider, Lukas Struppek, J. Marius Zöllner
Published	2022-12-22 10:06:59+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
