Sparsely-gated Mixture-of-Expert Layers for CNN Interpretability

Summary

Title: Sparsely-gated Mixture-of-Expert Layers for CNN Interpretability

Summary:

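– Sparsely-gated Mixture-of-Expert (MoE) layers have recently been applied successfully to scale large transformers, especially for language modeling tasks.
– An intriguing side effect of sparse MoE layers is that they convey inherent interpretability to a model through natural expert specialization.
– This work applies sparse MoE layers to CNNs for computer vision tasks and analyzes the resulting effect on model interpretability.
– To stabilize MoE training, both soft and hard constraint-based approaches are presented.
– With hard constraints, the weights of certain experts are allowed to become zero.
– Soft constraints balance the contribution of experts with an additional auxiliary loss; as a result, they handle expert utilization better and support the expert specialization process.
– Hard constraints maintain more generalized experts and increase overall model performance.
– The findings show that experts can implicitly focus on individual sub-domains of the input space.
– For example, experts trained for CIFAR-100 image classification specialize in recognizing different domains such as flowers or animals without prior data clustering.
– Experiments with RetinaNet and the COCO dataset further indicate that object detection experts can also specialize in detecting objects of distinct sizes.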

Summary (original)

Sparsely-gated Mixture of Expert (MoE) layers have been recently successfully applied for scaling large transformers, especially for language modeling tasks. An intriguing side effect of sparse MoE layers is that they convey inherent interpretability to a model via natural expert specialization. In this work, we apply sparse MoE layers to CNNs for computer vision tasks and analyze the resulting effect on model interpretability. To stabilize MoE training, we present both soft and hard constraint-based approaches. With hard constraints, the weights of certain experts are allowed to become zero, while soft constraints balance the contribution of experts with an additional auxiliary loss. As a result, soft constraints handle expert utilization better and support the expert specialization process, while hard constraints maintain more generalized experts and increase overall model performance. Our findings demonstrate that experts can implicitly focus on individual sub-domains of the input space. For example, experts trained for CIFAR-100 image classification specialize in recognizing different domains such as flowers or animals without previous data clustering. Experiments with RetinaNet and the COCO dataset further indicate that object detection experts can also specialize in detecting objects of distinct sizes.
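
To make the idea of sparse gating and a soft balancing constraint more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not give the exact form of the auxiliary loss, so the squared coefficient of variation of per-expert importance used below is an assumption borrowed from earlier sparsely-gated MoE work. The function names (`top_k_gate`, `moe_forward`, `importance_loss`) and the scalar toy experts are hypothetical and chosen only for illustration.

```python
import math


def top_k_gate(logits, k):
    """Softmax over the k largest gate logits; every other expert gets weight 0."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(logits))]


def moe_forward(x, experts, gate_logits, k=2):
    """Weighted sum of the outputs of the selected experts only."""
    gates = top_k_gate(gate_logits, k)
    out = 0.0
    for g, expert in zip(gates, experts):
        if g > 0.0:  # experts with zero gate weight are never evaluated
            out += g * expert(x)
    return out, gates


def importance_loss(batch_gates):
    """Assumed soft constraint: squared coefficient of variation of the
    per-expert importance (gate weights summed over the batch).
    It is 0 when all experts are used equally and grows with imbalance."""
    n_experts = len(batch_gates[0])
    importance = [sum(g[e] for g in batch_gates) for e in range(n_experts)]
    mean = sum(importance) / n_experts
    var = sum((v - mean) ** 2 for v in importance) / n_experts
    return var / (mean ** 2 + 1e-10)


# Toy usage: three scalar "experts" standing in for convolutional blocks.
experts = [lambda x: 1.0 * x, lambda x: 2.0 * x, lambda x: 3.0 * x]
output, gates = moe_forward(1.0, experts, gate_logits=[1.0, 3.0, 2.0], k=2)
```

In a real CNN the experts would be convolutional sub-networks and the gate logits would come from a small learned gating network. The auxiliary term would then be added to the task loss with a small weight, so that the gate is nudged toward using all experts without being forced into a uniform assignment.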

arXiv information

Authors	Svetlana Pavlitska, Christian Hubschneider, Lukas Struppek, J. Marius Zöllner
Published	2023-04-27 07:02:25+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, OpenAI
