Kolmogorov-Arnold Attention: Is Learnable Attention Better For Vision Transformers?

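Summary

Kolmogorov-Arnold Networks (KANs) are a notable innovation: networks built from learnable activation functions, with the potential to capture more complex relationships from data.
KANs are useful for finding symbolic representations and for continual learning of one-dimensional functions, but their effectiveness on diverse machine learning (ML) tasks such as vision remains questionable.
Currently, KANs are deployed by replacing multilayer perceptrons (MLPs) in deep network architectures, including advanced architectures such as vision Transformers (ViTs).
In this paper, we are the first to design a general learnable Kolmogorov-Arnold Attention (KArAt) for vanilla ViTs that can operate on any choice of basis.
However, the compute and memory costs of training it motivated us to propose a more modular version, and we designed a particular learnable attention called Fourier-KArAt.
Fourier-KArAt and its variants either outperform their ViT counterparts or show comparable performance on the CIFAR-10, CIFAR-100, and ImageNet-1K datasets.
We dissect the performance and generalization capacity of these architectures by analyzing their loss landscapes, weight distributions, optimizer paths, attention visualizations, and spectral behavior, and contrast them with vanilla ViTs.
The goal of this paper is not to produce parameter- and compute-efficient attention, but to encourage the community to explore KANs in conjunction with more advanced architectures that require a careful understanding of learnable activations.
Open-source code and implementation details are available at https://subhajitmaity.me/KArAt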

Abstract (original)

Kolmogorov-Arnold networks (KANs) are a remarkable innovation consisting of learnable activation functions with the potential to capture more complex relationships from data. Although KANs are useful in finding symbolic representations and continual learning of one-dimensional functions, their effectiveness in diverse machine learning (ML) tasks, such as vision, remains questionable. Presently, KANs are deployed by replacing multilayer perceptrons (MLPs) in deep network architectures, including advanced architectures such as vision Transformers (ViTs). In this paper, we are the first to design a general learnable Kolmogorov-Arnold Attention (KArAt) for vanilla ViTs that can operate on any choice of basis. However, the computing and memory costs of training them motivated us to propose a more modular version, and we designed particular learnable attention, called Fourier-KArAt. Fourier-KArAt and its variants either outperform their ViT counterparts or show comparable performance on CIFAR-10, CIFAR-100, and ImageNet-1K datasets. We dissect these architectures’ performance and generalization capacity by analyzing their loss landscapes, weight distributions, optimizer path, attention visualization, and spectral behavior, and contrast them with vanilla ViTs. The goal of this paper is not to produce parameter- and compute-efficient attention, but to encourage the community to explore KANs in conjunction with more advanced architectures that require a careful understanding of learnable activations. Our open-source code and implementation details are available on: https://subhajitmaity.me/KArAt
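
As a rough illustration of what a "learnable activation function" on a Fourier basis can look like, the sketch below evaluates a truncated Fourier series whose coefficients would be trained. It is an assumption-laden toy, not the paper's Fourier-KArAt operator: the abstract does not specify how the basis is applied inside attention, and the function name `fourier_activation` and the coefficient values are hypothetical.

```python
import math


def fourier_activation(x, a, b):
    """Evaluate phi(x) = sum_k a[k-1]*cos(k*x) + b[k-1]*sin(k*x), k = 1..G.

    `a` and `b` are the coefficient lists (length G each). In a KAN-style
    layer they would be learned parameters; here they are fixed by hand.
    """
    return sum(
        a_k * math.cos(k * x) + b_k * math.sin(k * x)
        for k, (a_k, b_k) in enumerate(zip(a, b), start=1)
    )


# Hypothetical coefficients for a 3-term basis (illustration only).
a = [0.5, 0.0, 0.0]
b = [0.0, 1.0, 0.0]

# Apply the activation elementwise to a row of made-up attention scores.
scores = [0.0, math.pi / 2]
activated = [fourier_activation(s, a, b) for s in scores]
```

A fixed activation such as softmax has no free parameters of this kind; in the toy above, changing `a` and `b` changes the shape of the function itself, which is the property the abstract refers to as learnable.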

arXiv information

Authors	Subhajit Maity, Killian Hitsman, Xin Li, Aritra Dutta
Published	2025-03-13 17:59:52+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
