Brain network science modelling of sparse neural networks enables Transformers and LLMs to perform as fully connected

Summary

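Dynamic sparse training (DST) can reduce the computational demands of ANNs, but it struggles to maintain peak performance at high sparsity levels.
Cannistraci-Hebb training (CHT) is a brain-inspired method for growing connectivity in DST.
CHT leverages gradient-free, topology-driven link regrowth, which has shown an advantage over fully connected networks in ultra-sparse settings (less than 1% connectivity) across various tasks.
However, CHT suffers from two main drawbacks: (i) its time complexity is $O(Nd^3)$, where $N$ is the number of nodes in the network and $d$ is the node degree, which restricts it to ultra-sparse regimes;
(ii) it selects the top link prediction scores, which is inappropriate in the early training epochs, when the network presents unreliable connections.
Here, we design the first brain-inspired network model, termed the bipartite receptive field (BRF), to initialize the connectivity of sparse artificial neural networks.
We further introduce a GPU-friendly, matrix-based approximation of CH link prediction, reducing the complexity to $O(N^3)$.
We introduce the Cannistraci-Hebb training soft rule (CHTs), which adopts a flexible strategy for sampling connections in both link removal and regrowth, balancing exploration and exploitation of the network topology.
Additionally, we integrate CHTs with a sigmoid gradual density decay (CHTss).
Empirical results show that BRF offers performance advantages over previous network science models.
Using 1% of the connections, CHTs outperforms fully connected networks in MLP architectures on image classification tasks, compressing some networks to less than 30% of the nodes.
Using 5% of the connections, CHTss outperforms fully connected networks in two Transformer-based machine translation tasks.
Finally, at 30% connectivity, both CHTs and CHTss outperform other DST methods in language modeling and even exceed fully connected baselines in zero-shot tasks.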
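
The contrast between selecting the top link prediction scores and sampling links softly can be illustrated with a minimal sketch. The summary does not specify how CHTs samples connections, so everything here is an assumption for illustration: the function names, the `temperature` parameter, and the softmax-style weighting (implemented with the Gumbel-top-k trick) are hypothetical and are not taken from the paper.

```python
import math
import random


def top_k_links(scores, k):
    """Deterministic selection: always keep the k highest-scoring links."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


def soft_sample_links(scores, k, temperature=1.0, rng=None):
    """Sample k distinct links, favouring (but not forcing) high scores.

    scores: dict mapping a link (u, v) to a link prediction score.
    temperature: hypothetical knob (an assumption, not from the paper).
        Small values approach top-k selection (exploitation);
        large values approach uniform sampling (exploration).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    keyed = []
    for link, score in scores.items():
        # Gumbel-top-k: perturbing score/temperature with Gumbel noise and
        # keeping the k largest keys samples without replacement from a
        # softmax distribution over the links.
        u = max(rng.random(), 1e-12)  # avoid log(0)
        gumbel = -math.log(-math.log(u))
        keyed.append((score / temperature + gumbel, link))
    keyed.sort(reverse=True)
    return [link for _, link in keyed[:k]]


if __name__ == "__main__":
    candidate_scores = {(0, 1): 0.9, (1, 2): 0.5, (2, 3): 0.1, (3, 4): 0.05}
    print("top-k:", top_k_links(candidate_scores, 2))
    print("soft :", soft_sample_links(candidate_scores, 2, temperature=0.5))
```

With a high temperature early in training, low-scoring links still have a chance of being chosen, which is one plausible way to avoid trusting unreliable scores too soon. Lowering the temperature over time would move the behaviour toward plain top-k selection.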

Abstract (original)

Dynamic sparse training (DST) can reduce the computational demands in ANNs, but faces difficulties in keeping peak performance at high sparsity levels. The Cannistraci-Hebb training (CHT) is a brain-inspired method for growing connectivity in DST. CHT leverages a gradient-free, topology-driven link regrowth, which has shown ultra-sparse (less than 1% connectivity) advantage across various tasks compared to fully connected networks. Yet, CHT suffers two main drawbacks: (i) its time complexity is $O(Nd^3)$ – N node network size, d node degree – restricting it to ultra-sparse regimes. (ii) it selects top link prediction scores, which is inappropriate for the early training epochs, when the network presents unreliable connections. Here, we design the first brain-inspired network model – termed bipartite receptive field (BRF) – to initialize the connectivity of sparse artificial neural networks. We further introduce a GPU-friendly matrix-based approximation of CH link prediction, reducing complexity to $O(N^3)$. We introduce the Cannistraci-Hebb training soft rule (CHTs), which adopts a flexible strategy for sampling connections in both link removal and regrowth, balancing the exploration and exploitation of network topology. Additionally, we integrate CHTs with a sigmoid gradual density decay (CHTss). Empirical results show that BRF offers performance advantages over previous network science models. Using 1% of connections, CHTs outperforms fully connected networks in MLP architectures on image classification tasks, compressing some networks to less than 30% of the nodes. Using 5% of the connections, CHTss outperforms fully connected networks in two Transformer-based machine translation tasks. Finally, at 30% connectivity, both CHTs and CHTss outperform other DST methods in language modeling and even exceed fully connected baselines in zero-shot tasks.
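
The abstract names a sigmoid gradual density decay but gives no formula for it. As a rough sketch of what such a schedule could look like, the function below moves the network density from a starting value to a target value along a logistic curve. The function name, the `steepness` parameter, the normalisation, and the example densities are all assumptions chosen for illustration, not the authors' definition.

```python
import math


def sigmoid_density(step, total_steps, start_density, end_density, steepness=10.0):
    """Density at `step`, following a logistic curve between two densities.

    All parameters are hypothetical: `steepness` controls how sharply the
    density drops around the midpoint of training.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")

    def _logistic(p):
        return 1.0 / (1.0 + math.exp(-steepness * (p - 0.5)))

    # Clamp training progress to [0, 1].
    progress = min(max(step / total_steps, 0.0), 1.0)
    # Rescale so the curve starts exactly at 0 and ends exactly at 1.
    low, high = _logistic(0.0), _logistic(1.0)
    fraction = (_logistic(progress) - low) / (high - low)
    return start_density + (end_density - start_density) * fraction


if __name__ == "__main__":
    # Example: decay from 50% to 5% connectivity over 1000 steps (made-up values).
    for step in (0, 250, 500, 750, 1000):
        print(step, round(sigmoid_density(step, 1000, 0.5, 0.05), 4))
```

Compared with a linear schedule, a curve of this shape prunes slowly at first, fastest in the middle of training, and slowly again near the end.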

arXiv information

Authors	Yingtao Zhang, Diego Cerretti, Jialin Zhao, Wenjing Wu, Ziheng Liao, Umberto Michieli, Carlo Vittorio Cannistraci
Published	2025-06-02 09:19:36+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
