Low-rank bias, weight decay, and model merging in neural networks




We explore the low-rank structure of the weight matrices in neural networks originating from training with Gradient Descent (GD) and Gradient Flow (GF) with $L2$ regularization (also known as weight decay). We show several properties of GD-trained deep neural networks, induced by $L2$ regularization. In particular, for a stationary point of GD we show alignment of the parameters and the gradient, norm preservation across layers, and low-rank bias: properties previously known in the context of GF solutions. Experiments show that the assumptions made in the analysis only mildly affect the observations. In addition, we investigate a multitask learning phenomenon enabled by $L2$ regularization and low-rank bias. In particular, we show that if two networks are trained, such that the inputs in the training set of one network are approximately orthogonal to the inputs in the training set of the other network, the new network obtained by simply summing the weights of the two networks will perform as well on both training sets as the respective individual networks. We demonstrate this for shallow ReLU neural networks trained by GD, as well as deep linear and deep ReLU networks trained by GF.
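
The weight-summing claim can be illustrated in a much simpler setting than the one the paper studies. The sketch below is a toy analogue, not the authors' method: it swaps the neural networks for two ridge-regression models (linear, $L2$-regularized), whose solutions lie in the span of their own training inputs. When the two input sets are orthogonal, each model outputs zero on the other's inputs, so the summed weights reproduce both models' training predictions. All names (`ridge_fit`, `X1`, `X2`, `w_sum`) are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """L2-regularized least squares in dual form.

    w = X^T (X X^T + lam I)^{-1} y, so w is a combination of the rows of X.
    """
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

rng = np.random.default_rng(0)
d, n = 20, 5

# Task 1 inputs live in the first 10 coordinates, task 2 inputs in the last 10,
# so every input of one task is orthogonal to every input of the other.
X1 = np.zeros((n, d))
X1[:, :10] = rng.normal(size=(n, 10))
X2 = np.zeros((n, d))
X2[:, 10:] = rng.normal(size=(n, 10))
y1, y2 = rng.normal(size=n), rng.normal(size=n)

w1 = ridge_fit(X1, y1, lam=1e-3)
w2 = ridge_fit(X2, y2, lam=1e-3)
w_sum = w1 + w2  # "merge" the two models by summing their weights

# Largest change in training predictions caused by the merge, per task.
gap1 = np.max(np.abs(X1 @ w_sum - X1 @ w1))
gap2 = np.max(np.abs(X2 @ w_sum - X2 @ w2))
print(gap1, gap2)
```

In this linear toy the orthogonality is exact, so the merge changes nothing on either training set; the paper's setting (approximate orthogonality, ReLU and deep networks) is harder and is not captured here.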


Authors: Ilja Kuzborskij, Yasin Abbasi Yadkori
Published: 2025-02-24 17:17:00+00:00
arXiv page: arxiv_id (pdf)

Source and services used

arxiv.jp, Google

Categories: cs.LG

Implicit Repair with Reinforcement Learning in Emergent Communication




Conversational repair is a mechanism used to detect and resolve miscommunication and misinformation problems when two or more agents interact. One particular and underexplored form of repair in emergent communication is the implicit repair mechanism, where the interlocutor purposely conveys the desired information in such a way as to prevent misinformation from any other interlocutor. This work explores how redundancy can modify the emergent communication protocol so that it continues to convey the information needed to complete the underlying task, even under additional external environmental pressures such as noise. We focus on extending the signaling game, called the Lewis Game, by adding noise to the communication channel and to the inputs received by the agents. Our analysis shows that agents add redundancy to the transmitted messages to prevent noise from degrading task success. Additionally, we observe that the emerging communication protocol’s generalization capabilities remain equivalent to those of architectures employed in simpler, entirely deterministic games. Moreover, our method is the only one suitable for producing robust communication protocols that can handle cases with and without noise while maintaining increased generalization performance levels.
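
Why redundancy helps over a noisy channel can be seen without any learning at all. The sketch below is a hand-coded illustration, not the paper's reinforcement-learning setup: a sender repeats the target symbol, the channel corrupts each symbol independently, and the receiver guesses the most frequent symbol it received. The vocabulary size, noise level, and function names are assumptions chosen for the example.

```python
import random
from collections import Counter

def noisy_channel(message, vocab_size, noise, rng):
    """Replace each symbol by a different random symbol with probability `noise`."""
    out = []
    for s in message:
        if rng.random() < noise:
            s = rng.choice([v for v in range(vocab_size) if v != s])
        out.append(s)
    return out

def success_rate(repeats, vocab_size=8, noise=0.3, trials=5000, seed=0):
    """Fraction of games in which the receiver recovers the sender's target."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        target = rng.randrange(vocab_size)
        received = noisy_channel([target] * repeats, vocab_size, noise, rng)
        # Receiver guesses the most frequent symbol (ties go to the one seen first).
        guess = Counter(received).most_common(1)[0][0]
        wins += guess == target
    return wins / trials

plain = success_rate(repeats=1)      # one symbol, no redundancy
redundant = success_rate(repeats=5)  # same symbol sent five times
print(plain, redundant)
```

The repetition code here is fixed by hand; in the paper the agents are reported to arrive at redundant messages through training, which this sketch does not model.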


Authors: Fábio Vital, Alberto Sardinha, Francisco S. Melo
Published: 2025-02-24 17:23:04+00:00

Categories: cs.LG, cs.MA

Distributional Scaling Laws for Emergent Capabilities




In this paper, we explore the nature of sudden breakthroughs in language model performance at scale, which stand in contrast to smooth improvements governed by scaling laws. While advocates of ‘emergence’ view abrupt performance gains as capabilities unlocking at specific scales, others have suggested that they are produced by thresholding effects and alleviated by continuous metrics. We propose that breakthroughs are instead driven by continuous changes in the probability distribution of training outcomes, particularly when performance is bimodally distributed across random seeds. In synthetic length generalization tasks, we show that different random seeds can produce either highly linear or emergent scaling trends. We reveal that sharp breakthroughs in metrics are produced by underlying continuous changes in their distribution across seeds. Furthermore, we provide a case study of inverse scaling and show that even as the probability of a successful run declines, the average performance of a successful run continues to increase monotonically. We validate our distributional scaling framework on realistic settings by measuring MMLU performance in LLM populations. These insights emphasize the role of random variation in the effect of scale on LLM capabilities.
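
The distinction between a sharp per-run breakthrough and a smooth change in the distribution across seeds can be sketched with a two-mode toy model. Everything below is an assumption made for illustration (the sigmoid success probability, the two performance levels, and the fixed per-seed threshold `u`); it is not the paper's experimental setup.

```python
import numpy as np

def success_probability(log_scale, midpoint=3.0, sharpness=1.5):
    """Smooth (sigmoid) probability that a training run lands in the 'success' mode."""
    return 1.0 / (1.0 + np.exp(-sharpness * (log_scale - midpoint)))

rng = np.random.default_rng(0)
log_scales = np.linspace(0.0, 6.0, 25)  # hypothetical model sizes (log units)
n_seeds = 200
low, high = 0.05, 0.95                  # the two performance modes

# Each seed gets a fixed latent threshold u; it succeeds once p(scale) exceeds u.
u = rng.uniform(size=(n_seeds, 1))
p = success_probability(log_scales)[None, :]

per_seed = np.where(u < p, high, low)   # (n_seeds, n_scales): one abrupt jump per seed
mean_curve = per_seed.mean(axis=0)      # smooth, tracks low + (high - low) * p(scale)
print(mean_curve.round(2))
```

Each row of `per_seed` looks like an abrupt breakthrough at a seed-specific scale, while the average over seeds changes gradually with the underlying success probability.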


Authors: Rosie Zhao, Tian Qin, David Alvarez-Melis, Sham Kakade, Naomi Saphra
Published: 2025-02-24 17:34:45+00:00

Categories: cs.LG, I.2.7

An Explainable AI Model for Binary LJ Fluids




Lennard-Jones (LJ) fluids serve as an important theoretical framework for understanding molecular interactions. Binary LJ fluids, where two distinct species of particles interact based on the LJ potential, exhibit rich phase behavior and provide valuable insights into complex fluid mixtures. Here we report the construction and utility of an artificial intelligence (AI) model for binary LJ fluids, focusing on its effectiveness in predicting radial distribution functions (RDFs) across a range of conditions. The RDFs of a binary mixture with varying compositions and temperatures are collected from molecular dynamics (MD) simulations to establish and validate the AI model. In this AI pipeline, RDFs are discretized in order to reduce the output dimension of the model. This, in turn, improves the efficacy and reduces the complexity of the AI RDF model. The model is shown to predict RDFs for many unknown mixtures very accurately, especially outside the training temperature range. Our analysis suggests that the particle size ratio has a higher-order impact on the microstructure of a binary mixture. We also highlight the areas where the fidelity of the AI model is low when encountering new regimes with different underlying physics.
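
The abstract does not say how the RDFs are discretized, so the following is only one plausible reading: averaging a finely sampled $g(r)$ over consecutive bins so that a model predicts a few dozen values instead of hundreds. The function name `discretize_rdf`, the bin count, and the toy curve are all assumptions; the curve is a synthetic stand-in, not simulation data.

```python
import numpy as np

def discretize_rdf(g, n_bins):
    """Coarse-grain a finely sampled g(r) by averaging consecutive samples into n_bins bins."""
    return np.array([chunk.mean() for chunk in np.array_split(g, n_bins)])

r = np.linspace(0.5, 5.0, 900)
# Toy stand-in for an RDF: a damped oscillation around 1.
g = 1.0 + np.exp(-(r - 0.5)) * np.cos(2.0 * np.pi * (r - 1.0))

g_coarse = discretize_rdf(g, n_bins=30)  # 900 output values reduced to 30
print(g.shape, g_coarse.shape)
```

A coarser grid lowers the output dimension at the cost of smoothing sharp peaks, so the bin count would need to be chosen against the width of the first coordination peak.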


Authors: Israrul H Hashmi, Rahul Karmakar, Marripelli Maniteja, Kumar Ayush, Tarak K. Patra
Published: 2025-02-24 17:35:01+00:00

Categories: cond-mat.mtrl-sci, cs.LG, physics.chem-ph

A Closer Look at TabPFN v2: Strength, Limitation, and Extension




Tabular datasets are inherently heterogeneous, posing significant challenges for developing pre-trained foundation models. The recently introduced transformer-based Tabular Prior-data Fitted Network v2 (TabPFN v2) achieves unprecedented in-context learning accuracy across multiple tabular datasets, marking a pivotal advancement in tabular foundation models. In this paper, we comprehensively evaluate TabPFN v2 on over 300 datasets, confirming its exceptional generalization capabilities on small- to medium-scale tasks. Our analysis identifies randomized feature tokens as a key factor behind TabPFN v2’s success, as they unify heterogeneous datasets into a fixed-dimensional representation, enabling more effective training and inference. To further understand TabPFN v2’s predictions, we propose a leave-one-fold-out approach, transforming TabPFN v2 into a feature extractor and revealing its capability to simplify data distributions and boost accuracy. Lastly, to address TabPFN v2’s limitations in high-dimensional, large-scale, and many-category tasks, we introduce a divide-and-conquer mechanism inspired by Chain-of-Thought prompting, enabling scalable inference. By uncovering the mechanisms behind TabPFN v2’s success and introducing strategies to expand its applicability, this study provides key insights into the future of tabular foundation models.
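
The leave-one-fold-out idea can be sketched generically: each sample's features are computed by a model that sees only the other folds as context, so no sample is embedded using its own label. The code below does not call TabPFN; `centroid_embed` is a hypothetical stand-in for any context-conditioned model, and the fold scheme is an assumption about what the abstract describes.

```python
import numpy as np

def centroid_embed(ctx_X, ctx_y, qry_X):
    """Stand-in embedder: distance from each query row to each class centroid of the context."""
    classes = np.unique(ctx_y)
    cents = np.stack([ctx_X[ctx_y == c].mean(axis=0) for c in classes])
    return np.linalg.norm(qry_X[:, None, :] - cents[None, :, :], axis=2)

def leave_one_fold_out_features(X, y, n_folds, embed, seed=0):
    """Embed every sample using only the other folds as labeled context."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    feats = None
    for k, held in enumerate(folds):
        ctx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        out = embed(X[ctx], y[ctx], X[held])
        if feats is None:
            feats = np.full((len(X), out.shape[1]), np.nan)
        feats[held] = out
    return feats

# Toy two-class data (synthetic, for illustration only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(30, 4)), rng.normal(3.0, 1.0, size=(30, 4))])
y = np.array([0] * 30 + [1] * 30)
feats = leave_one_fold_out_features(X, y, n_folds=5, embed=centroid_embed)
print(feats.shape)
```

The extracted `feats` could then be passed to a simple downstream classifier; this sketch assumes every context split contains all classes, which holds for the balanced toy data above.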


Authors: Han-Jia Ye, Si-Yang Liu, Wei-Lun Chao
Published: 2025-02-24 17:38:42+00:00

Categories: cs.LG

A Refined Analysis of UCBVI




In this work, we provide a refined analysis of the UCBVI algorithm (Azar et al., 2017), improving both the bonus terms and the regret analysis. Additionally, we compare our version of UCBVI with both its original version and the state-of-the-art MVP algorithm. Our empirical validation demonstrates that improving the multiplicative constants in the bounds has significant positive effects on the empirical performance of the algorithms.
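
To see why the multiplicative constant in an exploration bonus matters in practice, consider optimistic backward induction on an estimated tabular model. The bonus form $c \cdot H \sqrt{L / n}$ below is a generic Hoeffding-style choice used only for illustration; the constant `c`, the log term, and the clipping at `H` are assumptions, and the refined bonus from the paper is not reproduced here.

```python
import numpy as np

def optimistic_backward_induction(counts, rewards_hat, p_hat, H, c=1.0, log_term=1.0):
    """Optimistic Q-values from an empirical model plus a count-based bonus.

    counts:      (S, A) visit counts
    rewards_hat: (S, A) empirical mean rewards in [0, 1]
    p_hat:       (S, A, S) empirical transition probabilities
    """
    S, A = counts.shape
    Q = np.zeros((H, S, A))
    V = np.zeros((H + 1, S))  # V[H] = 0 is the terminal value
    bonus = c * H * np.sqrt(log_term / np.maximum(counts, 1))
    for h in range(H - 1, -1, -1):
        Q[h] = np.minimum(rewards_hat + p_hat @ V[h + 1] + bonus, H)  # clip at horizon
        V[h] = Q[h].max(axis=1)
    return Q, V[:H]

# Hypothetical empirical model for a small MDP.
rng = np.random.default_rng(0)
S, A, H = 4, 2, 5
counts = rng.integers(1, 50, size=(S, A))
rewards_hat = rng.uniform(size=(S, A))
p_hat = rng.dirichlet(np.ones(S), size=(S, A))  # shape (S, A, S)

Q_small, _ = optimistic_backward_induction(counts, rewards_hat, p_hat, H, c=0.5)
Q_big, _ = optimistic_backward_induction(counts, rewards_hat, p_hat, H, c=2.0)
print(Q_small[0].round(2))
print(Q_big[0].round(2))
```

A larger constant inflates every optimistic value, often up to the clip at `H`, which makes state-action pairs harder to tell apart and prolongs exploration; a tighter constant keeps the estimates informative sooner.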


Authors: Simone Drago, Marco Mussi, Alberto Maria Metelli
Published: 2025-02-24 17:50:29+00:00

Categories: cs.LG, stat.ML

Sustainable Greenhouse Management: A Comparative Analysis of Recurrent and Graph Neural Networks




The integration of photovoltaic (PV) systems into greenhouses not only optimizes land use but also enhances sustainable agricultural practices by enabling dual benefits of food production and renewable energy generation. However, accurate prediction of internal environmental conditions is crucial to ensure optimal crop growth while maximizing energy production. This study introduces a novel application of Spatio-Temporal Graph Neural Networks (STGNNs) to greenhouse microclimate modeling, comparing their performance with traditional Recurrent Neural Networks (RNNs). While RNNs excel at temporal pattern recognition, they cannot explicitly model the directional relationships between environmental variables. Our STGNN approach addresses this limitation by representing these relationships as directed graphs, enabling the model to capture both spatial dependencies and their directionality. Using high-frequency data collected at 15-minute intervals from a greenhouse in Volos, Greece, we demonstrate that RNNs achieve exceptional accuracy in winter conditions (R^2 = 0.985) but show limitations during summer cooling system operation. Though STGNNs currently show lower performance (winter R^2 = 0.947), their architecture offers greater potential for integrating additional variables such as PV generation and crop growth indicators.
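
What "directional relationships" buys a graph model can be shown with a single message-passing step over a directed adjacency matrix. This is a generic graph-convolution sketch, not the STGNN architecture from the paper; the three variables, the edges between them, and the function name are hypothetical.

```python
import numpy as np

def directed_message_pass(A, feats, W):
    """One mean-aggregation step; A[i, j] = 1 means node i receives from node j."""
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    deg = A_hat.sum(axis=1, keepdims=True)      # number of incoming edges (plus self)
    return np.maximum((A_hat / deg) @ feats @ W, 0.0)  # aggregate, project, ReLU

# Hypothetical variables: 0 = solar radiation, 1 = air temperature, 2 = humidity.
A = np.zeros((3, 3))
A[1, 0] = 1.0  # radiation -> temperature
A[2, 1] = 1.0  # temperature -> humidity

rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 4))  # one feature vector per variable
W = rng.normal(size=(4, 4))      # projection weights (random here, learned in practice)
out = directed_message_pass(A, feats, W)

# Perturbing humidity does not affect radiation or temperature: no edge points upstream.
feats_perturbed = feats.copy()
feats_perturbed[2] += 10.0
out_perturbed = directed_message_pass(A, feats_perturbed, W)
print(np.allclose(out[:2], out_perturbed[:2]))
```

With a symmetric adjacency matrix the perturbation would leak back into temperature, which is the kind of relationship a directed graph lets the model rule out.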


Authors: Emiliano Seri, Marcello Petitta, Cristina Cornaro
Published: 2025-02-24 17:52:01+00:00

Categories: cs.LG, stat.AP

A Concise Lyapunov Analysis of Nesterov’s Accelerated Gradient Method




Convergence analysis of Nesterov’s accelerated gradient method has attracted significant attention over the past decades. While extensive work has explored its theoretical properties and elucidated the intuition behind its acceleration, a simple and direct proof of its convergence rates is still lacking. We provide a concise Lyapunov analysis of the convergence rates of Nesterov’s accelerated gradient method for both general convex and strongly convex functions.
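
For reference, the method under analysis can be written in a few lines. The sketch below uses one common form of Nesterov's scheme for general convex functions, with momentum coefficient $(k-1)/(k+2)$ and a fixed step size no larger than $1/L$; the abstract does not state which variant the paper treats, so this choice and the test problem are assumptions.

```python
import numpy as np

def nesterov_agd(grad, x0, step, n_iters):
    """Nesterov's accelerated gradient method with momentum (k - 1) / (k + 2)."""
    x_prev = x0.copy()
    y = x0.copy()
    for k in range(1, n_iters + 1):
        x = y - step * grad(y)                    # gradient step at the extrapolated point
        y = x + (k - 1) / (k + 2) * (x - x_prev)  # momentum extrapolation
        x_prev = x
    return x_prev

# Ill-conditioned convex quadratic f(x) = 0.5 * sum(eigs * x^2), minimized at 0, with L = 1.
eigs = np.logspace(-3, 0, 50)

def f(x):
    return 0.5 * np.sum(eigs * x ** 2)

def grad(x):
    return eigs * x

x0 = np.ones(50)
n_iters = 200
x_agd = nesterov_agd(grad, x0, step=1.0, n_iters=n_iters)
print(f(x0), f(x_agd))
```

On a problem like this the objective gap shrinks at the classical $O(1/k^2)$ rate, which is the kind of guarantee a Lyapunov argument is meant to certify.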


Authors: Jun Liu
Published: 2025-02-24 17:55:35+00:00

Categories: cs.LG, cs.SY, eess.SY, math.OC

Continuous Integration Practices in Machine Learning Projects: The Practitioners’ Perspective




Continuous Integration (CI) is a cornerstone of modern software development. However, while widely adopted in traditional software projects, applying CI practices to Machine Learning (ML) projects presents distinctive characteristics. For example, our previous work revealed that ML projects often experience longer build durations and lower test coverage rates compared to their non-ML counterparts. Building on these quantitative findings, this study surveys 155 practitioners from 47 ML projects to investigate the underlying reasons for these distinctive characteristics through a qualitative perspective. Practitioners highlighted eight key differences, including test complexity, infrastructure requirements, and build duration and stability. Common challenges mentioned by practitioners include higher project complexity, model training demands, extensive data handling, increased computational resource needs, and dependency management, all contributing to extended build durations. Furthermore, ML systems’ non-deterministic nature, data dependencies, and computational constraints were identified as significant barriers to effective testing. The key takeaway from this study is that while foundational CI principles remain valuable, ML projects require tailored approaches to address their unique challenges. To bridge this gap, we propose a set of ML-specific CI practices, including tracking model performance metrics and prioritizing test execution within CI pipelines. Additionally, our findings highlight the importance of fostering interdisciplinary collaboration to strengthen the testing culture in ML projects. By bridging quantitative findings with practitioners’ insights, this study provides a deeper understanding of the interplay between CI practices and the unique demands of ML projects, laying the groundwork for more efficient and robust CI strategies in this domain.
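
One of the proposed practices, tracking model performance metrics within the CI pipeline, might look like the following gate. This is a minimal sketch of the general idea, not something taken from the paper: the function name, the tolerance, and the metric values are hypothetical, and it assumes higher metric values are better.

```python
def check_metric_regression(baseline, current, tolerance=0.01):
    """Return metrics that are missing or dropped by more than `tolerance` (higher is better)."""
    regressions = {}
    for name, base_value in baseline.items():
        new_value = current.get(name)
        if new_value is None or base_value - new_value > tolerance:
            regressions[name] = (base_value, new_value)
    return regressions

# Hypothetical values: baseline from the last accepted build, current from this build.
baseline = {"accuracy": 0.91, "f1": 0.88}
current = {"accuracy": 0.915, "f1": 0.85}

failed = check_metric_regression(baseline, current)
print(failed)
```

A CI job could exit with a non-zero status whenever `failed` is non-empty. The tolerance absorbs some of the run-to-run variation that the practitioners cite as a barrier to testing, although how large it should be depends on the model and metric.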


Authors: João Helis Bernardo, Daniel Alencar da Costa, Filipe Roseiro Cogo, Sérgio Queiróz de Medeiros, Uirá Kulesza
Published: 2025-02-24 18:01:50+00:00

Categories: cs.LG, cs.SE

Unlocking the Power of LSTM for Long Term Time Series Forecasting




Traditional recurrent neural network architectures, such as long short-term memory neural networks (LSTM), have historically held a prominent role in time series forecasting (TSF) tasks. While the recently introduced sLSTM for Natural Language Processing (NLP) introduces exponential gating and memory mixing that are beneficial for long-term sequential learning, its potential short memory issue is a barrier to applying sLSTM directly in TSF. To address this, we propose a simple yet efficient algorithm named P-sLSTM, which is built upon sLSTM by incorporating patching and channel independence. These modifications substantially enhance sLSTM’s performance in TSF, achieving state-of-the-art results. Furthermore, we provide theoretical justifications for our design, and conduct extensive comparative and analytical experiments to fully validate the efficiency and superior performance of our model.
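
Patching and channel independence are preprocessing steps that can be shown apart from the recurrent model itself. The sketch below follows the usual reading of these terms in the forecasting literature: each channel is treated as its own univariate series, and each series is cut into fixed-length windows. The function name, patch length, and stride are assumptions; the sLSTM that would consume the patches is not shown.

```python
import numpy as np

def patch_channel_independent(x, patch_len, stride):
    """Split a batch of multivariate series into per-channel patches.

    x: (batch, length, channels) -> (batch * channels, n_patches, patch_len)
    """
    B, L, C = x.shape
    series = x.transpose(0, 2, 1).reshape(B * C, L)  # one row per (sample, channel)
    n_patches = (L - patch_len) // stride + 1
    starts = np.arange(n_patches) * stride
    idx = starts[:, None] + np.arange(patch_len)[None, :]  # (n_patches, patch_len)
    return series[:, idx]

x = np.arange(2 * 12 * 3, dtype=float).reshape(2, 12, 3)  # toy batch: 2 samples, 12 steps, 3 channels
patches = patch_channel_independent(x, patch_len=4, stride=4)
print(patches.shape)
```

Each patch then plays the role of one time step for the recurrent model, so a length-12 series becomes a sequence of 3 steps here, which shortens the sequence the model has to remember.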


Authors: Yaxuan Kong, Zepu Wang, Yuqi Nie, Tian Zhou, Stefan Zohren, Yuxuan Liang, Peng Sun, Qingsong Wen
Published: 2025-02-24 18:01:55+00:00

Categories: cs.LG