EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling




Latent generative models have emerged as a leading approach for high-quality image synthesis. These models rely on an autoencoder to compress images into a latent space, followed by a generative model to learn the latent distribution. We identify that existing autoencoders lack equivariance to semantic-preserving transformations like scaling and rotation, resulting in complex latent spaces that hinder generative performance. To address this, we propose EQ-VAE, a simple regularization approach that enforces equivariance in the latent space, reducing its complexity without degrading reconstruction quality. By fine-tuning pre-trained autoencoders with EQ-VAE, we enhance the performance of several state-of-the-art generative models, including DiT, SiT, REPA and MaskGIT, achieving a 7x speedup on DiT-XL/2 with only five epochs of SD-VAE fine-tuning. EQ-VAE is compatible with both continuous and discrete autoencoders, thus offering a versatile enhancement for a wide range of latent generative models. Project page and code: https://eq-vae.github.io/.
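
To make the notion of equivariance in this abstract concrete, here is a minimal NumPy sketch. It is not the EQ-VAE autoencoder or its training loss: both "encoders" below are toy stand-ins invented for illustration, and the error measure is one plausible choice. The sketch only checks whether encoding a rotated image gives the same result as rotating the encoded image.

```python
import numpy as np

def avg_pool_encoder(x, f=2):
    """Toy 'encoder' (assumption): f-by-f average pooling, which commutes with 90-degree rotations."""
    h, w = x.shape
    return x.reshape(h // f, f, w // f, f).mean(axis=(1, 3))

def shifted_encoder(x, f=2):
    """Toy non-equivariant 'encoder' (assumption): shift columns by one pixel, then pool."""
    return avg_pool_encoder(np.roll(x, 1, axis=1), f)

def equivariance_error(encoder, x, transform):
    """Relative gap between encoding a transformed image and transforming the latent."""
    a = encoder(transform(x))
    b = transform(encoder(x))
    return np.linalg.norm(a - b) / np.linalg.norm(b)

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))

# np.rot90 rotates by 90 degrees; it is applied to both images and latents.
err_pool = equivariance_error(avg_pool_encoder, image, np.rot90)
err_shift = equivariance_error(shifted_encoder, image, np.rot90)
print(f"pooling encoder error: {err_pool:.2e}")
print(f"shifted encoder error: {err_shift:.2e}")
```

The pooling encoder has an error at the level of floating-point round-off, while the shifted one does not. A regularizer in the spirit of the abstract would push a learned encoder toward the first behavior.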


Authors: Theodoros Kouzelis, Ioannis Kakogeorgiou, Spyros Gidaris, Nikos Komodakis
Published: 2025-02-13 17:21:51+00:00

Source and services used: arxiv.jp, Google

Categories: cs.LG

Robust Learning of Multi-index Models via Iterative Subspace Approximation




We study the task of learning Multi-Index Models (MIMs) with label noise under the Gaussian distribution. A $K$-MIM is any function $f$ that only depends on a $K$-dimensional subspace. We focus on well-behaved MIMs with finite ranges that satisfy certain regularity properties. Our main contribution is a general robust learner that is qualitatively optimal in the Statistical Query (SQ) model. Our algorithm iteratively constructs better approximations to the defining subspace by computing low-degree moments conditional on the projection to the subspace computed thus far, and adding directions with relatively large empirical moments. This procedure efficiently finds a subspace $V$ so that $f(\mathbf{x})$ is close to a function of the projection of $\mathbf{x}$ onto $V$. Conversely, for functions for which these conditional moments do not help, we prove an SQ lower bound suggesting that no efficient learner exists. As applications, we provide faster robust learners for the following concept classes:
* Multiclass Linear Classifiers: We give a constant-factor approximate agnostic learner with sample complexity $N = O(d) 2^{\mathrm{poly}(K/\epsilon)}$ and computational complexity $\mathrm{poly}(N ,d)$. This is the first constant-factor agnostic learner for this class whose complexity is a fixed-degree polynomial in $d$.
* Intersections of Halfspaces: We give an approximate agnostic learner for this class achieving 0-1 error $K \tilde{O}(\mathrm{OPT}) + \epsilon$ with sample complexity $N=O(d^2) 2^{\mathrm{poly}(K/\epsilon)}$ and computational complexity $\mathrm{poly}(N ,d)$. This is the first agnostic learner for this class with near-linear error dependence and complexity a fixed-degree polynomial in $d$.
Furthermore, we show that in the presence of random classification noise, the complexity of our algorithm scales polynomially with $1/\epsilon$.
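
As a small illustration of the definition above, the following sketch checks numerically that a multiclass linear classifier with $K$ weight vectors is a $K$-MIM: its output is unchanged when the input is replaced by its projection onto the span of the weight vectors. The dimensions, the weight matrix `W`, and the helper `classify` are hypothetical choices made for this example. The sketch does not implement the paper's learning algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = 10, 3
W = rng.normal(size=(K, d))          # rows are K class weight vectors (hypothetical)

def classify(x):
    """Multiclass linear classifier: index of the largest score w_i . x."""
    return np.argmax(x @ W.T, axis=-1)

# Orthonormal basis Q (d x K) of the subspace V spanned by the rows of W.
Q, _ = np.linalg.qr(W.T)

X = rng.normal(size=(1000, d))       # Gaussian inputs, as in the abstract's setting
X_proj = X @ Q @ Q.T                 # projection of each x onto V

# The labels depend on x only through its projection onto V.
labels_agree = np.array_equal(classify(X), classify(X_proj))
print("labels unchanged by projection:", labels_agree)
```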


Authors: Ilias Diakonikolas, Giannis Iakovidis, Daniel M. Kane, Nikos Zarifis
Published: 2025-02-13 17:37:42+00:00

Categories: cs.DS, cs.LG, math.ST, stat.ML, stat.TH

Fast Tensor Completion via Approximate Richardson Iteration




We study tensor completion (TC) through the lens of low-rank tensor decomposition (TD). Many TD algorithms use fast alternating minimization methods, which solve highly structured linear regression problems at each step (e.g., for CP, Tucker, and tensor-train decompositions). However, such algebraic structure is lost in TC regression problems, making direct extensions unclear. To address this, we propose a lifting approach that approximately solves TC regression problems using structured TD regression algorithms as black-box subroutines, enabling sublinear-time methods. We theoretically analyze the convergence rate of our algorithm, which is based on approximate Richardson iteration, and we demonstrate on real-world tensors that its running time can be 100x faster than direct methods for CP completion.
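
For readers unfamiliar with the iteration named in the title, here is a minimal sketch of the classical (exact) Richardson iteration applied to an ordinary dense least-squares problem. It is not the paper's lifted, approximate variant and involves no tensor structure. The function name, step-size choice, and test problem are assumptions made for illustration.

```python
import numpy as np

def richardson_least_squares(A, b, num_iters=500):
    """Classical Richardson iteration on the normal equations A^T A x = A^T b."""
    # Step size 1 / ||A||_2^2 guarantees convergence when A has full column rank.
    omega = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        # Move along the negative gradient of 0.5 * ||A x - b||^2.
        x = x + omega * (A.T @ (b - A @ x))
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 5))
b = rng.normal(size=200)

x_richardson = richardson_least_squares(A, b)
x_direct = np.linalg.lstsq(A, b, rcond=None)[0]
print("max difference from direct solve:", np.max(np.abs(x_richardson - x_direct)))
```

Each step needs only products with `A` and its transpose. This is why an inexact but fast subroutine can be substituted for part of the update, which is the general idea behind approximate variants.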


Authors: Mehrdad Ghadiri, Matthew Fahrbach, Yunbum Kook, Ali Jadbabaie
Published: 2025-02-13 17:50:27+00:00

Categories: cs.DS, cs.LG, math.ST, stat.TH

SyntheticPop: Attacking Speaker Verification Systems With Synthetic VoicePops




Voice Authentication (VA), also known as Automatic Speaker Verification (ASV), is a widely adopted authentication method, particularly in automated systems like banking services, where it serves as a secondary layer of user authentication. Despite its popularity, VA systems are vulnerable to various attacks, including replay, impersonation, and the emerging threat of deepfake audio that mimics the voice of legitimate users. To mitigate these risks, several defense mechanisms have been proposed. One such solution, Voice Pops, aims to distinguish an individual’s unique phoneme pronunciations during the enrollment process. While promising, the effectiveness of VA+VoicePop against a broader range of attacks, particularly logical or adversarial attacks, remains insufficiently explored. We propose a novel attack method, which we refer to as SyntheticPop, designed to target the phoneme recognition capabilities of the VA+VoicePop system. The SyntheticPop attack involves embedding synthetic ‘pop’ noises into spoofed audio samples, significantly degrading the model’s performance. We achieve an attack success rate of over 95% while poisoning 20% of the training dataset. Our experiments demonstrate that VA+VoicePop achieves 69% accuracy under normal conditions, 37% accuracy when subjected to a baseline label flipping attack, and just 14% accuracy under our proposed SyntheticPop attack, emphasizing the effectiveness of our method.


Authors: Eshaq Jamdar, Amith Kamath Belman
Published: 2025-02-13 18:05:12+00:00

Categories: cs.CR, cs.LG

Asymptotic Normality of Generalized Low-Rank Matrix Sensing via Riemannian Geometry




We prove an asymptotic normality guarantee for generalized low-rank matrix sensing, i.e., matrix sensing under a general convex loss $\bar\ell(\langle X,M\rangle,y^*)$, where $M\in\mathbb{R}^{d\times d}$ is the unknown rank-$k$ matrix, $X$ is a measurement matrix, and $y^*$ is the corresponding measurement. Our analysis relies on tools from Riemannian geometry to handle degeneracy of the Hessian of the loss due to rotational symmetry in the parameter space. In particular, we parameterize the manifold of low-rank matrices by $\bar\theta\bar\theta^\top$, where $\bar\theta\in\mathbb{R}^{d\times k}$. Then, assuming the minimizer of the empirical loss $\bar\theta^0\in\mathbb{R}^{d\times k}$ is in a constant-size ball around the true parameters $\bar\theta^*$, we prove $\sqrt{n}(\phi^0-\phi^*)\xrightarrow{D}N(0,(H^*)^{-1})$ as $n\to\infty$, where $\phi^0$ and $\phi^*$ are representations of $\bar\theta^0$ and $\bar\theta^*$ in the horizontal space of the Riemannian quotient manifold $\mathbb{R}^{d\times k}/\text{O}(k)$, and $H^*$ is the Hessian of the true loss in the same representation.
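
The rotational symmetry mentioned above can be checked numerically. In the sketch below (dimensions and random seed are arbitrary assumptions), multiplying the factor $\bar\theta$ by any orthogonal $k\times k$ matrix $Q$ leaves $\bar\theta\bar\theta^\top$ unchanged. Any loss that depends on $\bar\theta$ only through $\bar\theta\bar\theta^\top$ is therefore constant along these directions, which is the source of the Hessian degeneracy and the reason for working on the quotient by $\text{O}(k)$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 6, 2
theta = rng.normal(size=(d, k))

# Random orthogonal k x k matrix obtained from a QR factorization.
Q, _ = np.linalg.qr(rng.normal(size=(k, k)))

M = theta @ theta.T
M_rotated = (theta @ Q) @ (theta @ Q).T   # equals theta Q Q^T theta^T = theta theta^T

same_matrix = np.allclose(M, M_rotated)
different_factor = not np.allclose(theta, theta @ Q)
print("same low-rank matrix:", same_matrix)
print("different factor:", different_factor)
```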


Authors: Osbert Bastani
Published: 2025-02-13 18:22:34+00:00

Categories: cs.LG, stat.ML

Enhancing the Utility of Higher-Order Information in Relational Learning




Higher-order information is crucial for relational learning in many domains where relationships extend beyond pairwise interactions. Hypergraphs provide a natural framework for modeling such relationships, which has motivated recent extensions of graph neural network architectures to hypergraphs. However, comparisons between hypergraph architectures and standard graph-level models remain limited. In this work, we systematically evaluate a selection of hypergraph-level and graph-level architectures, to determine their effectiveness in leveraging higher-order information in relational learning. Our results show that graph-level architectures applied to hypergraph expansions often outperform hypergraph-level ones, even on inputs that are naturally parametrized as hypergraphs. As an alternative approach for leveraging higher-order information, we propose hypergraph-level encodings based on classical hypergraph characteristics. While these encodings do not significantly improve hypergraph architectures, they yield substantial performance gains when combined with graph-level models. Our theoretical analysis shows that hypergraph-level encodings provably increase the representational power of message-passing graph neural networks beyond that of their graph-level counterparts.
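
The abstract does not say which hypergraph expansion is used. As one common possibility, the sketch below shows a clique expansion, which turns every hyperedge into all pairwise edges among its nodes so that a standard graph-level model can be applied. The function name and the example hypergraph are assumptions for illustration.

```python
from itertools import combinations

def clique_expansion(hyperedges):
    """Replace each hyperedge by all pairwise edges among its nodes."""
    edges = set()
    for hyperedge in hyperedges:
        # Sort so that each undirected edge is stored once as (smaller, larger).
        for u, v in combinations(sorted(set(hyperedge)), 2):
            edges.add((u, v))
    return sorted(edges)

hyperedges = [{0, 1, 2}, {2, 3}, {1, 2, 3}]
graph_edges = clique_expansion(hyperedges)
print(graph_edges)
```

The expansion discards which nodes belonged to the same hyperedge: a single hyperedge on three nodes and three separate pairwise hyperedges on the same nodes map to the same triangle. Encodings computed on the original hypergraph are one way to retain information of this kind.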


Authors: Raphael Pellegrin, Lukas Fesser, Melanie Weber
Published: 2025-02-13 18:28:17+00:00

Categories: cs.LG, stat.ML

DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra




Mass spectrometry plays a fundamental role in elucidating the structures of unknown molecules and subsequent scientific discoveries. One formulation of the structure elucidation task is the conditional $\textit{de novo}$ generation of molecular structure given a mass spectrum. Toward a more accurate and efficient scientific discovery pipeline for small molecules, we present DiffMS, a formula-restricted encoder-decoder generative network that achieves state-of-the-art performance on this task. The encoder utilizes a transformer architecture and models mass spectra domain knowledge such as peak formulae and neutral losses, and the decoder is a discrete graph diffusion model restricted by the heavy-atom composition of a known chemical formula. To develop a robust decoder that bridges latent embeddings and molecular structures, we pretrain the diffusion decoder with fingerprint-structure pairs, which are available in virtually infinite quantities, compared to structure-spectrum pairs that number in the tens of thousands. Extensive experiments on established benchmarks show that DiffMS outperforms existing models on $\textit{de novo}$ molecule generation. We provide several ablations to demonstrate the effectiveness of our diffusion and pretraining approaches and show consistent performance scaling with increasing pretraining dataset size. DiffMS code is publicly available at https://github.com/coleygroup/DiffMS.


Authors: Montgomery Bohde, Mrunali Manjrekar, Runzhong Wang, Shuiwang Ji, Connor W. Coley
Published: 2025-02-13 18:29:48+00:00

Categories: cs.LG, q-bio.QM

Learning to Coordinate with Experts




When deployed in dynamic environments, AI agents will inevitably encounter challenges that exceed their individual capabilities. Leveraging assistance from expert agents, whether human or AI, can significantly enhance safety and performance in such situations. However, querying experts is often costly, necessitating the development of agents that can efficiently request and utilize expert guidance. In this paper, we introduce a fundamental coordination problem called Learning to Yield and Request Control (YRC), where the objective is to learn a strategy that determines when to act autonomously and when to seek expert assistance. We consider a challenging practical setting in which an agent does not interact with experts during training but must adapt to novel environmental changes and expert interventions at test time. To facilitate empirical research, we introduce YRC-Bench, an open-source benchmark featuring diverse domains. YRC-Bench provides a standardized Gym-like API, simulated experts, an evaluation pipeline, and implementations of competitive baselines. Towards tackling the YRC problem, we propose a novel validation approach and investigate the performance of various learning methods across diverse environments, yielding insights that can guide future research.


Authors: Mohamad H. Danesh, Tu Trinh, Benjamin Plaut, Nguyen X. Khanh
Published: 2025-02-13 18:41:55+00:00

Categories: cs.LG, stat.ML

Rolling Ahead Diffusion for Traffic Scene Simulation




Realistic driving simulation requires that NPCs not only mimic natural driving behaviors but also react to the behavior of other simulated agents. Recent developments in diffusion-based scenario generation focus on creating diverse and realistic traffic scenarios by jointly modelling the motion of all the agents in the scene. However, these traffic scenarios do not react when the motion of agents deviates from their modelled trajectories. For example, the ego-agent can be controlled by a standalone motion planner. To produce reactive scenarios with joint scenario models, the model must regenerate the scenario at each timestep based on new observations in a Model Predictive Control (MPC) fashion. Although reactive, this method is time-consuming, as one complete possible future for all NPCs is generated per simulation step. Alternatively, one can utilize an autoregressive (AR) model to predict only the immediate next-step future for all NPCs. Although faster, this method lacks the capability for advanced planning. We present a rolling-diffusion-based traffic scene generation model which mixes the benefits of both methods by predicting the next-step future while simultaneously predicting partially noised further future steps. We show that such a model is efficient compared to diffusion-model-based AR, achieving a beneficial compromise between reactivity and computational efficiency.


Authors: Yunpeng Liu, Matthew Niedoba, William Harvey, Adam Scibior, Berend Zwartsenberg, Frank Wood
Published: 2025-02-13 18:45:56+00:00

Categories: cs.LG, cs.RO

Censor Dependent Variational Inference




This paper provides a comprehensive analysis of variational inference in latent variable models for survival analysis, emphasizing the distinctive challenges associated with applying variational methods to survival data. We identify a critical weakness in the existing methodology, demonstrating how a poorly designed variational distribution may hinder the objective of survival analysis tasks, namely modeling time-to-event distributions. We prove that the optimal variational distribution, which perfectly bounds the log-likelihood, may depend on the censoring mechanism. To address this issue, we propose censor-dependent variational inference (CDVI), tailored for latent variable models in survival analysis. More practically, we introduce CD-CVAE, a V-structure Variational Autoencoder (VAE) designed for the scalable implementation of CDVI. Further discussion extends some existing theories and training techniques to survival analysis. Extensive experiments validate our analysis and demonstrate significant improvements in the estimation of individual survival distributions.
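
As background on why censoring matters for the likelihood, the sketch below writes out the standard right-censored log-likelihood for a simple exponential model: an observed event at time $t$ contributes the log-density, while a censored record contributes only the log-survival probability. This is textbook survival analysis, not the CDVI objective or the CD-CVAE model. The function name and the three-record data set are made up for illustration.

```python
import numpy as np

def exponential_censored_loglik(rate, times, observed):
    """Log-likelihood of right-censored data under an Exponential(rate) model.

    Observed events contribute log f(t) = log(rate) - rate * t;
    censored records contribute log S(t) = -rate * t.
    """
    times = np.asarray(times, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sum(observed * np.log(rate) - rate * times))

times = [2.0, 3.0, 5.0]
observed = [1, 0, 1]                     # 1 = event seen, 0 = right-censored

# Closed-form maximizer for this model: number of events / total time at risk.
rate_mle = sum(observed) / sum(times)
loglik_at_mle = exponential_censored_loglik(rate_mle, times, observed)
print("rate estimate:", rate_mle, "log-likelihood:", loglik_at_mle)
```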


Authors: Chuanhui Liu, Xiao Wang
Published: 2025-02-13 18:48:04+00:00

Categories: cs.LG, stat.ML