Does generalization performance of $l^q$ regularization learning depend on $q$? A negative example

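Summary

$l^q$-regularization has proven to be an attractive technique in machine learning and statistical modeling. It aims to improve the generalization (prediction) capability of a machine (model) by appropriately shrinking its coefficients. The shape of an $l^q$ estimator depends on the choice of the regularization order $q$. In particular, $l^1$ leads to the LASSO estimate, while $l^{2}$ corresponds to the smooth ridge regression. This makes the order $q$ a potential tuning parameter in applications. To make $l^{q}$-regularization easier to use, the authors look for a modeling strategy in which an elaborate selection of $q$ can be avoided. In this spirit, they carry out their investigation within a general framework of $l^{q}$-regularized kernel learning under a sample dependent hypothesis space (SDHS). For a designated class of kernel functions, they show that all $l^{q}$ estimators with $0< q < \infty$ attain similar generalization error bounds. These bounds are almost optimal in the sense that the upper and lower bounds are asymptotically identical up to a logarithmic factor. This finding tentatively indicates that, in some modeling contexts, the choice of $q$ may not have a strong impact on generalization capability. From this perspective, $q$ can be specified arbitrarily, or chosen solely by non-generalization criteria such as smoothness, computational complexity, and sparsity.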
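To make the setting concrete, here is a minimal sketch of $l^q$ coefficient regularization over a sample dependent hypothesis space for the two cases the abstract names, $q = 1$ and $q = 2$. It assumes the squared loss, so the estimator is $f = \sum_i a_i K(x_i, \cdot)$ with $a$ minimizing $\frac{1}{m}\|Ka - y\|_2^2 + \lambda \sum_i |a_i|^q$. The Gaussian kernel, its width, the synthetic data, the value of $\lambda$, and all function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gaussian_kernel(a, b, width=0.5):
    """Gram matrix K[i, j] = exp(-(a_i - b_j)^2 / width^2) for 1-D inputs."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / width**2)

def fit_l2(K, y, lam):
    """q = 2: minimize (1/m)||K a - y||^2 + lam * ||a||_2^2 in closed form."""
    m = len(y)
    return np.linalg.solve(K.T @ K + lam * m * np.eye(m), K.T @ y)

def fit_l1(K, y, lam, n_iter=2000):
    """q = 1: minimize (1/m)||K a - y||^2 + lam * ||a||_1 by proximal gradient (ISTA)."""
    m = len(y)
    # Step size 1/L, where L = (2/m) * ||K||_2^2 is the gradient's Lipschitz constant.
    step = m / (2.0 * np.linalg.norm(K, 2) ** 2)
    a = np.zeros(m)
    for _ in range(n_iter):
        grad = (2.0 / m) * K.T @ (K @ a - y)
        z = a - step * grad
        # Soft-thresholding is the proximal operator of the l^1 penalty.
        a = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return a

# Synthetic regression problem (an assumption made for illustration only).
rng = np.random.default_rng(0)
m = 60
x_train = rng.uniform(-1.0, 1.0, m)
y_train = np.sin(np.pi * x_train) + 0.1 * rng.standard_normal(m)
x_test = np.linspace(-1.0, 1.0, 200)
y_test = np.sin(np.pi * x_test)  # noiseless target for measuring test error

# The hypothesis space is spanned by kernels centered at the training samples.
K = gaussian_kernel(x_train, x_train)
K_test = gaussian_kernel(x_test, x_train)

lam = 1e-3
a_l2 = fit_l2(K, y_train, lam)
a_l1 = fit_l1(K, y_train, lam)

err_l2 = float(np.mean((K_test @ a_l2 - y_test) ** 2))
err_l1 = float(np.mean((K_test @ a_l1 - y_test) ** 2))

print("q=2 test MSE:", err_l2)
print("q=1 test MSE:", err_l1, "nonzero coefficients:", int(np.count_nonzero(a_l1)))
```

Comparing the two test errors on one toy problem is only an illustration of the question the paper asks; it is not evidence for the paper's bounds, which concern a designated kernel class and hold up to a logarithmic factor. The number of nonzero coefficients is printed because sparsity is one of the non-generalization criteria by which $q$ might still be chosen.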

Summary (original)

$l^q$-regularization has been demonstrated to be an attractive technique in machine learning and statistical modeling. It attempts to improve the generalization (prediction) capability of a machine (model) through appropriately shrinking its coefficients. The shape of an $l^q$ estimator varies with the choice of the regularization order $q$. In particular, $l^1$ leads to the LASSO estimate, while $l^{2}$ corresponds to the smooth ridge regression. This makes the order $q$ a potential tuning parameter in applications. To facilitate the use of $l^{q}$-regularization, we intend to seek a modeling strategy where an elaborate selection of $q$ is avoidable. In this spirit, we place our investigation within a general framework of $l^{q}$-regularized kernel learning under a sample dependent hypothesis space (SDHS). For a designated class of kernel functions, we show that all $l^{q}$ estimators for $0< q < \infty$ attain similar generalization error bounds. These estimated bounds are almost optimal in the sense that up to a logarithmic factor, the upper and lower bounds are asymptotically identical. This finding tentatively reveals that, in some modeling contexts, the choice of $q$ might not have a strong impact in terms of the generalization capability. From this perspective, $q$ can be arbitrarily specified, or specified merely by other non-generalization criteria like smoothness, computational complexity, sparsity, etc.

arXiv information

Authors	Shaobo Lin, Chen Xu, Jingshan Zeng, Jian Fang
Published	2023-06-13 14:21:16+00:00

Source and services used

arxiv.jp, Google
