jarxiv | Japanese arxiv

Assessing the Limits of In-Context Learning beyond Functions using Partially Ordered Relation

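Posted: June 17, 2025 | Author: jarxiv

Summary

Generating reasonable and generally accurate responses to tasks, often accompanied by example demonstrations, highlights the remarkable in-context learning (ICL) capabilities of large language models (LLMs), which require no updates to the model's parameter space.
Although ongoing work focuses on inference from document-level concepts, the behavior of ICL when learning well-defined functions or relations in context needs careful investigation.
This article presents the performance of ICL on partially ordered relations by introducing the notion of inductively increasing complexity in prompts.
In most cases, the saturated performance on the chosen metrics indicates that, while ICL offers some benefit, its effectiveness remains constrained as prompt complexity increases, even when sufficient demonstration examples are present.
This behavior is evident from the empirical findings and is further justified theoretically in terms of the implicit optimization process.
The code is available at https://anonymous.4open.science/r/ICLonPartiallyOrderSet.

Summary (original)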

Generating rational and generally accurate responses to tasks, often accompanied by example demonstrations, highlights Large Language Model's (LLM's) remarkable In-Context Learning (ICL) capabilities without requiring updates to the model's parameter space. Despite having an ongoing exploration focused on the inference from a document-level concept, its behavior in learning well-defined functions or relations in context needs a careful investigation. In this article, we present the performance of ICL on partially ordered relations by introducing the notion of inductively increasing complexity in prompts. In most cases, the saturated performance of the chosen metric indicates that while ICL offers some benefits, its effectiveness remains constrained as we increase the complexity in the prompts even in the presence of sufficient demonstrative examples. The behavior is evident from our empirical findings and has further been theoretically justified in terms of its implicit optimization process. The code is available at https://anonymous.4open.science/r/ICLonPartiallyOrderSet.

arXiv information

Authors	Debanjan Dutta, Faizanuddin Ansari, Swagatam Das
Published	2025-06-16 15:35:41+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.LG

Variational Inference with Mixtures of Isotropic Gaussians

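Posted: June 17, 2025 | Author: jarxiv

Summary

Variational inference (VI) is a popular approach to Bayesian inference that looks for the best approximation of the posterior distribution within a parametric family by minimizing a loss, typically the (reverse) Kullback-Leibler (KL) divergence.
This paper focuses on the following parametric family: mixtures of isotropic Gaussians (that is, with diagonal covariance matrices proportional to the identity) with uniform weights.
We develop a variational framework and provide efficient algorithms suited to this family.
In contrast to mixtures of Gaussians with generic covariance matrices, this choice balances accurate approximation of multimodal Bayesian posteriors against memory and computational efficiency.
Our algorithms implement gradient descent on the locations of the mixture components (the modes of the Gaussians) and either (entropic) mirror descent or Bures descent on their variance parameters.
We illustrate the performance of the algorithms in numerical experiments.

Summary (original)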

Variational inference (VI) is a popular approach in Bayesian inference that looks for the best approximation of the posterior distribution within a parametric family, minimizing a loss that is typically the (reverse) Kullback-Leibler (KL) divergence. In this paper, we focus on the following parametric family: mixtures of isotropic Gaussians (i.e., with diagonal covariance matrices proportional to the identity) and uniform weights. We develop a variational framework and provide efficient algorithms suited for this family. In contrast with mixtures of Gaussians with generic covariance matrices, this choice balances accurate approximation of multimodal Bayesian posteriors against memory and computational efficiency. Our algorithms implement gradient descent on the location of the mixture components (the modes of the Gaussians), and either (an entropic) Mirror or Bures descent on their variance parameters. We illustrate the performance of our algorithms on numerical experiments.
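As an illustration of the parametric family named in the abstract, the following minimal sketch evaluates the log-density of a uniform-weight mixture of isotropic Gaussians. The function names and the log-sum-exp formulation are assumptions made for this example, not the authors' code, and the sketch does not implement their gradient, mirror, or Bures descent updates.

```python
import math

def log_isotropic_gaussian(x, mean, var):
    """Log-density of N(mean, var * I) at x; var is a single positive scalar."""
    d = len(x)
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * d * math.log(2.0 * math.pi * var) - sq_dist / (2.0 * var)

def mixture_log_density(x, means, variances):
    """Log-density of (1/N) * sum_i N(means[i], variances[i] * I) at x."""
    logs = [log_isotropic_gaussian(x, m, v) for m, v in zip(means, variances)]
    top = max(logs)  # log-sum-exp shift for numerical stability
    return top + math.log(sum(math.exp(l - top) for l in logs)) - math.log(len(logs))
```

Each component stores only a mean vector and one variance, so N components in dimension d need N(d + 1) numbers, compared with N(d + d(d + 1)/2) when every component carries a full covariance matrix.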

arXiv information

Authors	Marguerite Petit-Talamon, Marc Lambert, Anna Korba
Published	2025-06-16 15:42:15+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.LG, stat.ML

Optimistic Q-learning for average reward and episodic reinforcement learning

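Posted: June 17, 2025 | Author: jarxiv

Summary

We present an optimistic Q-learning algorithm for regret minimization in average-reward reinforcement learning, under an additional assumption on the underlying MDP: for all policies, the time to visit some frequent state $s_0$ is finite and upper bounded by $H$, either in expectation or with constant probability.
Our setting strictly generalizes the episodic setting and is significantly less restrictive than the assumption of bounded hitting time \textit{for all states} made in most previous literature on model-free algorithms for the average-reward setting.
We show a regret bound of $\tilde{O}(H^5 S\sqrt{AT})$, where $S$ and $A$ are the numbers of states and actions and $T$ is the horizon.
A key technical novelty of our work is the introduction of an $\overline{L}$ operator, defined as $\overline{L} v = \frac{1}{H} \sum_{h=1}^H L^h v$, where $L$ denotes the Bellman operator.
Under the given assumption, we show that the $\overline{L}$ operator is a strict contraction (in span) even in the average-reward setting, where the discount factor is $1$.
Our algorithm design uses ideas from episodic Q-learning to estimate and apply this operator iteratively.
Thus, we provide a unified view of regret minimization in episodic and non-episodic settings.

Summary (original)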

We present an optimistic Q-learning algorithm for regret minimization in average reward reinforcement learning under an additional assumption on the underlying MDP that for all policies, the time to visit some frequent state $s_0$ is finite and upper bounded by $H$, either in expectation or with constant probability. Our setting strictly generalizes the episodic setting and is significantly less restrictive than the assumption of bounded hitting time \textit{for all states} made by most previous literature on model-free algorithms in average reward settings. We demonstrate a regret bound of $\tilde{O}(H^5 S\sqrt{AT})$, where $S$ and $A$ are the numbers of states and actions, and $T$ is the horizon. A key technical novelty of our work is the introduction of an $\overline{L}$ operator defined as $\overline{L} v = \frac{1}{H} \sum_{h=1}^H L^h v$ where $L$ denotes the Bellman operator. Under the given assumption, we show that the $\overline{L}$ operator has a strict contraction (in span) even in the average-reward setting where the discount factor is $1$. Our algorithm design uses ideas from episodic Q-learning to estimate and apply this operator iteratively. Thus, we provide a unified view of regret minimization in episodic and non-episodic settings, which may be of independent interest.
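The operator $\overline{L} v = \frac{1}{H} \sum_{h=1}^H L^h v$ defined in the abstract can be tried out on a toy problem. The sketch below is only an illustration: the two-state MDP, its transition and reward tables, and all function names are hypothetical and not taken from the paper. Every state-action pair reaches state 0 with probability at least 0.5, so state 0 stands in for the frequent state $s_0$.

```python
# Hypothetical toy MDP (not from the paper): 2 states, 2 actions.
# P[s][a][s2] is a transition probability, R[s][a] a reward.
P = [
    [[0.9, 0.1], [0.5, 0.5]],
    [[0.6, 0.4], [0.5, 0.5]],
]
R = [
    [1.0, 0.0],
    [0.0, 2.0],
]

def bellman(v):
    """Undiscounted Bellman operator: (L v)(s) = max_a [R(s,a) + sum_s2 P(s2|s,a) v(s2)]."""
    return [
        max(
            R[s][a] + sum(P[s][a][s2] * v[s2] for s2 in range(len(v)))
            for a in range(len(R[s]))
        )
        for s in range(len(v))
    ]

def l_bar(v, H):
    """(L-bar v) = (1/H) * sum_{h=1}^{H} L^h v."""
    total = [0.0] * len(v)
    current = list(v)
    for _ in range(H):
        current = bellman(current)  # current now holds L^h v
        total = [t + c for t, c in zip(total, current)]
    return [t / H for t in total]

def span(v):
    """Span seminorm: max(v) - min(v)."""
    return max(v) - min(v)

def span_gap(v1, v2, H):
    """Span of L-bar v1 - L-bar v2, to compare against span(v1 - v2)."""
    a, b = l_bar(v1, H), l_bar(v2, H)
    return span([x - y for x, y in zip(a, b)])
```

Comparing `span_gap(v1, v2, H)` with `span` of `v1 - v2` for a few vectors shows the shrinkage in span on this toy MDP; it is a numerical check on one example, not a substitute for the paper's proof.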

arXiv information

Authors	Priyank Agrawal, Shipra Agrawal
Published	2025-06-16 15:51:30+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.LG, stat.ML

Global Convergence of Adjoint-Optimized Neural PDEs

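Posted: June 17, 2025 | Author: jarxiv

Summary

Many engineering and scientific fields have recently become interested in modeling terms in partial differential equations (PDEs) with neural networks.
The resulting neural-network PDE model, being a function of the neural network parameters, can be calibrated to available data by optimizing over the PDE using gradient descent, with the gradient evaluated in a computationally efficient manner by solving an adjoint PDE.
These neural-network PDE models have emerged as an important research area in scientific machine learning.
This paper studies the convergence of the adjoint gradient descent optimization method for training neural-network PDE models in the limit where both the number of hidden units and the training time tend to infinity.
Specifically, for a general class of nonlinear parabolic PDEs with a neural network embedded in the source term, we prove convergence of the trained neural-network PDE solution to the target data (i.e., a global minimizer).
The global convergence proof poses a unique mathematical challenge that is not encountered in finite-dimensional neural network convergence analyses, due to (1) the neural network training dynamics involving a non-local neural network kernel operator in the infinite-width hidden-layer limit, where the kernel lacks a spectral gap for its eigenvalues, and (2) the nonlinearity of the limit PDE system, which leads to a non-convex optimization problem even in the infinite-width hidden-layer limit (unlike typical neural network training, where the optimization problem becomes convex in the large-neuron limit).
The theoretical results are illustrated and empirically validated by numerical studies.

Summary (original)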

Many engineering and scientific fields have recently become interested in modeling terms in partial differential equations (PDEs) with neural networks. The resulting neural-network PDE model, being a function of the neural network parameters, can be calibrated to available data by optimizing over the PDE using gradient descent, where the gradient is evaluated in a computationally efficient manner by solving an adjoint PDE. These neural-network PDE models have emerged as an important research area in scientific machine learning. In this paper, we study the convergence of the adjoint gradient descent optimization method for training neural-network PDE models in the limit where both the number of hidden units and the training time tend to infinity. Specifically, for a general class of nonlinear parabolic PDEs with a neural network embedded in the source term, we prove convergence of the trained neural-network PDE solution to the target data (i.e., a global minimizer). The global convergence proof poses a unique mathematical challenge that is not encountered in finite-dimensional neural network convergence analyses due to (1) the neural network training dynamics involving a non-local neural network kernel operator in the infinite-width hidden layer limit where the kernel lacks a spectral gap for its eigenvalues and (2) the nonlinearity of the limit PDE system, which leads to a non-convex optimization problem, even in the infinite-width hidden layer limit (unlike in typical neural network training cases where the optimization problem becomes convex in the large neuron limit). The theoretical results are illustrated and empirically validated by numerical studies.

arXiv information

Authors	Konstantin Riedl, Justin Sirignano, Konstantinos Spiliopoulos
Published	2025-06-16 16:00:00+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: 35K55, 35Q93, 49M41, 68T07, 90C26, cs.LG, cs.NA, math.AP, math.NA, math.OC

EUNIS Habitat Maps: Enhancing Thematic and Spatial Resolution for Europe through Machine Learning

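Posted: June 17, 2025 | Author: jarxiv

Summary

The EUNIS habitat classification is crucial for categorising European habitats, supporting European policy on nature conservation, and implementing the Nature Restoration Law.
To meet the growing demand for detailed and accurate habitat information, we provide spatial predictions for 260 EUNIS habitat types at hierarchical level 3, together with independent validation and uncertainty analyses.
Using ensemble machine learning models with high-resolution satellite imagery and ecologically meaningful climatic, topographic, and edaphic variables, we produced a European habitat map indicating the most probable EUNIS habitat at 100-m resolution across Europe.
In addition, we provide information on prediction uncertainty and on the most probable habitats at level 3 within each EUNIS level 1 formation.
This product is particularly useful for both conservation and restoration purposes.
Predictions were cross-validated at the European scale using spatial block cross-validation and evaluated against independent data from France (forests only), the Netherlands, and Austria.
The habitat maps achieved strong predictive performance on the validation datasets, with distinct trade-offs between recall and precision across habitat formations.

Summary (original)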

The EUNIS habitat classification is crucial for categorising European habitats, supporting European policy on nature conservation and implementing the Nature Restoration Law. To meet the growing demand for detailed and accurate habitat information, we provide spatial predictions for 260 EUNIS habitat types at hierarchical level 3, together with independent validation and uncertainty analyses. Using ensemble machine learning models, together with high-resolution satellite imagery and ecologically meaningful climatic, topographic and edaphic variables, we produced a European habitat map indicating the most probable EUNIS habitat at 100-m resolution across Europe. Additionally, we provide information on prediction uncertainty and the most probable habitats at level 3 within each EUNIS level 1 formation. This product is particularly useful for both conservation and restoration purposes. Predictions were cross-validated at European scale using a spatial block cross-validation and evaluated against independent data from France (forests only), the Netherlands and Austria. The habitat maps obtained strong predictive performances on the validation datasets with distinct trade-offs in terms of recall and precision across habitat formations.
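The spatial block cross-validation mentioned in the abstract keeps nearby observations in the same fold, so that test points are not scored against close neighbours seen during training. The sketch below shows one simple way to build such folds; the square grid, the `block_size` parameter, and the round-robin assignment of blocks to folds are assumptions made for illustration, not the authors' procedure.

```python
import math
from collections import defaultdict

def spatial_block_folds(coords, block_size, n_folds):
    """Group point indices into folds so that each spatial block stays in one fold.

    coords: list of (x, y) positions; block_size: side length of a square block.
    """
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        key = (math.floor(x / block_size), math.floor(y / block_size))
        blocks[key].append(i)

    folds = [[] for _ in range(n_folds)]
    # Assign whole blocks to folds in round-robin order over the sorted block keys.
    for k, key in enumerate(sorted(blocks)):
        folds[k % n_folds].extend(blocks[key])
    return folds
```

For fold `k`, the indices in `folds[k]` form the test set and all remaining indices form the training set.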

arXiv information

Authors	Sara Si-Moussi, Stephan Hennekens, Sander Mücher, Wanda De Keersmaecker, Milan Chytrý, Emiliano Agrillo, Fabio Attorre, Idoia Biurrun, Gianmaria Bonari, Andraž Čarni, Renata Ćušterevska, Tetiana Dziuba, Klaus Ecker, Behlül Güler, Ute Jandt, Borja Jiménez-Alfaro, Jonathan Lenoir, Jens-Christian Svenning, Grzegorz Swacha, Wilfried Thuiller
Published	2025-06-16 16:10:08+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: 62M30, 62P12, 92D40, cs.LG, I.2.6, physics.geo-ph, q-bio.QM, stat.AP

xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations

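Posted: June 17, 2025 | Author: jarxiv

Summary

We introduce xbench, a dynamic, profession-aligned evaluation suite designed to bridge the gap between AI agent capabilities and real-world productivity.
Existing benchmarks often focus on isolated technical skills and may not accurately reflect the economic value agents deliver in professional settings.
To address this, xbench targets commercially significant domains with evaluation tasks defined by industry professionals.
Our framework creates metrics that correlate strongly with productivity value, enables prediction of Technology-Market Fit (TMF), and facilitates tracking of product capabilities over time.
As initial implementations, we present two benchmarks: Recruitment and Marketing.
For Recruitment, we collect 50 tasks from real-world headhunting business scenarios to evaluate agents' abilities in company mapping, information retrieval, and talent sourcing.
For Marketing, we assess agents' ability to match influencers with advertiser needs, evaluating performance across 50 advertiser requirements using a curated pool of 836 candidate influencers.
We present initial evaluation results for leading contemporary agents, establishing a baseline for these professional domains.
Continuously updated evalsets and evaluations are available at https://xbench.org.

Summary (original)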

We introduce xbench, a dynamic, profession-aligned evaluation suite designed to bridge the gap between AI agent capabilities and real-world productivity. While existing benchmarks often focus on isolated technical skills, they may not accurately reflect the economic value agents deliver in professional settings. To address this, xbench targets commercially significant domains with evaluation tasks defined by industry professionals. Our framework creates metrics that strongly correlate with productivity value, enables prediction of Technology-Market Fit (TMF), and facilitates tracking of product capabilities over time. As our initial implementations, we present two benchmarks: Recruitment and Marketing. For Recruitment, we collect 50 tasks from real-world headhunting business scenarios to evaluate agents’ abilities in company mapping, information retrieval, and talent sourcing. For Marketing, we assess agents’ ability to match influencers with advertiser needs, evaluating their performance across 50 advertiser requirements using a curated pool of 836 candidate influencers. We present initial evaluation results for leading contemporary agents, establishing a baseline for these professional domains. Our continuously updated evalsets and evaluations are available at https://xbench.org.

arXiv information

Authors	Kaiyuan Chen, Yixin Ren, Yang Liu, Xiaobo Hu, Haotong Tian, Tianbao Xie, Fangfu Liu, Haoye Zhang, Hongzhang Liu, Yuan Gong, Chen Sun, Han Hou, Hui Yang, James Pan, Jianan Lou, Jiayi Mao, Jizheng Liu, Jinpeng Li, Kangyi Liu, Kenkun Liu, Rui Wang, Run Li, Tong Niu, Wenlong Zhang, Wenqi Yan, Xuanzheng Wang, Yuchen Zhang, Yi-Hsin Hung, Yuan Jiang, Zexuan Liu, Zihan Yin, Zijian Ma, Zhiwen Mo
Published	2025-06-16 16:16:14+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.LG

PeakWeather: MeteoSwiss Weather Station Measurements for Spatiotemporal Deep Learning

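Posted: June 17, 2025 | Author: jarxiv

Summary

Accurate weather forecasts are essential for supporting a wide range of activities and decision-making processes and for mitigating the impacts of adverse weather.
While traditional numerical weather prediction (NWP) remains the cornerstone of operational forecasting, machine learning is emerging as a powerful alternative for fast, flexible, and scalable predictions.
We introduce PeakWeather, a high-quality dataset of surface weather observations collected every 10 minutes over more than 8 years from the ground stations of the measurement network of the Federal Office of Meteorology and Climatology MeteoSwiss.
The dataset includes a diverse set of meteorological variables from 302 station locations distributed across Switzerland's complex topography, complemented with topographical indices derived from digital height models for context.
Ensemble forecasts from the currently operational high-resolution NWP model are provided as a baseline against which to evaluate new approaches.
The dataset's richness supports a broad spectrum of spatiotemporal tasks, including time series forecasting at various scales, graph structure learning, imputation, and virtual sensing.
As such, PeakWeather serves as a real-world benchmark for advancing foundational machine learning research, meteorology, and sensor-based applications.

Summary (original)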

Accurate weather forecasts are essential for supporting a wide range of activities and decision-making processes, as well as mitigating the impacts of adverse weather events. While traditional numerical weather prediction (NWP) remains the cornerstone of operational forecasting, machine learning is emerging as a powerful alternative for fast, flexible, and scalable predictions. We introduce PeakWeather, a high-quality dataset of surface weather observations collected every 10 minutes over more than 8 years from the ground stations of the Federal Office of Meteorology and Climatology MeteoSwiss’s measurement network. The dataset includes a diverse set of meteorological variables from 302 station locations distributed across Switzerland’s complex topography and is complemented with topographical indices derived from digital height models for context. Ensemble forecasts from the currently operational high-resolution NWP model are provided as a baseline forecast against which to evaluate new approaches. The dataset’s richness supports a broad spectrum of spatiotemporal tasks, including time series forecasting at various scales, graph structure learning, imputation, and virtual sensing. As such, PeakWeather serves as a real-world benchmark to advance both foundational machine learning research, meteorology, and sensor-based applications.

arXiv information

Authors	Daniele Zambon, Michele Cattaneo, Ivan Marisca, Jonas Bhend, Daniele Nerini, Cesare Alippi
Published	2025-06-16 16:16:42+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.LG, stat.ML

Adversarial Disentanglement by Backpropagation with Physics-Informed Variational Autoencoder

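Posted: June 17, 2025 | Author: jarxiv

Summary

Inference and prediction under partial knowledge of a physical system are challenging, particularly when multiple confounding sources influence the measured response.
Explicitly accounting for these influences in physics-based models is often infeasible because of epistemic uncertainty, cost, or time constraints, which results in models that fail to describe the behavior of the system accurately.
On the other hand, data-driven machine learning models such as variational autoencoders are not guaranteed to identify a parsimonious representation.
As a result, they can suffer from poor generalization performance and reconstruction accuracy in the regime of limited and noisy data.
We propose a physics-informed variational autoencoder architecture that combines the interpretability of physics-based models with the flexibility of data-driven models.
To promote disentanglement of the known physics and the confounding influences, the latent space is partitioned into physically meaningful variables that parametrize a physics-based model and data-driven variables that capture variability in the domain and class of the physical system.
The encoder is coupled with a decoder that integrates physics-based and data-driven components, and it is constrained by an adversarial training objective that prevents the data-driven components from overriding the known physics, ensuring that the physics-grounded latent variables remain interpretable.
We demonstrate that the model can disentangle features of the input signal and separate the known physics from confounding influences, using supervision in the form of class and domain observables.
The model is evaluated on a series of synthetic case studies relevant to engineering structures, demonstrating the feasibility of the proposed approach.

Summary (original)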

Inference and prediction under partial knowledge of a physical system is challenging, particularly when multiple confounding sources influence the measured response. Explicitly accounting for these influences in physics-based models is often infeasible due to epistemic uncertainty, cost, or time constraints, resulting in models that fail to accurately describe the behavior of the system. On the other hand, data-driven machine learning models such as variational autoencoders are not guaranteed to identify a parsimonious representation. As a result, they can suffer from poor generalization performance and reconstruction accuracy in the regime of limited and noisy data. We propose a physics-informed variational autoencoder architecture that combines the interpretability of physics-based models with the flexibility of data-driven models. To promote disentanglement of the known physics and confounding influences, the latent space is partitioned into physically meaningful variables that parametrize a physics-based model, and data-driven variables that capture variability in the domain and class of the physical system. The encoder is coupled with a decoder that integrates physics-based and data-driven components, and constrained by an adversarial training objective that prevents the data-driven components from overriding the known physics, ensuring that the physics-grounded latent variables remain interpretable. We demonstrate that the model is able to disentangle features of the input signal and separate the known physics from confounding influences using supervision in the form of class and domain observables. The model is evaluated on a series of synthetic case studies relevant to engineering structures, demonstrating the feasibility of the proposed approach.

arXiv information

Authors	Ioannis Christoforos Koune, Alice Cicirello
Published	2025-06-16 16:18:25+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.LG, stat.ML

Data-Driven Soil Organic Carbon Sampling: Integrating Spectral Clustering with Conditioned Latin Hypercube Optimization

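Posted: June 17, 2025 | Author: jarxiv

Summary

Soil organic carbon (SOC) monitoring often relies on selecting representative field sampling locations based on environmental covariates.
To enhance the representativeness of SOC sampling, we propose a novel hybrid methodology that integrates spectral clustering, an unsupervised machine learning technique, with conditioned Latin hypercube sampling (cLHS).
In our approach, spectral clustering partitions the study area into $K$ homogeneous zones using multivariate covariate data, and cLHS is then applied within each zone to select sampling locations that collectively capture the full diversity of environmental conditions.
This hybrid spectral-cLHS method ensures that even minor but important environmental clusters are sampled, addressing a key limitation of vanilla cLHS, which can overlook such areas.
On a real SOC mapping dataset, we show that spectral-cLHS provides more uniform coverage of the covariate feature space and of spatial heterogeneity than standard cLHS.
This improved sampling design has the potential to yield more accurate SOC predictions by providing better-balanced training data for machine learning models.

Summary (original)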

Soil organic carbon (SOC) monitoring often relies on selecting representative field sampling locations based on environmental covariates. We propose a novel hybrid methodology that integrates spectral clustering, an unsupervised machine learning technique, with conditioned Latin hypercube sampling (cLHS) to enhance the representativeness of SOC sampling. In our approach, spectral clustering partitions the study area into $K$ homogeneous zones using multivariate covariate data, and cLHS is then applied within each zone to select sampling locations that collectively capture the full diversity of environmental conditions. This hybrid spectral-cLHS method ensures that even minor but important environmental clusters are sampled, addressing a key limitation of vanilla cLHS, which can overlook such areas. We demonstrate on a real SOC mapping dataset that spectral-cLHS provides more uniform coverage of covariate feature space and spatial heterogeneity than standard cLHS. This improved sampling design has the potential to yield more accurate SOC predictions by providing better-balanced training data for machine learning models.
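The two-stage idea in the abstract (partition into zones, then sample within every zone) can be sketched without the full machinery. The code below assumes the zone labels have already been produced by some clustering step, and it replaces cLHS with a much simpler one-covariate stand-in that picks the middle of equal-count strata. All function names, the allocation rule, and the within-zone picker are hypothetical simplifications, not the authors' method.

```python
def allocate_per_zone(zone_sizes, budget):
    """Split a sample budget across zones: one sample each, the rest proportional to size."""
    zones = sorted(zone_sizes)
    if budget < len(zones):
        raise ValueError("budget must cover at least one sample per zone")
    alloc = {z: 1 for z in zones}
    remaining = budget - len(zones)
    total = sum(zone_sizes.values())
    quotas = {z: remaining * zone_sizes[z] / total for z in zones}
    for z in zones:
        alloc[z] += int(quotas[z])
    # Hand out what is left by largest fractional remainder.
    leftover = budget - sum(alloc.values())
    by_remainder = sorted(zones, key=lambda z: quotas[z] - int(quotas[z]), reverse=True)
    for z in by_remainder[:leftover]:
        alloc[z] += 1
    return alloc

def pick_stratified(indices, values, n):
    """Pick n indices spread over the sorted covariate values (1-D stand-in for cLHS)."""
    ordered = sorted(indices, key=lambda i: values[i])
    n = min(n, len(ordered))
    # Take the middle element of each of n equal-count strata.
    return [ordered[(2 * k + 1) * len(ordered) // (2 * n)] for k in range(n)]

def zone_stratified_sample(labels, values, budget):
    """Select sample indices so that every zone contributes at least one location."""
    zones = {}
    for i, z in enumerate(labels):
        zones.setdefault(z, []).append(i)
    alloc = allocate_per_zone({z: len(ix) for z, ix in zones.items()}, budget)
    chosen = []
    for z in sorted(zones):
        chosen.extend(pick_stratified(zones[z], values, alloc[z]))
    return chosen
```

The guaranteed minimum of one sample per zone is what keeps a small cluster from being skipped, which is the limitation of plain cLHS that the abstract points to.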

arXiv information

Authors	Weiying Zhao, Aleksei Unagaev, Natalia Efremova
Published	2025-06-16 16:20:35+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.LG

The Courage to Stop: Overcoming Sunk Cost Fallacy in Deep Reinforcement Learning

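Posted: June 17, 2025 | Author: jarxiv

Summary

Off-policy deep reinforcement learning (RL) typically leverages replay buffers to reuse past experiences during learning.
This can help improve sample efficiency when the collected data is informative and aligned with the learning objectives.
When that is not the case, it can 'pollute' the replay buffer with data that exacerbates optimization challenges, in addition to wasting environment interactions through wasteful sampling.
We argue that sampling these uninformative and wasteful transitions can be avoided by addressing the sunk cost fallacy, which, in the context of deep RL, is the tendency to continue an episode until termination.
To address this, we propose learn to stop (LEAST), a lightweight mechanism that enables strategic early episode termination based on Q-value and gradient statistics.
We demonstrate that our method improves learning efficiency for a variety of RL algorithms, evaluated on both the MuJoCo and DeepMind Control Suite benchmarks.

Summary (original)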

Off-policy deep reinforcement learning (RL) typically leverages replay buffers for reusing past experiences during learning. This can help improve sample efficiency when the collected data is informative and aligned with the learning objectives; when that is not the case, it can have the effect of ‘polluting’ the replay buffer with data which can exacerbate optimization challenges in addition to wasting environment interactions due to wasteful sampling. We argue that sampling these uninformative and wasteful transitions can be avoided by addressing the sunk cost fallacy, which, in the context of deep RL, is the tendency towards continuing an episode until termination. To address this, we propose learn to stop (LEAST), a lightweight mechanism that enables strategic early episode termination based on Q-value and gradient statistics, which helps agents recognize when to terminate unproductive episodes early. We demonstrate that our method improves learning efficiency on a variety of RL algorithms, evaluated on both the MuJoCo and DeepMind Control Suite benchmarks.
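To make the idea of statistics-based early termination concrete, here is a minimal sketch of one possible stopping rule. It is not the LEAST mechanism itself: the class name, the sliding window, the threshold of mean minus `k` standard deviations, and the use of Q-values alone (gradient statistics are left out) are all assumptions made for illustration.

```python
from collections import deque
import statistics

class EarlyStopper:
    """Hypothetical rule: stop when the current Q-value is unusually low for recent history."""

    def __init__(self, window=100, k=1.0, min_history=10):
        self.history = deque(maxlen=window)  # recent Q-values
        self.k = k                           # how many standard deviations below the mean
        self.min_history = min_history       # do not stop before enough values are seen

    def should_stop(self, q_value):
        stop = False
        if len(self.history) >= self.min_history:
            mean = statistics.fmean(self.history)
            std = statistics.pstdev(self.history)
            stop = q_value < mean - self.k * std
        self.history.append(q_value)
        return stop
```

In a training loop, the agent would call `should_stop` with the Q-value of the current state-action pair at each step and reset the environment when it returns `True`.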

arXiv information

Authors	Jiashun Liu, Johan Obando-Ceron, Pablo Samuel Castro, Aaron Courville, Ling Pan
Published	2025-06-16 16:30:00+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.LG
