jarxiv | Japanese arxiv

Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence

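Posted: June 19, 2025 | Author: jarxiv

Summary

Today's AI agents are mostly siloed: they either retrieve and reason over vast amounts of digital information and knowledge obtained online, or interact with the physical world through embodied perception, planning, and action, but rarely do both.
This separation limits their ability to solve tasks that require integrated physical and digital intelligence, such as cooking from online recipes, navigating with dynamic map data, or interpreting real-world landmarks using web knowledge.
We introduce Embodied Web Agents, a new paradigm for AI agents that fluidly bridges embodiment and web-scale reasoning.
To operationalize this concept, we first develop the Embodied Web Agents task environments, a unified simulation platform that tightly integrates realistic 3D indoor and outdoor environments with functional web interfaces.
Building on this platform, we construct and release the Embodied Web Agents Benchmark, which covers a diverse set of tasks including cooking, navigation, shopping, tourism, and geolocation.
Experimental results reveal significant performance gaps between state-of-the-art AI systems and human capabilities, establishing both challenges and opportunities at the intersection of embodied cognition and web-scale knowledge access.
All datasets, code, and websites are publicly available on the project page: https://embodied-web-agent.github.io/.

Summary (original)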

AI agents today are mostly siloed – they either retrieve and reason over vast amount of digital information and knowledge obtained online; or interact with the physical world through embodied perception, planning and action – but rarely both. This separation limits their ability to solve tasks that require integrated physical and digital intelligence, such as cooking from online recipes, navigating with dynamic map data, or interpreting real-world landmarks using web knowledge. We introduce Embodied Web Agents, a novel paradigm for AI agents that fluidly bridge embodiment and web-scale reasoning. To operationalize this concept, we first develop the Embodied Web Agents task environments, a unified simulation platform that tightly integrates realistic 3D indoor and outdoor environments with functional web interfaces. Building upon this platform, we construct and release the Embodied Web Agents Benchmark, which encompasses a diverse suite of tasks including cooking, navigation, shopping, tourism, and geolocation – all requiring coordinated reasoning across physical and digital realms for systematic assessment of cross-domain intelligence. Experimental results reveal significant performance gaps between state-of-the-art AI systems and human capabilities, establishing both challenges and opportunities at the intersection of embodied cognition and web-scale knowledge access. All datasets, codes and websites are publicly available at our project page https://embodied-web-agent.github.io/.

arXiv information

Authors	Yining Hong,Rui Sun,Bingxuan Li,Xingcheng Yao,Maxine Wu,Alexander Chien,Da Yin,Ying Nian Wu,Zhecan James Wang,Kai-Wei Chang
Published	2025-06-18 17:58:17+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, cs.CV, cs.MM, cs.RO

Particle-Grid Neural Dynamics for Learning Deformable Object Models from RGB-D Videos

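Posted: June 19, 2025 | Author: jarxiv

Summary

Modeling the dynamics of deformable objects is challenging because of their diverse physical properties and the difficulty of estimating their state from limited visual information.
We address these challenges with a neural dynamics framework that combines object particles and spatial grids in a hybrid representation.
Our particle-grid model captures global shape and motion information while predicting dense particle motion, enabling the modeling of objects with varied shapes and materials.
Particles represent the object's shape, while the spatial grid discretizes 3D space to ensure spatial continuity and improve learning efficiency.
Coupled with Gaussian Splatting for visual rendering, our framework achieves a fully learning-based digital twin of deformable objects and generates 3D action-conditioned videos.
Through experiments, we demonstrate that our model learns the dynamics of diverse objects such as ropes, cloths, stuffed animals, and paper bags from sparse-view RGB-D recordings of robot-object interactions, and that it generalizes at the category level to unseen instances.
Our approach outperforms state-of-the-art learning-based and physics-based simulators, particularly in scenarios with limited camera views.
Furthermore, we showcase the utility of the learned models in model-based planning, enabling goal-conditioned object manipulation across a range of tasks.
The project page is available at https://kywind.github.io/pgnd.

Summary (original)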

Modeling the dynamics of deformable objects is challenging due to their diverse physical properties and the difficulty of estimating states from limited visual information. We address these challenges with a neural dynamics framework that combines object particles and spatial grids in a hybrid representation. Our particle-grid model captures global shape and motion information while predicting dense particle movements, enabling the modeling of objects with varied shapes and materials. Particles represent object shapes, while the spatial grid discretizes the 3D space to ensure spatial continuity and enhance learning efficiency. Coupled with Gaussian Splattings for visual rendering, our framework achieves a fully learning-based digital twin of deformable objects and generates 3D action-conditioned videos. Through experiments, we demonstrate that our model learns the dynamics of diverse objects — such as ropes, cloths, stuffed animals, and paper bags — from sparse-view RGB-D recordings of robot-object interactions, while also generalizing at the category level to unseen instances. Our approach outperforms state-of-the-art learning-based and physics-based simulators, particularly in scenarios with limited camera views. Furthermore, we showcase the utility of our learned models in model-based planning, enabling goal-conditioned object manipulation across a range of tasks. The project page is available at https://kywind.github.io/pgnd .

arXiv information

Authors	Kaifeng Zhang,Baoyu Li,Kris Hauser,Yunzhu Li
Published	2025-06-18 17:59:38+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.CV, cs.LG, cs.RO

Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model

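Posted: June 19, 2025 | Author: jarxiv

Summary

Diffusion-based image generation models excel at producing high-quality synthetic content but suffer from slow, computationally expensive inference.
Prior work has tried to mitigate this by caching and reusing features within diffusion transformers across inference steps.
However, these methods often rely on rigid heuristics that yield limited acceleration or generalize poorly across architectures.
We propose Evolutionary Caching to Accelerate Diffusion models (ECAD), a genetic algorithm that learns efficient per-model caching schedules forming a Pareto frontier, using only a small set of calibration prompts.
ECAD requires no modifications to network parameters or reference images.
It offers significant inference speedups, enables fine-grained control over the quality-latency trade-off, and adapts seamlessly to different diffusion models.
Notably, ECAD's learned schedules generalize effectively to resolutions and model variants not seen during calibration.
We evaluate ECAD on PixArt-alpha, PixArt-Sigma, and FLUX-1.dev using multiple metrics (FID, CLIP, Image Reward) across diverse benchmarks, demonstrating consistent improvements over previous approaches.
On PixArt-alpha, ECAD identifies a schedule that outperforms the previous state-of-the-art method by 4.47 COCO FID while increasing the inference speedup from 2.35x to 2.58x.
Our results establish ECAD as a scalable and generalizable approach for accelerating diffusion inference.
The project website is available at https://aniaggarwal.github.io/ecad and the code is available at https://github.com/aniaggarwal/ecad.

Summary (original)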

Diffusion-based image generation models excel at producing high-quality synthetic content, but suffer from slow and computationally expensive inference. Prior work has attempted to mitigate this by caching and reusing features within diffusion transformers across inference steps. These methods, however, often rely on rigid heuristics that result in limited acceleration or poor generalization across architectures. We propose Evolutionary Caching to Accelerate Diffusion models (ECAD), a genetic algorithm that learns efficient, per-model, caching schedules forming a Pareto frontier, using only a small set of calibration prompts. ECAD requires no modifications to network parameters or reference images. It offers significant inference speedups, enables fine-grained control over the quality-latency trade-off, and adapts seamlessly to different diffusion models. Notably, ECAD’s learned schedules can generalize effectively to resolutions and model variants not seen during calibration. We evaluate ECAD on PixArt-alpha, PixArt-Sigma, and FLUX-1.dev using multiple metrics (FID, CLIP, Image Reward) across diverse benchmarks (COCO, MJHQ-30k, PartiPrompts), demonstrating consistent improvements over previous approaches. On PixArt-alpha, ECAD identifies a schedule that outperforms the previous state-of-the-art method by 4.47 COCO FID while increasing inference speedup from 2.35x to 2.58x. Our results establish ECAD as a scalable and generalizable approach for accelerating diffusion inference. Our project website is available at https://aniaggarwal.github.io/ecad and our code is available at https://github.com/aniaggarwal/ecad.
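To make the "Pareto frontier" of caching schedules concrete, here is a minimal sketch of the non-domination filter that defines such a frontier. It is not ECAD's genetic algorithm. The function name `pareto_front`, the tuple layout, and the schedule names and numbers are all hypothetical and not taken from the paper. Lower is assumed to be better for both latency and FID.

```python
def pareto_front(candidates):
    """Return the candidates that no other candidate dominates.

    Each candidate is (name, latency, fid); lower is better for both.
    A candidate is dominated if another one is at least as good on both
    objectives and strictly better on at least one.
    """
    front = []
    for name, latency, fid in candidates:
        dominated = any(
            (l2 <= latency and f2 <= fid) and (l2 < latency or f2 < fid)
            for _, l2, f2 in candidates
        )
        if not dominated:
            front.append((name, latency, fid))
    # Sort by latency so the quality-latency trade-off reads left to right.
    return sorted(front, key=lambda c: c[1])


# Hypothetical schedules: (name, latency in seconds, FID).
schedules = [
    ("fast", 1.0, 30.0),
    ("balanced", 2.0, 20.0),
    ("wasteful", 2.5, 25.0),  # slower and worse than "balanced"
    ("slow", 3.0, 10.0),
]
front = pareto_front(schedules)
```

Picking a point on the returned list is what "fine-grained control over the quality-latency trade-off" amounts to in this simplified picture: every remaining schedule is the best available at its latency.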

arXiv information

Authors	Anirud Aggarwal,Abhinav Shrivastava,Matthew Gwilliam
Published	2025-06-18 17:59:50+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.CV

Nabla-R2D3: Effective and Efficient 3D Diffusion Alignment with 2D Rewards

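Posted: June 19, 2025 | Author: jarxiv

Summary

Generating high-quality, photorealistic 3D assets remains a longstanding challenge in 3D vision and computer graphics.
Although state-of-the-art generative models such as diffusion models have made significant progress in 3D generation, they often fall short of human-designed content because of their limited ability to follow instructions, align with human preferences, or produce realistic textures, geometries, and physical attributes.
In this paper, we introduce Nabla-R2D3, a highly effective and sample-efficient reinforcement learning alignment framework for 3D-native diffusion models using 2D rewards.
Built on the recently proposed Nabla-GFlowNet method, which matches the score function to reward gradients in a principled manner for reward finetuning, Nabla-R2D3 enables effective adaptation of 3D diffusion models using only 2D reward signals.
Extensive experiments show that, unlike vanilla finetuning baselines, which either struggle to converge or suffer from reward hacking, Nabla-R2D3 consistently achieves higher rewards and reduced prior forgetting within a few finetuning steps.

Summary (original)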

Generating high-quality and photorealistic 3D assets remains a longstanding challenge in 3D vision and computer graphics. Although state-of-the-art generative models, such as diffusion models, have made significant progress in 3D generation, they often fall short of human-designed content due to limited ability to follow instructions, align with human preferences, or produce realistic textures, geometries, and physical attributes. In this paper, we introduce Nabla-R2D3, a highly effective and sample-efficient reinforcement learning alignment framework for 3D-native diffusion models using 2D rewards. Built upon the recently proposed Nabla-GFlowNet method, which matches the score function to reward gradients in a principled manner for reward finetuning, our Nabla-R2D3 enables effective adaptation of 3D diffusion models using only 2D reward signals. Extensive experiments show that, unlike vanilla finetuning baselines which either struggle to converge or suffer from reward hacking, Nabla-R2D3 consistently achieves higher rewards and reduced prior forgetting within a few finetuning steps.

arXiv information

Authors	Qingming Liu,Zhen Liu,Dinghuai Zhang,Kui Jia
Published	2025-06-18 17:59:59+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.CV, cs.GR, cs.LG

Uniform Mean Estimation for Heavy-Tailed Distributions via Median-of-Means

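Posted: June 19, 2025 | Author: jarxiv

Summary

The Median of Means (MoM) is a mean estimator that has gained popularity in the context of heavy-tailed data.
In this work, we analyze its performance in the task of simultaneously estimating the mean of each function in a class $\mathcal{F}$ when the data distribution possesses only the first $p$ moments for $p \in (1,2]$.
We prove a new sample complexity bound using a novel symmetrization technique that may be of independent interest.
Additionally, we present applications of our result to $k$-means clustering with unbounded inputs and linear regression with general losses, improving upon existing works.

Summary (original)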

The Median of Means (MoM) is a mean estimator that has gained popularity in the context of heavy-tailed data. In this work, we analyze its performance in the task of simultaneously estimating the mean of each function in a class $\mathcal{F}$ when the data distribution possesses only the first $p$ moments for $p \in (1,2]$. We prove a new sample complexity bound using a novel symmetrization technique that may be of independent interest. Additionally, we present applications of our result to $k$-means clustering with unbounded inputs and linear regression with general losses, improving upon existing works.
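For readers unfamiliar with the estimator itself, the following is a minimal sketch of the textbook Median-of-Means procedure for a single real-valued sample: shuffle, split into blocks, average each block, and take the median of the block means. It does not reproduce the paper's uniform, function-class analysis. The function name `median_of_means`, its parameters, and the example data are illustrative assumptions.

```python
import random
import statistics


def median_of_means(samples, num_blocks, seed=0):
    """Textbook Median-of-Means estimate of the mean of `samples`.

    The data are shuffled and split into `num_blocks` blocks of equal size
    (leftover samples are dropped, an assumption of this sketch); the
    estimate is the median of the per-block averages.
    """
    if not 1 <= num_blocks <= len(samples):
        raise ValueError("num_blocks must be between 1 and len(samples)")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    block_size = len(shuffled) // num_blocks
    block_means = [
        statistics.fmean(shuffled[i * block_size:(i + 1) * block_size])
        for i in range(num_blocks)
    ]
    return statistics.median(block_means)


# Hypothetical data: 99 well-behaved points and one extreme outlier.
data = [1.0] * 99 + [1e6]
robust = median_of_means(data, num_blocks=9)
naive = statistics.fmean(data)
```

The outlier can land in at most one block, so the median of the nine block means ignores it, while the plain average is dragged far from 1. This is the robustness that makes the estimator attractive for heavy-tailed data.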

arXiv information

Authors	Mikael Møller Høgsgaard,Andrea Paudice
Published	2025-06-18 06:49:11+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.LG, stat.ML

No-Regret Learning Under Adversarial Resource Constraints: A Spending Plan Is All You Need!

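Posted: June 19, 2025 | Author: jarxiv

Summary

We study online decision-making problems under resource constraints, where both reward and cost functions are drawn from distributions that may change adversarially over time.
We focus on two canonical settings: $(i)$ online resource allocation, where rewards and costs are observed before action selection, and $(ii)$ online learning with resource constraints, where they are observed after action selection, under full feedback or bandit feedback.
It is well known that achieving sublinear regret in these settings is impossible when reward and cost distributions may change arbitrarily over time.
To address this challenge, we analyze a framework in which the learner is guided by a spending plan, a sequence prescribing expected resource usage across rounds.
We design general (primal-)dual methods that achieve sublinear regret with respect to baselines that follow the spending plan.
Crucially, the performance of our algorithms improves when the spending plan ensures a well-balanced distribution of the budget across rounds.
We additionally provide a robust variant of our methods to handle worst-case scenarios in which the spending plan is highly imbalanced.
To conclude, we study the regret of our algorithms when competing against benchmarks that deviate from the prescribed spending plan.

Summary (original)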

We study online decision making problems under resource constraints, where both reward and cost functions are drawn from distributions that may change adversarially over time. We focus on two canonical settings: $(i)$ online resource allocation where rewards and costs are observed before action selection, and $(ii)$ online learning with resource constraints where they are observed after action selection, under full feedback or bandit feedback. It is well known that achieving sublinear regret in these settings is impossible when reward and cost distributions may change arbitrarily over time. To address this challenge, we analyze a framework in which the learner is guided by a spending plan–a sequence prescribing expected resource usage across rounds. We design general (primal-)dual methods that achieve sublinear regret with respect to baselines that follow the spending plan. Crucially, the performance of our algorithms improves when the spending plan ensures a well-balanced distribution of the budget across rounds. We additionally provide a robust variant of our methods to handle worst-case scenarios where the spending plan is highly imbalanced. To conclude, we study the regret of our algorithms when competing against benchmarks that deviate from the prescribed spending plan.

arXiv information

Authors	Francesco Emanuele Stradi,Matteo Castiglioni,Alberto Marchesi,Nicola Gatti,Christian Kroer
Published	2025-06-18 14:04:08+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI, cs.LG, stat.ML

Probabilistic Aggregation and Targeted Embedding Optimization for Collective Moral Reasoning in Large Language Models

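Posted: June 19, 2025 | Author: jarxiv

Summary

Large language models (LLMs) have shown impressive moral reasoning abilities.
Yet they often diverge when confronted with complex, multi-factor moral dilemmas.
To address these discrepancies, we propose a framework that synthesizes the moral judgments of multiple LLMs into a collectively formulated moral judgment and realigns models that deviate significantly from this consensus.
Our aggregation mechanism fuses continuous moral acceptability scores (beyond binary labels) into a collective probability, weighting contributions by model reliability.
For misaligned models, a targeted embedding-optimization procedure fine-tunes token embeddings for moral philosophical theories, minimizing JS divergence to the consensus while preserving semantic integrity.
Experiments on a large-scale social moral dilemma dataset show that our approach builds robust consensus and improves individual model fidelity.
These findings highlight the value of data-driven moral alignment across multiple models and its potential for safer, more consistent AI systems.

Summary (original)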

Large Language Models (LLMs) have shown impressive moral reasoning abilities. Yet they often diverge when confronted with complex, multi-factor moral dilemmas. To address these discrepancies, we propose a framework that synthesizes multiple LLMs’ moral judgments into a collectively formulated moral judgment, realigning models that deviate significantly from this consensus. Our aggregation mechanism fuses continuous moral acceptability scores (beyond binary labels) into a collective probability, weighting contributions by model reliability. For misaligned models, a targeted embedding-optimization procedure fine-tunes token embeddings for moral philosophical theories, minimizing JS divergence to the consensus while preserving semantic integrity. Experiments on a large-scale social moral dilemma dataset show our approach builds robust consensus and improves individual model fidelity. These findings highlight the value of data-driven moral alignment across multiple models and its potential for safer, more consistent AI systems.
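The two quantitative ingredients named in the abstract, reliability-weighted aggregation and JS divergence to the consensus, can be illustrated with a small sketch. It assumes that each model's acceptability score is a probability in [0, 1] treated as a Bernoulli parameter and that aggregation is a plain weighted average. The paper's actual aggregation rule and optimization procedure may differ, and all names and numbers here are hypothetical.

```python
import math


def aggregate(scores, reliabilities):
    """Reliability-weighted average of per-model acceptability scores."""
    total = sum(reliabilities)
    return sum(s * w for s, w in zip(scores, reliabilities)) / total


def kl_bernoulli(p, q):
    """KL(Bernoulli(p) || Bernoulli(q)) in nats; terms with zero mass are skipped."""
    result = 0.0
    for a, b in ((p, q), (1.0 - p, 1.0 - q)):
        if a > 0.0:
            result += a * math.log(a / b)
    return result


def js_bernoulli(p, q):
    """Jensen-Shannon divergence between Bernoulli(p) and Bernoulli(q), in nats."""
    m = 0.5 * (p + q)  # mixture distribution; nonzero wherever p or q is
    return 0.5 * kl_bernoulli(p, m) + 0.5 * kl_bernoulli(q, m)


# Hypothetical scores from three models and their reliability weights.
scores = [0.2, 0.8, 0.7]
weights = [1.0, 3.0, 2.0]
consensus = aggregate(scores, weights)
# How far each model sits from the consensus; larger means more misaligned.
gaps = [js_bernoulli(s, consensus) for s in scores]
```

In this toy setup the first model has the largest gap, so it would be the natural candidate for realignment. JS divergence is symmetric and bounded by log 2, which keeps such gaps comparable across models.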

arXiv information

Authors	Chenchen Yuan,Zheyu Zhang,Shuo Yang,Bardh Prenkaj,Gjergji Kasneci
Published	2025-06-18 13:21:13+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL

AIn’t Nothing But a Survey? Using Large Language Models for Coding German Open-Ended Survey Responses on Survey Motivation

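Posted: June 19, 2025 | Author: jarxiv

Summary

The recent development and wider accessibility of LLMs have spurred discussion about how they can be used in survey research, including for classifying open-ended survey responses.
Because of their linguistic capacities, LLMs may be an efficient alternative to time-consuming manual coding and to the pre-training of supervised machine learning models.
Most existing research on this topic has focused on English-language responses relating to non-complex topics or on single LLMs, so it is unclear whether its findings generalize and how the quality of these classifications compares to established methods.
In this study, we investigate to what extent different LLMs can be used to code open-ended survey responses in other contexts, using German data on reasons for survey participation as an example.
We compare several state-of-the-art LLMs and several prompting approaches, and evaluate the LLMs' performance against human expert codings.
Overall performance differs greatly between LLMs, and only a fine-tuned LLM achieves satisfactory levels of predictive performance.
Performance differences between prompting approaches depend on the LLM used.
Finally, the LLMs' unequal classification performance across different categories of reasons for survey participation results in different categorical distributions when fine-tuning is not used.
We discuss the implications of these findings, both for methodological research on coding open-ended responses and for their substantive analysis, and for practitioners who process or substantively analyze such data.
Finally, we highlight the many trade-offs researchers need to consider when choosing automated methods for open-ended response classification in the age of LLMs.
In doing so, our study contributes to the growing body of research on the conditions under which LLMs can be leveraged efficiently, accurately, and reliably in survey research.

Summary (original)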

The recent development and wider accessibility of LLMs have spurred discussions about how they can be used in survey research, including classifying open-ended survey responses. Due to their linguistic capacities, it is possible that LLMs are an efficient alternative to time-consuming manual coding and the pre-training of supervised machine learning models. As most existing research on this topic has focused on English-language responses relating to non-complex topics or on single LLMs, it is unclear whether its findings generalize and how the quality of these classifications compares to established methods. In this study, we investigate to what extent different LLMs can be used to code open-ended survey responses in other contexts, using German data on reasons for survey participation as an example. We compare several state-of-the-art LLMs and several prompting approaches, and evaluate the LLMs’ performance by using human expert codings. Overall performance differs greatly between LLMs, and only a fine-tuned LLM achieves satisfactory levels of predictive performance. Performance differences between prompting approaches are conditional on the LLM used. Finally, LLMs’ unequal classification performance across different categories of reasons for survey participation results in different categorical distributions when not using fine-tuning. We discuss the implications of these findings, both for methodological research on coding open-ended responses and for their substantive analysis, and for practitioners processing or substantively analyzing such data. Finally, we highlight the many trade-offs researchers need to consider when choosing automated methods for open-ended response classification in the age of LLMs. In doing so, our study contributes to the growing body of research about the conditions under which LLMs can be efficiently, accurately, and reliably leveraged in survey research.

arXiv information

Authors	Leah von der Heyde,Anna-Carolina Haensch,Bernd Weiß,Jessica Daikeler
Published	2025-06-18 09:56:49+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, cs.CY

Accurate and scalable exchange-correlation with deep learning

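Posted: June 19, 2025 | Author: jarxiv

Summary

Density Functional Theory (DFT) is the most widely used electronic structure method for predicting the properties of molecules and materials.
Although DFT is, in principle, an exact reformulation of the Schrödinger equation, practical applications rely on approximations to the unknown exchange-correlation (XC) functional.
Most existing XC functionals are constructed using a limited set of increasingly complex, hand-crafted features that improve accuracy at the expense of computational efficiency.
Yet no current approximation achieves the accuracy and generality needed for predictive modeling of laboratory experiments at chemical accuracy, typically defined as errors below 1 kcal/mol.
In this work, we present Skala, a modern deep learning-based XC functional that bypasses expensive hand-designed features by learning representations directly from data.
Skala achieves chemical accuracy for atomization energies of small molecules while retaining the computational efficiency typical of semi-local DFT.
This performance is enabled by training on an unprecedented volume of high-accuracy reference data generated using computationally intensive wavefunction-based methods.
Notably, Skala systematically improves with additional training data covering diverse chemistry.
By incorporating a modest amount of additional high-accuracy data tailored to chemistry beyond atomization energies, Skala achieves accuracy competitive with the best-performing hybrid functionals across general main group chemistry, at the cost of semi-local DFT.
As the training dataset continues to expand, Skala is poised to further enhance the predictive power of first-principles simulations.

Summary (original)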

Density Functional Theory (DFT) is the most widely used electronic structure method for predicting the properties of molecules and materials. Although DFT is, in principle, an exact reformulation of the Schrödinger equation, practical applications rely on approximations to the unknown exchange-correlation (XC) functional. Most existing XC functionals are constructed using a limited set of increasingly complex, hand-crafted features that improve accuracy at the expense of computational efficiency. Yet, no current approximation achieves the accuracy and generality for predictive modeling of laboratory experiments at chemical accuracy, typically defined as errors below 1 kcal/mol. In this work, we present Skala, a modern deep learning-based XC functional that bypasses expensive hand-designed features by learning representations directly from data. Skala achieves chemical accuracy for atomization energies of small molecules while retaining the computational efficiency typical of semi-local DFT. This performance is enabled by training on an unprecedented volume of high-accuracy reference data generated using computationally intensive wavefunction-based methods. Notably, Skala systematically improves with additional training data covering diverse chemistry. By incorporating a modest amount of additional high-accuracy data tailored to chemistry beyond atomization energies, Skala achieves accuracy competitive with the best-performing hybrid functionals across general main group chemistry, at the cost of semi-local DFT. As the training dataset continues to expand, Skala is poised to further enhance the predictive power of first-principles simulations.

arXiv information

Authors	Giulia Luise,Chin-Wei Huang,Thijs Vogels,Derk P. Kooi,Sebastian Ehlert,Stephanie Lanius,Klaas J. H. Giesbertz,Amir Karton,Deniz Gunceler,Megan Stanley,Wessel P. Bruinsma,Lin Huang,Xinran Wei,José Garrido Torres,Abylay Katbashev,Bálint Máté,Sékou-Oumar Kaba,Roberto Sordillo,Yingrong Chen,David B. Williams-Young,Christopher M. Bishop,Jan Hermann,Rianne van den Berg,Paola Gori-Giorgi
Published	2025-06-18 08:39:15+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI, cs.CE, cs.LG, physics.chem-ph, physics.comp-ph

Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs

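Posted: June 19, 2025 | Author: jarxiv

Summary

We present Ring-lite, a Mixture-of-Experts (MoE)-based large language model optimized via reinforcement learning (RL) to achieve efficient and robust reasoning capabilities.
Built on the publicly available Ling-lite model, a 16.8 billion parameter model with 2.75 billion activated parameters, our approach matches the performance of state-of-the-art (SOTA) small-scale reasoning models on challenging benchmarks (e.g., AIME, LiveCodeBench, GPQA-Diamond).
To accomplish this, we introduce a joint training pipeline that integrates distillation with RL, revealing undocumented challenges in MoE RL training.
First, we identify optimization instability during RL training and propose Constrained Contextual Computation Policy Optimization (C3PO), a novel approach that enhances training stability and improves computational throughput via an algorithm-system co-design methodology.
Second, we empirically demonstrate that selecting distillation checkpoints for RL training based on entropy loss, rather than validation metrics, yields superior performance-efficiency trade-offs in subsequent RL training.
Finally, we develop a two-stage training paradigm to harmonize multi-domain data integration, addressing domain conflicts that arise when training on mixed datasets.
We will release the model, dataset, and code.

Summary (original)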

We present Ring-lite, a Mixture-of-Experts (MoE)-based large language model optimized via reinforcement learning (RL) to achieve efficient and robust reasoning capabilities. Built upon the publicly available Ling-lite model, a 16.8 billion parameter model with 2.75 billion activated parameters, our approach matches the performance of state-of-the-art (SOTA) small-scale reasoning models on challenging benchmarks (e.g., AIME, LiveCodeBench, GPQA-Diamond) while activating only one-third of the parameters required by comparable models. To accomplish this, we introduce a joint training pipeline integrating distillation with RL, revealing undocumented challenges in MoE RL training. First, we identify optimization instability during RL training, and we propose Constrained Contextual Computation Policy Optimization(C3PO), a novel approach that enhances training stability and improves computational throughput via algorithm-system co-design methodology. Second, we empirically demonstrate that selecting distillation checkpoints based on entropy loss for RL training, rather than validation metrics, yields superior performance-efficiency trade-offs in subsequent RL training. Finally, we develop a two-stage training paradigm to harmonize multi-domain data integration, addressing domain conflicts that arise in training with mixed dataset. We will release the model, dataset, and code.

arXiv information

Authors	Ling Team,Bin Hu,Cai Chen,Deng Zhao,Ding Liu,Dingnan Jin,Feng Zhu,Hao Dai,Hongzhi Luan,Jia Guo,Jiaming Liu,Jiewei Wu,Jun Mei,Jun Zhou,Junbo Zhao,Junwu Xiong,Kaihong Zhang,Kuan Xu,Lei Liang,Liang Jiang,Liangcheng Fu,Longfei Zheng,Qiang Gao,Qing Cui,Quan Wan,Shaomian Zheng,Shuaicheng Li,Tongkai Yang,Wang Ren,Xiaodong Yan,Xiaopei Wan,Xiaoyun Feng,Xin Zhao,Xinxing Yang,Xinyu Kong,Xuemin Yang,Yang Li,Yingting Wu,Yongkang Liu,Zhankai Xu,Zhenduo Zhang,Zhenglei Zhou,Zhenyu Huang,Zhiqiang Zhang,Zihao Wang,Zujie Wen
Published	2025-06-18 02:53:14+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL
