jarxiv | Japanese arXiv | Page 543

Benchmarking of CPU-intensive Stream Data Processing in The Edge Computing Systems

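Posted: May 13, 2025 | Author: jarxiv

Summary

Edge computing has emerged as a key technology, offering major advantages such as low latency, stronger data security, and reduced dependence on centralized cloud infrastructure.
These advantages matter greatly for applications that need real-time data processing or strict security measures.
Even so, edge devices running inside edge clusters are often underutilized.
The main cause is the lack of a holistic performance profiling mechanism that would help adjust the system configuration dynamically for a given workload.
Because edge computing environments involve a complex interplay among CPU frequency, power consumption, and application performance, a deeper understanding of these correlations is essential.
Uncovering these relationships makes it possible to take informed decisions that improve both computational efficiency and energy savings.
To address this gap, the paper uses a synthetic microbenchmark to evaluate the power consumption and performance characteristics of a single processing node in an edge cluster while varying workload size and CPU frequency.
The results show how an optimal measure that accounts for both performance and power consumption leads to optimized use of edge resources.

Summary (original)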

Edge computing has emerged as a pivotal technology, offering significant advantages such as low latency, enhanced data security, and reduced reliance on centralized cloud infrastructure. These benefits are crucial for applications requiring real-time data processing or strict security measures. Despite these advantages, edge devices operating within edge clusters are often underutilized. This inefficiency is mainly due to the absence of a holistic performance profiling mechanism which can help dynamically adjust the desired system configuration for a given workload. Since edge computing environments involve a complex interplay between CPU frequency, power consumption, and application performance, a deeper understanding of these correlations is essential. By uncovering these relationships, it becomes possible to make informed decisions that enhance both computational efficiency and energy savings. To address this gap, this paper evaluates the power consumption and performance characteristics of a single processing node within an edge cluster using a synthetic microbenchmark by varying the workload size and CPU frequency. The results show how an optimal measure can lead to optimized usage of edge resources, given both performance and power consumption.
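
The abstract does not say which "optimal measure" the authors use. As one possible illustration, the sketch below ranks CPU frequency settings by the energy-delay product (energy multiplied by runtime), a common way to weigh performance against power. The `Run` record, the function names, and all measurement values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One benchmark run at a fixed CPU frequency (all values hypothetical)."""
    freq_mhz: int       # CPU frequency setting
    runtime_s: float    # time to finish the workload, in seconds
    avg_power_w: float  # mean power draw during the run, in watts


def energy_j(run: Run) -> float:
    """Energy in joules: average power times runtime."""
    return run.avg_power_w * run.runtime_s


def energy_delay_product(run: Run) -> float:
    """Energy-delay product: penalizes both slow runs and power-hungry runs."""
    return energy_j(run) * run.runtime_s


def best_config(runs):
    """Return the run with the lowest energy-delay product."""
    return min(runs, key=energy_delay_product)


# Made-up measurements, for illustration only.
runs = [
    Run(freq_mhz=600, runtime_s=10.0, avg_power_w=2.0),   # EDP = 200.0
    Run(freq_mhz=1200, runtime_s=5.5, avg_power_w=3.0),   # EDP = 90.75
    Run(freq_mhz=1800, runtime_s=4.0, avg_power_w=6.0),   # EDP = 96.0
]

print(best_config(runs).freq_mhz)  # prints 1200
```

In this made-up data the middle frequency wins: the highest setting is fastest but draws disproportionately more power, while the lowest setting saves power but runs too long.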

arXiv information

Authors	Tomasz Szydlo, Viacheslaw Horbanow, Dev Nandan Jha, Shashikant Ilager, Aleksander Slominski, Rajiv Ranjan
Published	2025-05-12 17:02:02+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.DC

Emotion-Gradient Metacognitive RSI (Part I): Theoretical Foundations and Single-Agent Architecture

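Posted: May 13, 2025 | Author: jarxiv

Summary

We present the Emotion-Gradient Metacognitive Recursive Self-Improvement (EG-MRSI) framework, a new architecture that integrates introspective metacognition, emotion-based intrinsic motivation, and recursive self-modification into a unified theoretical system.
The framework can overwrite its own learning algorithm under formally bounded risk.
Building on the Noise-to-Meaning RSI (N2M-RSI) foundation, EG-MRSI introduces a differentiable intrinsic reward function driven by confidence, error, novelty, and cumulative success.
This signal regulates both a metacognitive mapping and a self-modification operator that is constrained by provable safety mechanisms.
We formally define the initial agent configuration, the emotion-gradient dynamics, and the RSI trigger conditions, and derive a reinforcement-compatible optimization objective that guides the agent's developmental trajectory.
Meaning Density and Meaning Conversion Efficiency are introduced as quantifiable metrics of semantic learning, bridging the gap between internal structure and predictive informativeness.
This Part I paper establishes the single-agent theoretical foundations of EG-MRSI.
Later parts will extend the framework with safety certificates and rollback protocols (Part II), collective intelligence mechanisms (Part III), and feasibility constraints including thermodynamic and computational limits (Part IV).
Together, the EG-MRSI series provides a rigorous, extensible foundation for open-ended and safe AGI.

Summary (original)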

We present the Emotion-Gradient Metacognitive Recursive Self-Improvement (EG-MRSI) framework, a novel architecture that integrates introspective metacognition, emotion-based intrinsic motivation, and recursive self-modification into a unified theoretical system. The framework is explicitly capable of overwriting its own learning algorithm under formally bounded risk. Building upon the Noise-to-Meaning RSI (N2M-RSI) foundation, EG-MRSI introduces a differentiable intrinsic reward function driven by confidence, error, novelty, and cumulative success. This signal regulates both a metacognitive mapping and a self-modification operator constrained by provable safety mechanisms. We formally define the initial agent configuration, emotion-gradient dynamics, and RSI trigger conditions, and derive a reinforcement-compatible optimization objective that guides the agent’s development trajectory. Meaning Density and Meaning Conversion Efficiency are introduced as quantifiable metrics of semantic learning, closing the gap between internal structure and predictive informativeness. This Part I paper establishes the single-agent theoretical foundations of EG-MRSI. Future parts will extend this framework to include safety certificates and rollback protocols (Part II), collective intelligence mechanisms (Part III), and feasibility constraints including thermodynamic and computational limits (Part IV). Together, the EG-MRSI series provides a rigorous, extensible foundation for open-ended and safe AGI.

arXiv information

Authors	Rintaro Ando
Published	2025-05-12 17:02:47+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.LG, I.2.0

‘I Apologize For Not Understanding Your Policy’: Exploring the Specification and Evaluation of User-Managed Access Control Policies by AI Virtual Assistants

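Posted: May 13, 2025 | Author: jarxiv

Summary

The rapid evolution of Artificial Intelligence (AI)-based Virtual Assistants (VAs) such as Google Gemini, ChatGPT, Microsoft Copilot, and High-Flyer Deepseek has turned them into convenient interfaces for managing emerging technologies such as Smart Homes, Smart Cars, and Electronic Health Records through explicit commands (e.g., prompts) that can even be issued by voice, which makes them a very convenient interface for end-users.
However, properly specifying and evaluating User-Managed Access Control Policies (U-MAPs), the rules that end-users issue and manage to govern access to sensitive data and device functionality, within these VAs poses significant challenges, and the process is crucial for preventing security vulnerabilities and privacy leaks without degrading the user experience.
This study provides an initial exploratory investigation of whether current publicly available VAs can manage U-MAPs effectively across different scenarios.
By running tests that range from unstructured to structured, we evaluated how well these VAs comprehend such policies and found a lack of understanding across varying U-MAP approaches.
Our research not only identifies key limitations but also offers valuable insights into how VAs can be further improved to manage complex authorization rules and adapt to dynamic changes.

Summary (original)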

The rapid evolution of Artificial Intelligence (AI)-based Virtual Assistants (VAs), e.g., Google Gemini, ChatGPT, Microsoft Copilot, and High-Flyer Deepseek, has turned them into convenient interfaces for managing emerging technologies such as Smart Homes, Smart Cars, and Electronic Health Records, by means of explicit commands, e.g., prompts, which can even be launched via voice, thus providing a very convenient interface for end-users. However, the proper specification and evaluation of User-Managed Access Control Policies (U-MAPs), the rules issued and managed by end-users to govern access to sensitive data and device functionality, within these VAs presents significant challenges, since such a process is crucial for preventing security vulnerabilities and privacy leaks without impacting user experience. This study provides an initial exploratory investigation on whether current publicly-available VAs can manage U-MAPs effectively across differing scenarios. By conducting unstructured to structured tests, we evaluated the comprehension of such VAs, revealing a lack of understanding in varying U-MAP approaches. Our research not only identifies key limitations, but offers valuable insights into how VAs can be further improved to manage complex authorization rules and adapt to dynamic changes.

arXiv information

Authors	Jennifer Mondragon, Carlos Rubio-Medrano, Gael Cruz, Dvijesh Shastri
Published	2025-05-12 17:03:52+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI

Enhancing Code Generation via Bidirectional Comment-Level Mutual Grounding

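Posted: May 13, 2025 | Author: jarxiv

Summary

Large Language Models (LLMs) have demonstrated unprecedented capability in code generation.
However, LLM-generated code still suffers from a wide range of functional errors, especially on complex programming tasks that the LLM has not seen before.
Recent studies show that developers often struggle to inspect and fix incorrect code generated by LLMs, which lowers both their productivity and their trust in LLM-based code generation.
Inspired by the theory of mutual grounding in communication, we propose an interactive approach that uses code comments as the medium through which developers and LLMs establish a shared understanding.
Our approach promotes iterative grounding by interleaving code generation, inline comment generation, and contextualized user feedback through editable comments, so that the generated code aligns with developer intent.
We evaluated the approach on two popular benchmarks and showed that it significantly improves multiple state-of-the-art LLMs, e.g., a 17.1% pass@1 improvement for code-davinci-002 on HumanEval.
We also ran a user study with 12 participants against two baselines: (1) interacting with GitHub Copilot, and (2) interacting with a multi-step code generation paradigm called Multi-Turn Program Synthesis.
With our approach, participants completed the given programming tasks 16.7% faster and with a 10.5% higher task success rate.
Both results indicate that interactively refining code comments enables the collaborative establishment of mutual grounding, leading to more accurate code generation and higher developer confidence.

Summary (original)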

Large Language Models (LLMs) have demonstrated unprecedented capability in code generation. However, LLM-generated code is still plagued with a wide range of functional errors, especially for complex programming tasks that LLMs have not seen before. Recent studies have shown that developers often struggle with inspecting and fixing incorrect code generated by LLMs, diminishing their productivity and trust in LLM-based code generation. Inspired by the mutual grounding theory in communication, we propose an interactive approach that leverages code comments as a medium for developers and LLMs to establish a shared understanding. Our approach facilitates iterative grounding by interleaving code generation, inline comment generation, and contextualized user feedback through editable comments to align generated code with developer intent. We evaluated our approach on two popular benchmarks and demonstrated that our approach significantly improved multiple state-of-the-art LLMs, e.g., 17.1% pass@1 improvement for code-davinci-002 on HumanEval. Furthermore, we conducted a user study with 12 participants in comparison to two baselines: (1) interacting with GitHub Copilot, and (2) interacting with a multi-step code generation paradigm called Multi-Turn Program Synthesis. Participants completed the given programming tasks 16.7% faster and with 10.5% improvement in task success rate when using our approach. Both results show that interactively refining code comments enables the collaborative establishment of mutual grounding, leading to more accurate code generation and higher developer confidence.
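
The abstract reports a pass@1 improvement on HumanEval but does not describe how the metric was computed. For readers unfamiliar with it, the sketch below implements the unbiased pass@k estimator commonly used with HumanEval: given n generated samples per problem, of which c pass the unit tests, it estimates the probability that at least one of k randomly drawn samples is correct. Whether the authors used exactly this estimator is an assumption.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k for one problem.

    n: total samples generated for the problem
    c: number of those samples that pass all tests
    k: number of samples the metric is allowed to draw
    """
    if n - c < k:
        # Fewer than k failing samples: any draw of k contains a correct one.
        return 1.0
    # 1 minus the probability that all k drawn samples are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


# With k = 1 the estimator reduces to the plain success fraction c / n.
print(round(pass_at_k(n=10, c=3, k=1), 3))  # prints 0.3
```

A benchmark-level score is the mean of this value over all problems.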

arXiv information

Authors	Yifeng Di, Tianyi Zhang
Published	2025-05-12 17:20:30+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, cs.SE

Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving

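Posted: May 13, 2025 | Author: jarxiv

Summary

Large Language Models (LLMs) often struggle with mathematical reasoning tasks that require precise, verifiable computation.
Reinforcement Learning (RL) from outcome-based rewards strengthens text-based reasoning, but understanding how agents learn on their own to use external tools such as code execution remains crucial.
We study RL from outcome-based rewards for Tool-Integrated Reasoning (ZeroTIR), training base LLMs to spontaneously generate and execute Python code for mathematical problems without any supervised tool-use examples.
Our central contribution is to show that key metrics scale predictably as RL training progresses.
Specifically, we observe strong positive correlations: more training steps lead to higher spontaneous code execution frequency, longer average responses, and higher final task accuracy.
This suggests a quantifiable relationship between the computational effort invested in training and the emergence of effective, tool-augmented reasoning strategies.
We implement a robust framework with a decoupled code execution environment and validate our findings across standard RL algorithms and frameworks.
Experiments show that ZeroTIR significantly outperforms non-tool ZeroRL baselines on challenging math benchmarks.
Our findings offer a foundational understanding of how autonomous tool use is acquired and how it scales within Agent RL, and they provide a reproducible benchmark for future studies.
Code is released at https://github.com/Anonymize-Author/AgentRL.

Summary (original)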

Large Language Models (LLMs) often struggle with mathematical reasoning tasks requiring precise, verifiable computation. While Reinforcement Learning (RL) from outcome-based rewards enhances text-based reasoning, understanding how agents autonomously learn to leverage external tools like code execution remains crucial. We investigate RL from outcome-based rewards for Tool-Integrated Reasoning, ZeroTIR, training base LLMs to spontaneously generate and execute Python code for mathematical problems without supervised tool-use examples. Our central contribution is we demonstrate that as RL training progresses, key metrics scale predictably. Specifically, we observe strong positive correlations where increased training steps lead to increases in the spontaneous code execution frequency, the average response length, and, critically, the final task accuracy. This suggests a quantifiable relationship between computational effort invested in training and the emergence of effective, tool-augmented reasoning strategies. We implement a robust framework featuring a decoupled code execution environment and validate our findings across standard RL algorithms and frameworks. Experiments show ZeroTIR significantly surpasses non-tool ZeroRL baselines on challenging math benchmarks. Our findings provide a foundational understanding of how autonomous tool use is acquired and scales within Agent RL, offering a reproducible benchmark for future studies. Code is released at https://github.com/Anonymize-Author/AgentRL.

arXiv information

Authors	Xinji Mai, Haotian Xu, Xing W, Weinong Wang, Yingying Zhang, Wenqiang Zhang
Published	2025-05-12 17:23:34+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI

Must Read: A Systematic Survey of Computational Persuasion

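Posted: May 13, 2025 | Author: jarxiv

Summary

Persuasion is a fundamental aspect of communication that influences decision-making across diverse contexts, from everyday conversation to high-stakes settings such as politics, marketing, and law.
The rise of conversational AI systems has greatly widened the scope of persuasion, bringing both opportunities and risks.
AI-driven persuasion can be put to beneficial use, but it also poses threats through manipulation and unethical influence.
Moreover, AI systems are not only persuaders but are themselves susceptible to persuasion, which leaves them vulnerable to adversarial attacks and bias reinforcement.
Despite rapid progress in AI-generated persuasive content, our understanding of what makes persuasion effective remains limited because persuasion is inherently subjective and context-dependent.
This survey gives a comprehensive overview of computational persuasion organized around three key perspectives: (1) AI as a Persuader, which explores AI-generated persuasive content and its applications;
(2) AI as a Persuadee, which examines AI's susceptibility to influence and manipulation; and
(3) AI as a Persuasion Judge, which analyzes AI's role in evaluating persuasive strategies, detecting manipulation, and ensuring ethical persuasion.
We introduce a taxonomy for computational persuasion research and discuss key challenges, including evaluating persuasiveness, mitigating manipulative persuasion, and developing responsible AI-driven persuasive systems.
The survey outlines future research directions for improving the safety, fairness, and effectiveness of AI-powered persuasion while addressing the risks posed by increasingly capable language models.

Summary (original)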

Persuasion is a fundamental aspect of communication, influencing decision-making across diverse contexts, from everyday conversations to high-stakes scenarios such as politics, marketing, and law. The rise of conversational AI systems has significantly expanded the scope of persuasion, introducing both opportunities and risks. AI-driven persuasion can be leveraged for beneficial applications, but also poses threats through manipulation and unethical influence. Moreover, AI systems are not only persuaders, but also susceptible to persuasion, making them vulnerable to adversarial attacks and bias reinforcement. Despite rapid advancements in AI-generated persuasive content, our understanding of what makes persuasion effective remains limited due to its inherently subjective and context-dependent nature. In this survey, we provide a comprehensive overview of computational persuasion, structured around three key perspectives: (1) AI as a Persuader, which explores AI-generated persuasive content and its applications; (2) AI as a Persuadee, which examines AI’s susceptibility to influence and manipulation; and (3) AI as a Persuasion Judge, which analyzes AI’s role in evaluating persuasive strategies, detecting manipulation, and ensuring ethical persuasion. We introduce a taxonomy for computational persuasion research and discuss key challenges, including evaluating persuasiveness, mitigating manipulative persuasion, and developing responsible AI-driven persuasive systems. Our survey outlines future research directions to enhance the safety, fairness, and effectiveness of AI-powered persuasion while addressing the risks posed by increasingly capable language models.

arXiv information

Authors	Nimet Beyza Bozdag, Shuhaib Mehri, Xiaocheng Yang, Hyeonjeong Ha, Zirui Cheng, Esin Durmus, Jiaxuan You, Heng Ji, Gokhan Tur, Dilek Hakkani-Tür
Published	2025-05-12 17:26:31+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, cs.CY

A Comparative Study on Dynamic Graph Embedding based on Mamba and Transformers

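Posted: May 13, 2025 | Author: jarxiv

Summary

Dynamic graph embedding has emerged as an important technique for modeling complex time-evolving networks across diverse domains.
Transformer-based models have shown promise in capturing long-range dependencies in temporal graph data, but their quadratic computational complexity creates scalability challenges.
This study presents a comparative analysis of dynamic graph embedding approaches built on transformers and on the recently proposed Mamba architecture, a state-space model with linear complexity.
We introduce three new models: TransformerG2G augmented with graph convolutional networks, \mathcal{DG}-Mamba, and \mathcal{GDG}-Mamba with graph isomorphism network edge convolutions.
Experiments on multiple benchmark datasets show that Mamba-based models match or exceed transformer-based approaches on link prediction tasks while offering substantial computational efficiency gains on longer sequences.
In particular, \mathcal{DG}-Mamba variants consistently outperform transformer-based models on datasets with high temporal variability, such as UCI, Bitcoin, and Reality Mining, while remaining competitive on more stable graphs such as SBM.
By analyzing attention weights and state matrices, we provide insight into the learned temporal dependencies and show the models' ability to capture complex temporal patterns.
By effectively combining state-space models with graph neural networks, our work addresses key limitations of earlier approaches and adds to the growing body of research on efficient temporal graph representation learning.
These findings point to promising directions for scaling dynamic graph embedding to larger, more complex real-world networks, potentially enabling new applications in areas such as social network analysis, financial modeling, and biological system dynamics.

Summary (original)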

Dynamic graph embedding has emerged as an important technique for modeling complex time-evolving networks across diverse domains. While transformer-based models have shown promise in capturing long-range dependencies in temporal graph data, they face scalability challenges due to quadratic computational complexity. This study presents a comparative analysis of dynamic graph embedding approaches using transformers and the recently proposed Mamba architecture, a state-space model with linear complexity. We introduce three novel models: TransformerG2G augment with graph convolutional networks, \mathcal{DG}-Mamba, and \mathcal{GDG}-Mamba with graph isomorphism network edge convolutions. Our experiments on multiple benchmark datasets demonstrate that Mamba-based models achieve comparable or superior performance to transformer-based approaches in link prediction tasks while offering significant computational efficiency gains on longer sequences. Notably, \mathcal{DG}-Mamba variants consistently outperform transformer-based models on datasets with high temporal variability, such as UCI, Bitcoin, and Reality Mining, while maintaining competitive performance on more stable graphs like SBM. We provide insights into the learned temporal dependencies through analysis of attention weights and state matrices, revealing the models’ ability to capture complex temporal patterns. By effectively combining state-space models with graph neural networks, our work addresses key limitations of previous approaches and contributes to the growing body of research on efficient temporal graph representation learning. These findings offer promising directions for scaling dynamic graph embedding to larger, more complex real-world networks, potentially enabling new applications in areas such as social network analysis, financial modeling, and biological system dynamics.
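
To make the "linear complexity" claim concrete, the sketch below runs a scalar linear state-space recurrence, h_t = a * h_(t-1) + b * x_t with output y_t = c * h_t, over a sequence in a single pass. It is a deliberately simplified stand-in: Mamba makes its parameters input-dependent and works on vectors, and none of the names or parameter values here come from the paper.

```python
def ssm_scan(xs, a=0.5, b=1.0, c=1.0):
    """Run a fixed-parameter scalar state-space model over a sequence.

    Each step touches only the previous hidden state, so the cost grows
    linearly with sequence length, unlike pairwise attention, whose cost
    grows quadratically.
    """
    h = 0.0   # hidden state, carried from step to step
    ys = []
    for x in xs:
        h = a * h + b * x   # state update
        ys.append(c * h)    # readout
    return ys


# An impulse at t = 0 decays geometrically through the state.
print(ssm_scan([1.0, 0.0, 0.0]))  # prints [1.0, 0.5, 0.25]
```

The fixed-size state `h` is what lets the model summarize an arbitrarily long history of graph snapshots without storing all of them.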

arXiv information

Authors	Ashish Parmanand Pandey, Alan John Varghese, Sarang Patil, Mengjia Xu
Published	2025-05-12 17:41:35+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.LG

Overflow Prevention Enhances Long-Context Recurrent LLMs

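Posted: May 13, 2025 | Author: jarxiv

Summary

A recent trend in LLMs is the development of recurrent sub-quadratic models that improve long-context processing efficiency.
We investigate leading large long-context models, focusing on how their fixed-size recurrent memory affects performance.
Our experiments show that these models underutilize long contexts even when they have been trained for extended contexts.
Specifically, we demonstrate that a chunk-based inference procedure, which identifies and processes only the most relevant portion of the input, mitigates recurrent memory failures and is effective for many long-context tasks.
On LongBench, our method improves the overall performance of Falcon3-Mamba-Inst-7B by 14%, Falcon-Mamba-Inst-7B by 28%, RecurrentGemma-IT-9B by 50%, and RWKV6-Finch-7B by 51%.
Surprisingly, this simple approach also yields state-of-the-art results on the challenging LongBench v2 benchmark, with performance competitive with Transformers of equivalent size.
Furthermore, our findings raise the question of whether recurrent models genuinely exploit long-range dependencies, since the single-chunk strategy delivers stronger performance even on tasks that presumably require cross-context relations.

Summary (original)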

A recent trend in LLMs is developing recurrent sub-quadratic models that improve long-context processing efficiency. We investigate leading large long-context models, focusing on how their fixed-size recurrent memory affects their performance. Our experiments reveal that, even when these models are trained for extended contexts, their use of long contexts remains underutilized. Specifically, we demonstrate that a chunk-based inference procedure, which identifies and processes only the most relevant portion of the input can mitigate recurrent memory failures and be effective for many long-context tasks: On LongBench, our method improves the overall performance of Falcon3-Mamba-Inst-7B by 14%, Falcon-Mamba-Inst-7B by 28%, RecurrentGemma-IT-9B by 50%, and RWKV6-Finch-7B by 51%. Surprisingly, this simple approach also leads to state-of-the-art results in the challenging LongBench v2 benchmark, showing competitive performance with equivalent size Transformers. Furthermore, our findings raise questions about whether recurrent models genuinely exploit long-range dependencies, as our single-chunk strategy delivers stronger performance – even in tasks that presumably require cross-context relations.
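
The chunk-based idea can be sketched as follows: split the long input into fixed-size chunks, score each chunk for relevance to the query, and pass only the best chunk to the model. The abstract does not state how relevance is determined, so the word-overlap score below is a placeholder assumption, and all function names are hypothetical.

```python
def split_chunks(tokens, size):
    """Split a token list into consecutive chunks of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def overlap_score(chunk, query_tokens):
    """Placeholder relevance score: number of chunk tokens found in the query."""
    query_set = set(query_tokens)
    return sum(1 for token in chunk if token in query_set)


def select_chunk(context, query, size):
    """Return the single chunk of `context` most relevant to `query`."""
    chunks = split_chunks(context.split(), size)
    query_tokens = query.split()
    best = max(chunks, key=lambda ch: overlap_score(ch, query_tokens))
    return " ".join(best)


context = (
    "the cat sat on the mat "
    "paris is the capital of france "
    "dogs bark loudly at night always"
)
query = "capital of france"

print(select_chunk(context, query, size=6))  # prints paris is the capital of france
```

Only the selected chunk would then be fed to the recurrent model, which keeps the input within what its fixed-size memory can hold.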

arXiv information

Authors	Assaf Ben-Kish, Itamar Zimerman, M. Jehanzeb Mirza, James Glass, Leonid Karlinsky, Raja Giryes
Published	2025-05-12 17:45:05+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.LG

Learning Dynamics in Continual Pre-Training for Large Language Models

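Posted: May 13, 2025 | Author: jarxiv

Summary

Continual Pre-Training (CPT) has become a popular and effective way to apply strong foundation models to specific downstream tasks.
In this work, we explore the learning dynamics throughout the CPT process for large language models.
We focus in particular on how general and downstream domain performance evolves at each training step, with domain performance measured through validation losses.
We observe that the CPT loss curve fundamentally characterizes a transition from one curve to another, hidden curve, and that it can be described by decoupling the effects of distribution shift and learning rate annealing.
We derive a CPT scaling law that combines the two factors, making it possible to predict the loss at any (continual) training step and across learning rate schedules (LRS) in CPT.
Our formulation gives a comprehensive account of several critical factors in CPT, including loss potential, peak learning rate, training steps, and replay ratio.
In addition, the approach can be adapted to customize training hyper-parameters for different CPT goals, such as balancing general and domain-specific performance.
Extensive experiments show that the scaling law holds across a variety of CPT datasets and training hyper-parameters.

Summary (original)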

Continual Pre-Training (CPT) has become a popular and effective method to apply strong foundation models to specific downstream tasks. In this work, we explore the learning dynamics throughout the CPT process for large language models. We specifically focus on how general and downstream domain performance evolves at each training step, with domain performance measured via validation losses. We have observed that the CPT loss curve fundamentally characterizes the transition from one curve to another hidden curve, and could be described by decoupling the effects of distribution shift and learning rate annealing. We derive a CPT scaling law that combines the two factors, enabling the prediction of loss at any (continual) training steps and across learning rate schedules (LRS) in CPT. Our formulation presents a comprehensive understanding of several critical factors in CPT, including loss potential, peak learning rate, training steps, replay ratio, etc. Moreover, our approach can be adapted to customize training hyper-parameters to different CPT goals such as balancing general and domain-specific performance. Extensive experiments demonstrate that our scaling law holds across various CPT datasets and training hyper-parameters.

arXiv information

Authors	Xingjin Wang, Howe Tissue, Lu Wang, Linjing Li, Daniel Dajun Zeng
Published	2025-05-12 17:47:32+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, cs.LG

Improving Trajectory Stitching with Flow Models

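Posted: May 13, 2025 | Author: jarxiv

Summary

Generative models have shown great promise as trajectory planners, given their aptitude for modeling complex distributions and their guidable inference process.
Previous work has applied them successfully to robotic manipulation, but performance drops when the required solution does not exist as a complete trajectory in the training set.
We identify this as a consequence of being unable to plan via stitching, and then address the architectural and dataset choices needed to remedy it.
On top of this, we propose a novel addition to the training and inference procedures that both stabilizes and enhances these capabilities.
We demonstrate the efficacy of the approach by generating plans with out-of-distribution boundary conditions and by performing obstacle avoidance on the Franka Panda in simulation and on real hardware.
In both tasks, our method performs significantly better than the baselines and can avoid obstacles up to four times as large.

Summary (original)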

Generative models have shown great promise as trajectory planners, given their affinity to modeling complex distributions and guidable inference process. Previous works have successfully applied these in the context of robotic manipulation but perform poorly when the required solution does not exist as a complete trajectory within the training set. We identify that this is a result of being unable to plan via stitching, and subsequently address the architectural and dataset choices needed to remedy this. On top of this, we propose a novel addition to the training and inference procedures to both stabilize and enhance these capabilities. We demonstrate the efficacy of our approach by generating plans with out of distribution boundary conditions and performing obstacle avoidance on the Franka Panda in simulation and on real hardware. In both of these tasks our method performs significantly better than the baselines and is able to avoid obstacles up to four times as large.

arXiv information

Authors	Reece O’Mahoney, Wanming Yu, Ioannis Havoutis
Published	2025-05-12 17:50:10+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.LG, cs.RO
