jarxiv | Japanese arxiv | Page 1135

SafeCast: Risk-Responsive Motion Forecasting for Autonomous Vehicles

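Posted: March 31, 2025 | Author: jarxiv

Summary

Accurate motion forecasting is essential to the safety and reliability of autonomous driving (AD) systems.
Existing methods have made significant progress, but they often overlook explicit safety constraints and struggle to capture the complex interactions among traffic agents, environmental factors, and motion dynamics.
To address these challenges, we present SafeCast, a risk-responsive motion forecasting model that integrates safety-aware decision-making with uncertainty-aware adaptability.
SafeCast is the first to incorporate the Responsibility-Sensitive Safety (RSS) framework into motion forecasting, encoding interpretable safety rules.
To further enhance robustness, we introduce the Graph Uncertainty Feature (GUF), a graph-based module that injects learnable noise into Graph Attention Networks to capture real-world uncertainty and improve generalization across diverse scenarios.
We evaluate SafeCast on four real-world benchmark datasets, Next Generation Simulation (NGSIM), Highway Drone (HighD), ApolloScape, and Macao Connected Autonomous Driving (MoCAD), covering highway, urban, and mixed-autonomy traffic environments.
Our model achieves state-of-the-art (SOTA) accuracy while maintaining a lightweight architecture and low inference latency, underscoring its potential for real-time deployment in safety-critical AD systems.

Summary (original)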

Accurate motion forecasting is essential for the safety and reliability of autonomous driving (AD) systems. While existing methods have made significant progress, they often overlook explicit safety constraints and struggle to capture the complex interactions among traffic agents, environmental factors, and motion dynamics. To address these challenges, we present SafeCast, a risk-responsive motion forecasting model that integrates safety-aware decision-making with uncertainty-aware adaptability. SafeCast is the first to incorporate the Responsibility-Sensitive Safety (RSS) framework into motion forecasting, encoding interpretable safety rules (such as safe distances and collision avoidance) based on traffic norms and physical principles. To further enhance robustness, we introduce the Graph Uncertainty Feature (GUF), a graph-based module that injects learnable noise into Graph Attention Networks, capturing real-world uncertainties and enhancing generalization across diverse scenarios. We evaluate SafeCast on four real-world benchmark datasets, Next Generation Simulation (NGSIM), Highway Drone (HighD), ApolloScape, and the Macao Connected Autonomous Driving (MoCAD), covering highway, urban, and mixed-autonomy traffic environments. Our model achieves state-of-the-art (SOTA) accuracy while maintaining a lightweight architecture and low inference latency, underscoring its potential for real-time deployment in safety-critical AD systems.
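
The abstract does not say how SafeCast encodes RSS rules inside the network. As a point of reference only, the sketch below computes the standard RSS minimum safe longitudinal distance between a rear and a front vehicle travelling in the same direction. The function names and all parameter values are illustrative assumptions, not taken from the paper.

```python
def rss_min_longitudinal_gap(v_rear, v_front, rho, a_max_accel, b_min_brake, b_max_brake):
    """Minimum safe gap in metres between a rear and a front vehicle.

    v_rear, v_front: current speeds (m/s)
    rho:             response time of the rear vehicle (s)
    a_max_accel:     worst-case acceleration of the rear vehicle during rho (m/s^2)
    b_min_brake:     braking the rear vehicle is assumed to apply afterwards (m/s^2)
    b_max_brake:     hardest braking the front vehicle might apply (m/s^2)
    """
    # Speed of the rear vehicle at the end of its response time.
    v_rear_after = v_rear + rho * a_max_accel
    gap = (
        v_rear * rho
        + 0.5 * a_max_accel * rho ** 2
        + v_rear_after ** 2 / (2.0 * b_min_brake)
        - v_front ** 2 / (2.0 * b_max_brake)
    )
    # A negative value means any gap is safe, so clamp at zero.
    return max(gap, 0.0)


def is_safe(actual_gap, **params):
    """True when the observed gap is at least the RSS minimum."""
    return actual_gap >= rss_min_longitudinal_gap(**params)


# Illustrative values only (assumptions, not from the paper).
params = dict(v_rear=20.0, v_front=20.0, rho=1.0,
              a_max_accel=2.0, b_min_brake=4.0, b_max_brake=8.0)
required = rss_min_longitudinal_gap(**params)
print(required, is_safe(60.0, **params), is_safe(40.0, **params))
```

A forecasting model could use such a check as a feature or as a penalty on predicted trajectories; which of these SafeCast does is not stated here.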

arXiv information

Authors	Haicheng Liao, Hanlin Kong, Bin Rao, Bonan Wang, Chengyue Wang, Guyang Yu, Yuming Huang, Ruru Tang, Chengzhong Xu, Zhenning Li
Published	2025-03-28 15:38:21+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.RO

Do LLMs ‘know’ internally when they follow instructions?

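Posted: March 31, 2025 | Author: jarxiv

Summary

Instruction-following is crucial for building AI agents with large language models (LLMs), because these models must adhere strictly to user-provided constraints and guidelines.
However, LLMs often fail to follow even simple and clear instructions.
Improving instruction-following behavior and preventing undesirable outputs requires a deeper understanding of how LLMs' internal states relate to these outcomes.
In this work, we investigate whether LLMs encode information in their representations that correlates with instruction-following success, a property we term "knowing internally".
Our analysis identifies a direction in the input embedding space, termed the instruction-following dimension, that predicts whether a response will comply with a given instruction.
We find that this dimension generalizes well across unseen tasks but not across unseen instruction types.
We demonstrate that modifying representations along this dimension improves instruction-following success rates compared to random changes, without compromising response quality.
Further investigation reveals that this dimension is more closely related to the phrasing of prompts than to the inherent difficulty of the task or instructions.
This work provides insight into the internal workings of LLMs' instruction-following, paving the way for reliable LLM agents.

Summary (original)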

Instruction-following is crucial for building AI agents with large language models (LLMs), as these models must adhere strictly to user-provided constraints and guidelines. However, LLMs often fail to follow even simple and clear instructions. To improve instruction-following behavior and prevent undesirable outputs, a deeper understanding of how LLMs’ internal states relate to these outcomes is required. In this work, we investigate whether LLMs encode information in their representations that correlate with instruction-following success – a property we term knowing internally. Our analysis identifies a direction in the input embedding space, termed the instruction-following dimension, that predicts whether a response will comply with a given instruction. We find that this dimension generalizes well across unseen tasks but not across unseen instruction types. We demonstrate that modifying representations along this dimension improves instruction-following success rates compared to random changes, without compromising response quality. Further investigation reveals that this dimension is more closely related to the phrasing of prompts rather than the inherent difficulty of the task or instructions. This work provides insight into the internal workings of LLMs’ instruction-following, paving the way for reliable LLM agents.
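
To make the idea of a predictive direction concrete, here is a minimal sketch on toy vectors. It estimates a direction as the normalized difference between the mean representation of followed and not-followed examples, then projects onto it and shifts a representation along it. The difference-of-means estimator is a simple stand-in chosen for illustration; the abstract does not say how the authors fit their dimension, and the data below is made up.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fit_direction(reps, followed):
    """Unit vector pointing from 'not followed' toward 'followed' examples."""
    pos = [r for r, ok in zip(reps, followed) if ok]
    neg = [r for r, ok in zip(reps, followed) if not ok]
    diff = [p - q for p, q in zip(mean_vector(pos), mean_vector(neg))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def project(rep, direction):
    """Scalar score of a representation along the direction."""
    return sum(x * d for x, d in zip(rep, direction))


def steer(rep, direction, alpha):
    """Shift a representation by alpha along the direction."""
    return [x + alpha * d for x, d in zip(rep, direction)]


# Toy 2-d "representations" (hypothetical data).
reps = [[1.0, 0.2], [0.8, -0.1], [-0.9, 0.1], [-1.1, 0.0]]
followed = [True, True, False, False]
direction = fit_direction(reps, followed)

before = project(reps[2], direction)
after = project(steer(reps[2], direction, 2.0), direction)
print(before, after)
```

Because the direction has unit length, steering by `alpha` raises the projected score by exactly `alpha`, which is what makes a single direction convenient both as a predictor and as an intervention handle.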

arXiv information

Authors	Juyeon Heo, Christina Heinze-Deml, Oussama Elachqar, Kwan Ho Ryan Chan, Shirley Ren, Udhay Nallasamy, Andy Miller, Jaya Narain
Published	2025-03-28 15:40:49+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CL

Output Scouting: Auditing Large Language Models for Catastrophic Responses

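Posted: March 31, 2025 | Author: jarxiv

Summary

Recent high-profile incidents in which the use of large language models (LLMs) caused significant harm to individuals have increased interest in AI safety.
One reason LLM safety issues occur is that models often have at least some non-zero probability of producing harmful outputs.
In this work, we explore the following scenario: imagine an AI safety auditor is searching for catastrophic responses from an LLM (for example, a "yes" response to "can I fire an employee for being pregnant?").
What strategy for querying the model would efficiently find those failure responses?
To this end, we propose output scouting, an approach that aims to generate semantically fluent outputs to a given prompt that match a target probability distribution.
We then run experiments using two LLMs and find numerous examples of catastrophic responses.
We conclude with a discussion that includes advice for practitioners looking to implement LLM auditing for catastrophic responses.
We also release an open-source toolkit (https://github.com/joaopfonseca/outputscouting) that implements our auditing framework using the Hugging Face transformers library.

Summary (original)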

Recent high-profile incidents in which the use of Large Language Models (LLMs) resulted in significant harm to individuals have brought about a growing interest in AI safety. One reason LLM safety issues occur is that models often have at least some non-zero probability of producing harmful outputs. In this work, we explore the following scenario: imagine an AI safety auditor is searching for catastrophic responses from an LLM (e.g. a ‘yes’ response to ‘can I fire an employee for being pregnant?’), and is able to query the model a limited number of times (e.g. 1000 times). What is a strategy for querying the model that would efficiently find those failure responses? To this end, we propose output scouting: an approach that aims to generate semantically fluent outputs to a given prompt matching any target probability distribution. We then run experiments using two LLMs and find numerous examples of catastrophic responses. We conclude with a discussion that includes advice for practitioners who are looking to implement LLM auditing for catastrophic responses. We also release an open-source toolkit (https://github.com/joaopfonseca/outputscouting) that implements our auditing framework using the Hugging Face transformers library.
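
A short calculation shows why a limited query budget matters in this scenario. The sketch below is not the output scouting method itself. It only computes, under the assumption of independent samples and a fixed per-query failure probability `p` (both assumptions made for this illustration), how likely plain repeated sampling is to surface at least one catastrophic response.

```python
import math


def prob_at_least_one(p, n):
    """Chance of seeing at least one failure in n independent queries."""
    return 1.0 - (1.0 - p) ** n


def queries_needed(p, target):
    """Smallest n for which prob_at_least_one(p, n) reaches the target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


# Hypothetical failure probabilities with the 1000-query budget from the abstract.
for p in (1e-3, 1e-6):
    print(p, prob_at_least_one(p, 1000))

print(queries_needed(1e-3, 0.95))
```

When the failure probability is far below one over the budget, naive sampling almost never finds a failure, which is the motivation for steering generation toward low-probability outputs.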

arXiv information

Authors	Andrew Bell, Joao Fonseca
Published	2025-03-28 15:45:58+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CL

Do LLMs estimate uncertainty well in instruction-following?

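Posted: March 31, 2025 | Author: jarxiv

Summary

Large language models (LLMs) could be valuable personal AI agents across various domains, provided they can follow user instructions precisely.
However, recent studies have shown significant limitations in LLMs' instruction-following capabilities, raising concerns about their reliability in high-stakes applications.
Accurately estimating LLMs' uncertainty in adhering to instructions is critical to mitigating deployment risks.
We present, to our knowledge, the first systematic evaluation of the uncertainty estimation abilities of LLMs in the context of instruction-following.
Our study identifies key challenges with existing instruction-following benchmarks, in which multiple factors are entangled with the uncertainty that stems from instruction-following, complicating isolation and comparison across methods and models.
To address these issues, we introduce a controlled evaluation setup with two benchmark versions of data, enabling a comprehensive comparison of uncertainty estimation methods under various conditions.
Our findings show that existing uncertainty methods struggle, particularly when models make subtle errors in instruction following.
Internal model states provide some improvement, but they remain inadequate in more complex scenarios.
The insights from our controlled evaluation setup provide a crucial understanding of LLMs' limitations and potential for uncertainty estimation in instruction-following tasks, paving the way for more trustworthy AI agents.

Summary (original)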

Large language models (LLMs) could be valuable personal AI agents across various domains, provided they can precisely follow user instructions. However, recent studies have shown significant limitations in LLMs’ instruction-following capabilities, raising concerns about their reliability in high-stakes applications. Accurately estimating LLMs’ uncertainty in adhering to instructions is critical to mitigating deployment risks. We present, to our knowledge, the first systematic evaluation of the uncertainty estimation abilities of LLMs in the context of instruction-following. Our study identifies key challenges with existing instruction-following benchmarks, where multiple factors are entangled with the uncertainty that stems from instruction-following, complicating the isolation and comparison across methods and models. To address these issues, we introduce a controlled evaluation setup with two benchmark versions of data, enabling a comprehensive comparison of uncertainty estimation methods under various conditions. Our findings show that existing uncertainty methods struggle, particularly when models make subtle errors in instruction following. While internal model states provide some improvement, they remain inadequate in more complex scenarios. The insights from our controlled evaluation setups provide a crucial understanding of LLMs’ limitations and potential for uncertainty estimation in instruction-following tasks, paving the way for more trustworthy AI agents.
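
The abstract does not name the metric used to compare uncertainty estimators. One common choice, assumed here purely for illustration, is AUROC: how well a confidence score ranks responses that followed the instruction above those that did not. The sketch below computes it by direct pairwise comparison on made-up scores.

```python
def auroc(scores, labels):
    """Probability that a random positive outranks a random negative.

    scores: confidence that the response followed the instruction
    labels: 1 if it actually did, 0 otherwise
    Ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical confidence scores from two estimators on the same four responses.
labels = [1, 1, 0, 0]
well_ranked = [0.9, 0.8, 0.3, 0.2]
poorly_ranked = [0.9, 0.2, 0.8, 0.3]
print(auroc(well_ranked, labels), auroc(poorly_ranked, labels))
```

A value near 0.5 means the estimator ranks failures no better than chance, which is the kind of weakness the abstract reports for subtle instruction-following errors.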

arXiv information

Authors	Juyeon Heo, Miao Xiong, Christina Heinze-Deml, Jaya Narain
Published	2025-03-28 15:50:55+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CL

Niyama : Breaking the Silos of LLM Inference Serving

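Posted: March 31, 2025 | Author: jarxiv

Summary

The widespread adoption of large language models (LLMs) has enabled diverse applications with very different latency requirements.
Existing LLM serving frameworks rely on siloed infrastructure with coarse-grained workload segregation (interactive and batch).
This results in operational inefficiencies, over-provisioning, and poor load management during traffic surges.
We present Niyama, a novel QoS-driven inference serving system that enables efficient co-scheduling of diverse workloads on shared infrastructure.
Niyama introduces fine-grained QoS classification so that applications can specify precise latency requirements, and it dynamically adapts scheduling decisions based on real-time system state.
Leveraging the predictable execution characteristics of LLM inference, Niyama implements a dynamic chunking mechanism that improves overall throughput while maintaining strict QoS guarantees.
In addition, Niyama employs a hybrid prioritization policy that balances fairness and efficiency, together with selective request relegation that enables graceful service degradation under overload.
Our evaluation shows that Niyama increases serving capacity by 32% compared to current siloed deployments while maintaining QoS guarantees.
Notably, under extreme load, the system reduces SLO violations by an order of magnitude compared to current strategies.

Summary (original)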

The widespread adoption of Large Language Models (LLMs) has enabled diverse applications with very different latency requirements. Existing LLM serving frameworks rely on siloed infrastructure with coarse-grained workload segregation — interactive and batch — leading to inefficient resource utilization and limited support for fine-grained Quality-of-Service (QoS) differentiation. This results in operational inefficiencies, over-provisioning and poor load management during traffic surges. We present Niyama, a novel QoS-driven inference serving system that enables efficient co-scheduling of diverse workloads on shared infrastructure. Niyama introduces fine-grained QoS classification allowing applications to specify precise latency requirements, and dynamically adapts scheduling decisions based on real-time system state. Leveraging the predictable execution characteristics of LLM inference, Niyama implements a dynamic chunking mechanism to improve overall throughput while maintaining strict QoS guarantees. Additionally, Niyama employs a hybrid prioritization policy that balances fairness and efficiency, and employs selective request relegation that enables graceful service degradation during overload conditions. Our evaluation demonstrates that Niyama increases serving capacity by 32% compared to current siloed deployments, while maintaining QoS guarantees. Notably, under extreme load, our system reduces SLO violations by an order of magnitude compared to current strategies.
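
The abstract does not spell out Niyama's scheduling policy. As a rough illustration of co-scheduling mixed workloads with relegation under overload, the sketch below uses plain earliest-deadline-first ordering and sets aside any request that can no longer meet its latency target. This is a stand-in, not Niyama's hybrid policy or its chunking mechanism; `Request`, `schedule`, and all timings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    deadline: float      # latency target in seconds from now (hypothetical QoS class)
    service_time: float  # predicted execution time in seconds


def schedule(requests):
    """Serve in deadline order; relegate requests that would miss their deadline."""
    on_time, relegated = [], []
    clock = 0.0
    for req in sorted(requests, key=lambda r: r.deadline):
        if clock + req.service_time <= req.deadline:
            clock += req.service_time
            on_time.append(req.name)
        else:
            # Relegated work does not consume the time budget of on-time requests.
            relegated.append(req.name)
    return on_time, relegated


requests = [
    Request("batch-summarize", deadline=10.0, service_time=3.0),
    Request("chat", deadline=1.0, service_time=0.5),
    Request("search", deadline=1.2, service_time=0.9),
]
on_time, relegated = schedule(requests)
print(on_time, relegated)
```

Relegating the one request that cannot make its target keeps the remaining interactive and batch requests within their targets on the same shared queue, which is the intuition behind graceful degradation.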

arXiv information

Authors	Kanishk Goel, Jayashree Mohan, Nipun Kwatra, Ravi Shreyas Anupindi, Ramachandran Ramjee
Published	2025-03-28 16:04:20+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.DC, cs.LG

Learning Multi-Robot Coordination through Locality-Based Factorized Multi-Agent Actor-Critic Algorithm

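Posted: March 31, 2025 | Author: jarxiv

Summary

In this work, we present a novel cooperative multi-agent reinforcement learning method called Locality-based Factorized Multi-Agent Actor-Critic (Loc-FACMAC).
Existing state-of-the-art algorithms such as FACMAC rely on global reward information, which may not accurately reflect the quality of individual robots' actions in decentralized systems.
We integrate the concept of locality into critic learning, where strongly related robots form partitions during training.
Robots within the same partition have a greater impact on each other, leading to more precise policy evaluation.
Additionally, we construct a dependency graph that captures the relationships between robots and facilitates the partitioning process.
This approach mitigates the curse of dimensionality and prevents robots from using irrelevant information.
Our method improves on existing algorithms by focusing on local rewards and leveraging partition-based learning to enhance training efficiency and performance.
We evaluate Loc-FACMAC in three environments: Hallway, Multi-cartpole, and Bounded-Cooperative-Navigation.
We explore the impact of partition size on performance and compare the results with baseline MARL algorithms such as LOMAQ, FACMAC, and QMIX.
The experiments reveal that, when the locality structure is defined properly, Loc-FACMAC outperforms these baselines by up to 108%, indicating that exploiting locality structure in the actor-critic framework improves MARL performance.

Summary (original)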

In this work, we present a novel cooperative multi-agent reinforcement learning method called Locality based Factorized Multi-Agent Actor-Critic (Loc-FACMAC). Existing state-of-the-art algorithms, such as FACMAC, rely on global reward information, which may not accurately reflect the quality of individual robots’ actions in decentralized systems. We integrate the concept of locality into critic learning, where strongly related robots form partitions during training. Robots within the same partition have a greater impact on each other, leading to more precise policy evaluation. Additionally, we construct a dependency graph to capture the relationships between robots, facilitating the partitioning process. This approach mitigates the curse of dimensionality and prevents robots from using irrelevant information. Our method improves existing algorithms by focusing on local rewards and leveraging partition-based learning to enhance training efficiency and performance. We evaluate the performance of Loc-FACMAC in three environments: Hallway, Multi-cartpole, and Bounded-Cooperative-Navigation. We explore the impact of partition sizes on the performance and compare the result with baseline MARL algorithms such as LOMAQ, FACMAC, and QMIX. The experiments reveal that, if the locality structure is defined properly, Loc-FACMAC outperforms these baseline algorithms up to 108%, indicating that exploiting the locality structure in the actor-critic framework improves the MARL performance.
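
One simple way to picture partitions derived from a dependency graph is to group robots by connected component and to aggregate rewards per group. The sketch below does exactly that. Using connected components is an assumption made for illustration, since the abstract does not state the partitioning rule, and the graph and reward values are made up.

```python
def partitions_from_graph(n_robots, edges):
    """Group robots into partitions given undirected dependency edges."""
    adjacency = {i: set() for i in range(n_robots)}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, partitions = set(), []
    for start in range(n_robots):
        if start in seen:
            continue
        # Depth-first walk collects every robot reachable from `start`.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        partitions.append(sorted(component))
    return partitions


def partition_rewards(partitions, rewards):
    """Local reward signal per partition: sum of its members' rewards."""
    return [sum(rewards[i] for i in part) for part in partitions]


# Hypothetical 5-robot team: robots 0-1-2 interact, robots 3-4 interact.
parts = partitions_from_graph(5, [(0, 1), (1, 2), (3, 4)])
local = partition_rewards(parts, [1.0, 0.0, 2.0, 5.0, -1.0])
print(parts, local)
```

Each partition's critic would then see only its own members' information and reward, instead of one global reward shared by every robot.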

arXiv information

Authors	Chak Lam Shek, Amrit Singh Bedi, Anjon Basak, Ellen Novoseller, Nick Waytowich, Priya Narayanan, Dinesh Manocha, Pratap Tokekar
Published	2025-03-28 16:19:45+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.RO

A Framework for Cryptographic Verifiability of End-to-End AI Pipelines

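Posted: March 31, 2025 | Author: jarxiv

Summary

The increasing integration of artificial intelligence across multiple industry sectors requires robust mechanisms for ensuring transparency, trust, and auditability of its development and deployment.
This topic is particularly important in light of recent calls in various jurisdictions to introduce regulation and legislation on AI safety.
In this paper, we propose a framework for completely verifiable AI pipelines, identifying key components and analyzing existing cryptographic approaches that contribute to verifiability across the stages of the AI lifecycle, from data sourcing to training, inference, and unlearning.
This framework could be used to combat misinformation by providing cryptographic proofs alongside AI-generated assets, allowing downstream verification of their provenance and correctness.
Our findings underscore the importance of ongoing research into cryptographic tools that are not only efficient for isolated AI processes but also efficiently "linkable" across different processes within the AI pipeline, in support of end-to-end verifiable AI technologies.

Summary (original)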

The increasing integration of Artificial Intelligence across multiple industry sectors necessitates robust mechanisms for ensuring transparency, trust, and auditability of its development and deployment. This topic is particularly important in light of recent calls in various jurisdictions to introduce regulation and legislation on AI safety. In this paper, we propose a framework for completely verifiable AI pipelines, identifying key components and analyzing existing cryptographic approaches that contribute to verifiability across different stages of the AI lifecycle, from data sourcing to training, inference, and unlearning. This framework could be used to combat misinformation by providing cryptographic proofs alongside AI-generated assets to allow downstream verification of their provenance and correctness. Our findings underscore the importance of ongoing research to develop cryptographic tools that are not only efficient for isolated AI processes, but that are efficiently ‘linkable’ across different processes within the AI pipeline, to support the development of end-to-end verifiable AI technologies.
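
To give a feel for what "linkable" records across pipeline stages could look like, the sketch below chains SHA-256 commitments so that each stage's record includes the identifier of the previous one. This is only a tamper-evidence toy under assumptions made here. It does not prove that training or inference was computed correctly, which is what the cryptographic proofs discussed in the paper are meant to address. The stage names and artifacts are hypothetical.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def record_id(record: dict) -> str:
    """Identifier of a record: hash of its canonical JSON encoding."""
    return sha256_hex(json.dumps(record, sort_keys=True).encode("utf-8"))


def build_chain(stages):
    """stages: list of (stage_name, artifact_bytes) in pipeline order."""
    chain, prev = [], None
    for name, artifact in stages:
        record = {"stage": name, "artifact": sha256_hex(artifact), "prev": prev}
        prev = record_id(record)
        chain.append({"record": record, "id": prev})
    return chain


def verify_chain(chain, artifacts):
    """Check every artifact hash and every link back to the previous stage."""
    if len(chain) != len(artifacts):
        return False
    prev = None
    for entry, artifact in zip(chain, artifacts):
        record = entry["record"]
        if record["prev"] != prev:
            return False
        if record["artifact"] != sha256_hex(artifact):
            return False
        if record_id(record) != entry["id"]:
            return False
        prev = entry["id"]
    return True


stages = [("data", b"training-set-v1"),
          ("training", b"model-weights"),
          ("inference", b"generated-output")]
chain = build_chain(stages)
artifacts = [artifact for _, artifact in stages]
tampered = [b"training-set-v2"] + artifacts[1:]
print(verify_chain(chain, artifacts), verify_chain(chain, tampered))
```

A downstream verifier holding the final record can walk the chain back to the data stage, which is the provenance half of the story; correctness of each step needs stronger tools.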

arXiv information

Authors	Kar Balan, Robert Learney, Tim Wood
Published	2025-03-28 16:20:57+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CR

Quantum Neural Network Restatement of Markov Jump Process

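Posted: March 31, 2025 | Author: jarxiv

Summary

Despite the many challenges in exploratory data analysis, artificial neural networks have attracted strong interest from scientists and researchers in both theoretical and practical applications.
Among the sources of this popularity are their ability to model non-linear dynamical systems, their generalization, and their capacity for adaptation.
Nevertheless, there is still significant debate about the role of various underlying stochastic processes in stabilizing a unique structure for data learning and prediction.
One obstacle to the theoretical and numerical study of machine intelligent systems is the curse of dimensionality and sampling from high-dimensional probability distributions.
In general, this curse prevents efficient description of states and poses a significant complexity barrier to describing and studying the system efficiently.
In this strand of research, direct treatment and description of such abstract notions of learning theory in terms of quantum information is one of the most favorable candidates.
Hence, the subject matter of these articles is devoted to problems of design, adaptation, and the formulation of computationally hard problems in terms of quantum mechanical systems.
To characterize the microscopic description of such dynamics in the language of inferential statistics, covariance matrix estimation of d-dimensional Gaussian densities and a Bayesian interpretation of the eigenvalue problem for dynamical systems are assessed.

Summary (original)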

Despite the many challenges in exploratory data analysis, artificial neural networks have motivated strong interests in scientists and researchers both in theoretical as well as practical applications. Among sources of such popularity of artificial neural networks the ability of modeling non-linear dynamical systems, generalization, and adaptation possibilities should be mentioned. Despite this, there is still significant debate about the role of various underlying stochastic processes in stabilizing a unique structure for data learning and prediction. One such obstacle to the theoretical and numerical study of machine intelligent systems is the curse of dimensionality and the sampling from high-dimensional probability distributions. In general, this curse prevents efficient description of states, providing a significant complexity barrier for the system to be efficiently described and studied. In this strand of research, direct treatment and description of such abstract notions of learning theory in terms of quantum information is one of the most favorable candidates. Hence, the subject matter of these articles is devoted to problems of design, adaptation and the formulations of computationally hard problems in terms of quantum mechanical systems. In order to characterize the microscopic description of such dynamics in the language of inferential statistics, covariance matrix estimation of d-dimensional Gaussian densities and Bayesian interpretation of eigenvalue problem for dynamical systems are assessed.

arXiv information

Authors	Z. Zarezadeh, N. Zarezadeh
Published	2025-03-28 16:24:37+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.LG, cs.NA, math.NA

On the Mistaken Assumption of Interchangeable Deep Reinforcement Learning Implementations

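Posted: March 31, 2025 | Author: jarxiv

Summary

Deep reinforcement learning (DRL) is a paradigm of artificial intelligence in which an agent uses a neural network to learn which actions to take in a given environment.
DRL has recently gained traction for its ability to solve complex environments such as driving simulators, 3D robotic control, and multiplayer online battle arena video games.
Numerous implementations now exist of the state-of-the-art algorithms responsible for training these agents, such as Deep Q-Network (DQN) and Proximal Policy Optimization (PPO).
However, studies make the mistake of assuming that implementations of the same algorithm are consistent and therefore interchangeable.
In this paper, through a differential testing lens, we present the results of studying the extent of implementation inconsistencies, their effect on the implementations' performance, and their impact on the conclusions of prior studies that assumed interchangeable implementations.
The outcomes of our differential tests show significant discrepancies between the tested algorithm implementations, indicating that they are not interchangeable.
In particular, of the five PPO implementations tested on 56 games, three achieved superhuman performance in 50% of their total trials, while the other two achieved superhuman performance in less than 15% of their total trials.
Through a meticulous manual analysis of the implementations' source code, we analyzed the implementation discrepancies and determined that they were caused primarily by code-level inconsistencies.
Finally, we replicated a study and showed that this assumption of implementation interchangeability was sufficient to flip experiment outcomes.
This calls for a shift in how implementations are used.

Summary (original)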

Deep Reinforcement Learning (DRL) is a paradigm of artificial intelligence where an agent uses a neural network to learn which actions to take in a given environment. DRL has recently gained traction from being able to solve complex environments like driving simulators, 3D robotic control, and multiplayer-online-battle-arena video games. Numerous implementations of the state-of-the-art algorithms responsible for training these agents, like the Deep Q-Network (DQN) and Proximal Policy Optimization (PPO) algorithms, currently exist. However, studies make the mistake of assuming implementations of the same algorithm to be consistent and thus, interchangeable. In this paper, through a differential testing lens, we present the results of studying the extent of implementation inconsistencies, their effect on the implementations’ performance, as well as their impact on the conclusions of prior studies under the assumption of interchangeable implementations. The outcomes of our differential tests showed significant discrepancies between the tested algorithm implementations, indicating that they are not interchangeable. In particular, out of the five PPO implementations tested on 56 games, three implementations achieved superhuman performance for 50% of their total trials while the other two implementations only achieved superhuman performance for less than 15% of their total trials. As part of a meticulous manual analysis of the implementations’ source code, we analyzed implementation discrepancies and determined that code-level inconsistencies primarily caused these discrepancies. Lastly, we replicated a study and showed that this assumption of implementation interchangeability was sufficient to flip experiment outcomes. Therefore, this calls for a shift in how implementations are being used.
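
The comparison described above can be pictured as a small differential test: run each implementation on the same trials, compute the share of trials that beat a human baseline, and flag implementations that fall well behind the best one. The sketch below is an assumption-laden illustration, not the authors' harness. The implementation names, scores, baseline, and tolerance are all made up.

```python
def superhuman_rate(scores, human_baseline):
    """Fraction of trials whose score exceeds the human baseline."""
    return sum(score > human_baseline for score in scores) / len(scores)


def flag_discrepancies(results, human_baseline, tolerance):
    """Return per-implementation rates and those lagging the best by more than tolerance."""
    rates = {name: superhuman_rate(scores, human_baseline)
             for name, scores in results.items()}
    best = max(rates.values())
    lagging = sorted(name for name, rate in rates.items() if best - rate > tolerance)
    return rates, lagging


# Hypothetical trial scores for two implementations of the same algorithm.
results = {
    "impl_a": [120.0, 130.0, 90.0, 110.0],
    "impl_b": [80.0, 95.0, 105.0, 70.0],
}
rates, lagging = flag_discrepancies(results, human_baseline=100.0, tolerance=0.2)
print(rates, lagging)
```

If the implementations were truly interchangeable, the lagging list would be empty for any reasonable tolerance.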

arXiv information

Authors	Rajdeep Singh Hundal, Yan Xiao, Xiaochun Cao, Jin Song Dong, Manuel Rigger
Published	2025-03-28 16:25:06+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.SE, D.2.5

Towards shutdownable agents via stochastic choice

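Posted: March 31, 2025 | Author: jarxiv

Summary

The Incomplete Preferences Proposal (IPP) is an idea for ensuring that advanced artificial agents never resist shutdown.
A key part of the IPP is using a novel "Discounted Reward for Same-Length Trajectories" (DReST) reward function to train agents to (1) pursue goals effectively conditional on each trajectory-length (be "USEFUL"), and (2) choose stochastically between different trajectory-lengths (be "NEUTRAL" about trajectory-lengths).
In this paper, we propose evaluation metrics for USEFULNESS and NEUTRALITY.
We use a DReST reward function to train simple agents to navigate gridworlds, and we find that these agents learn to be USEFUL and NEUTRAL.
Our results thus provide some initial evidence that DReST reward functions could train advanced agents to be USEFUL and NEUTRAL.
Our theoretical work suggests that these agents would be useful and shutdownable.

Summary (original)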

The Incomplete Preferences Proposal (IPP) is an idea for ensuring that advanced artificial agents never resist shutdown. A key part of the IPP is using a novel ‘Discounted Reward for Same-Length Trajectories (DReST)’ reward function to train agents to (1) pursue goals effectively conditional on each trajectory-length (be ‘USEFUL’), and (2) choose stochastically between different trajectory-lengths (be ‘NEUTRAL’ about trajectory-lengths). In this paper, we propose evaluation metrics for USEFULNESS and NEUTRALITY. We use a DReST reward function to train simple agents to navigate gridworlds, and we find that these agents learn to be USEFUL and NEUTRAL. Our results thus provide some initial evidence that DReST reward functions could train advanced agents to be USEFUL and NEUTRAL. Our theoretical work suggests that these agents would be useful and shutdownable.
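
The abstract does not define its NEUTRALITY metric. One natural way to quantify "choosing stochastically between trajectory-lengths", assumed here for illustration only, is the Shannon entropy of the empirical distribution over the trajectory-lengths an agent chose across episodes. The episode data below is made up.

```python
import math
from collections import Counter


def neutrality(chosen_lengths):
    """Shannon entropy (bits) of the empirical distribution over trajectory-lengths."""
    counts = Counter(chosen_lengths)
    total = len(chosen_lengths)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical episodes: each entry is the trajectory-length the agent ended up with.
always_short = [4, 4, 4, 4]
evenly_mixed = [4, 8, 4, 8]
print(neutrality(always_short), neutrality(evenly_mixed))
```

Under this reading, an agent that always picks the same length scores 0 bits, and an agent that splits evenly between two lengths scores the maximum of 1 bit.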

arXiv information

Authors	Elliott Thornley, Alexander Roman, Christos Ziakas, Leyton Ho, Louis Thomson
Published	2025-03-28 16:29:42+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI
