jarxiv | Japanese arxiv | Page 203

Engagement-Driven Content Generation with Large Language Models

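Posted: June 5, 2025 | Author: jarxiv

Summary

Large language models (LLMs) show significant persuasive capabilities in one-on-one interactions, but their influence within social networks, where interconnected users and complex opinion dynamics pose unique challenges, remains underexplored.
This paper addresses the research question: can LLMs generate meaningful content that maximizes user engagement on social networks? To answer it, we propose a pipeline that uses reinforcement learning with simulated feedback, in which the network's response to LLM-generated content (the reward) is simulated through a formal engagement model.
This approach bypasses the time cost and complexity of live experiments and enables an efficient feedback loop between the LLM and the network under study.
It also allows control over endogenous factors such as the LLM's position within the social network and the distribution of opinions on a given topic.
Our approach adapts to the opinion distribution of the underlying network and is agnostic to the specifics of the engagement model, which is embedded as a plug-and-play component.
This flexibility makes it suitable for more complex engagement tasks and interventions in computational social science.
Using our framework, we analyze the performance of LLMs in generating social engagement under different conditions, showcasing their full potential in this task.
The experimental code is publicly available at https://github.com/mminici/Engagement-Driven-Content-Generation.

Abstract (original)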

Large Language Models (LLMs) demonstrate significant persuasive capabilities in one-on-one interactions, but their influence within social networks, where interconnected users and complex opinion dynamics pose unique challenges, remains underexplored. This paper addresses the research question: \emph{Can LLMs generate meaningful content that maximizes user engagement on social networks?} To answer this, we propose a pipeline using reinforcement learning with simulated feedback, where the network’s response to LLM-generated content (i.e., the reward) is simulated through a formal engagement model. This approach bypasses the temporal cost and complexity of live experiments, enabling an efficient feedback loop between the LLM and the network under study. It also allows to control over endogenous factors such as the LLM’s position within the social network and the distribution of opinions on a given topic. Our approach is adaptive to the opinion distribution of the underlying network and agnostic to the specifics of the engagement model, which is embedded as a plug-and-play component. Such flexibility makes it suitable for more complex engagement tasks and interventions in computational social science. Using our framework, we analyze the performance of LLMs in generating social engagement under different conditions, showcasing their full potential in this task. The experimental code is publicly available at https://github.com/mminici/Engagement-Driven-Content-Generation.
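The simulated reward described in the abstract can be pictured with a small sketch. The code below is an illustrative assumption, not the authors' engagement model: it assumes a toy bounded-confidence cascade in which a user engages with a post only if their opinion lies within `epsilon` of the content's stance, and engaged users pass the post on to their followers. The names `simulated_engagement`, `graph`, `opinions`, and `epsilon` are hypothetical.

```python
from collections import deque


def simulated_engagement(graph, opinions, source, content_stance, epsilon=0.3):
    """Count users who engage with a post under a toy bounded-confidence cascade.

    graph maps each user to the followers who see that user's shares;
    opinions maps each user to an opinion in [0, 1]. Both are hypothetical
    inputs chosen for illustration.
    """
    engaged = {source}
    queue = deque([source])
    while queue:
        user = queue.popleft()
        for follower in graph.get(user, []):
            if follower in engaged:
                continue
            # Assumed rule: engage only if the stance is close enough to the opinion.
            if abs(opinions[follower] - content_stance) <= epsilon:
                engaged.add(follower)
                queue.append(follower)
    return len(engaged) - 1  # exclude the posting node itself


# Hypothetical six-node network; "llm" is the node that posts the content.
graph = {"llm": ["a", "b"], "a": ["c", "e"], "b": ["d"]}
opinions = {"llm": 0.5, "a": 0.6, "b": 0.1, "c": 0.7, "d": 0.15, "e": 0.65}

reward_high = simulated_engagement(graph, opinions, "llm", content_stance=0.6)
reward_low = simulated_engagement(graph, opinions, "llm", content_stance=0.1)
```

In a reinforcement learning loop, a count like this would play the role of the scalar reward returned to the LLM after each generated post. Because the reward is a single function call, a different engagement model could be swapped in without touching the rest of the loop, which is one way to read "plug-and-play".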

arXiv information

Authors	Erica Coppolillo, Federico Cinus, Marco Minici, Francesco Bonchi, Giuseppe Manco
Published	2025-06-04 16:02:28+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI, cs.LG

Balancing Profit and Fairness in Risk-Based Pricing Markets

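Posted: June 5, 2025 | Author: jarxiv

Summary

Dynamic, risk-based pricing can systematically exclude vulnerable consumer groups from essential resources such as health insurance and consumer credit.
We show that a regulator can realign private incentives with social objectives through a learned, interpretable tax schedule.
First, we provide a formal proposition that bounding each firm's local demographic gap implicitly bounds the global opt-out disparity, which motivates firm-level penalties.
Building on this insight, we introduce MarketSim, an open-source, scalable simulator of heterogeneous consumers and profit-maximizing firms, and train a reinforcement learning (RL) social planner (SP) that selects a bracketed fairness tax while remaining close to a simple linear prior via an $\mathcal{L}_1$ regularizer.
The learned policy is therefore transparent and easy to interpret.
In two empirically calibrated markets, U.S. health insurance and consumer credit, our planner raises demand fairness by up to 16% relative to an unregulated free market while outperforming a fixed linear schedule in terms of social welfare, without explicit coordination.
These results show how AI-assisted regulation can turn a competitive social dilemma into a win-win equilibrium, providing a principled and practical framework for fairness-aware market oversight.

Abstract (original)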

Dynamic, risk-based pricing can systematically exclude vulnerable consumer groups from essential resources such as health insurance and consumer credit. We show that a regulator can realign private incentives with social objectives through a learned, interpretable tax schedule. First, we provide a formal proposition that bounding each firm’s \emph{local} demographic gap implicitly bounds the \emph{global} opt-out disparity, motivating firm-level penalties. Building on this insight we introduce \texttt{MarketSim} — an open-source, scalable simulator of heterogeneous consumers and profit-maximizing firms — and train a reinforcement learning (RL) social planner (SP) that selects a bracketed fairness-tax while remaining close to a simple linear prior via an $\mathcal{L}_1$ regularizer. The learned policy is thus both transparent and easily interpretable. In two empirically calibrated markets, i.e., U.S. health-insurance and consumer-credit, our planner simultaneously raises demand-fairness by up to $16\%$ relative to unregulated Free Market while outperforming a fixed linear schedule in terms of social welfare without explicit coordination. These results illustrate how AI-assisted regulation can convert a competitive social dilemma into a win-win equilibrium, providing a principled and practical framework for fairness-aware market oversight.
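To make "a bracketed fairness-tax while remaining close to a simple linear prior" more concrete, here is a minimal sketch under stated assumptions. It is not MarketSim code: the bracket edges, rates, midpoints, and the helper names `bracket_rate` and `l1_distance_to_linear_prior` are all hypothetical, and the abstract does not say exactly what quantity the tax is levied on.

```python
from bisect import bisect_right


def bracket_rate(gap, edges, rates):
    """Return the tax rate of the bracket that contains the fairness gap.

    edges are the upper bounds of all brackets except the last, so
    len(rates) == len(edges) + 1. A gap equal to an edge falls in the
    higher bracket.
    """
    return rates[bisect_right(edges, gap)]


def l1_distance_to_linear_prior(rates, midpoints, slope):
    """L1 distance between the bracket rates and a linear schedule slope * gap."""
    return sum(abs(rate - slope * mid) for rate, mid in zip(rates, midpoints))


# Hypothetical schedule with four brackets over a firm's demographic gap.
edges = [0.05, 0.10, 0.20]
midpoints = [0.025, 0.075, 0.15, 0.30]  # last midpoint assumes a cap at 0.40
rates = [0.0, 0.02, 0.05, 0.12]

rate = bracket_rate(0.07, edges, rates)
penalty = l1_distance_to_linear_prior(rates, midpoints, slope=0.4)
```

A planner trained with such a penalty is pulled toward the linear schedule unless deviating in a particular bracket pays off, which is one plausible reading of why the learned policy stays easy to interpret.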

arXiv information

Authors	Jesse Thibodeau, Hadi Nekoei, Afaf Taïk, Janarthanan Rajendran, Golnoosh Farnadi
Published	2025-06-04 16:06:36+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.LG, econ.GN, q-fin.EC

CLAIM: An Intent-Driven Multi-Agent Framework for Analyzing Manipulation in Courtroom Dialogues

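Posted: June 5, 2025 | Author: jarxiv

Summary

Courtrooms are places where lives are determined and fates are sealed, yet they are not impervious to manipulation.
Strategic use of manipulation in legal jargon can sway judges' opinions and affect decisions.
Despite growing advances in NLP, its application to detecting and analyzing manipulation in the legal domain remains largely unexplored.
Our work addresses this gap by introducing LegalCon, a dataset of 1,063 annotated courtroom conversations labeled for manipulation detection, identification of primary manipulators, and classification of manipulative techniques, with a focus on long conversations.
We further propose CLAIM, a two-stage, intent-driven multi-agent framework designed to enhance manipulation analysis by enabling context-aware and informed decision-making.
Our results highlight the potential of incorporating agentic frameworks to improve fairness and transparency in judicial processes.
We hope this work contributes to the broader application of NLP in legal discourse analysis and to the development of robust tools that support fairness in legal decision-making.
Our code and data are available at https://github.com/Disha1001/CLAIM.

Abstract (original)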

Courtrooms are places where lives are determined and fates are sealed, yet they are not impervious to manipulation. Strategic use of manipulation in legal jargon can sway the opinions of judges and affect the decisions. Despite the growing advancements in NLP, its application in detecting and analyzing manipulation within the legal domain remains largely unexplored. Our work addresses this gap by introducing LegalCon, a dataset of 1,063 annotated courtroom conversations labeled for manipulation detection, identification of primary manipulators, and classification of manipulative techniques, with a focus on long conversations. Furthermore, we propose CLAIM, a two-stage, Intent-driven Multi-agent framework designed to enhance manipulation analysis by enabling context-aware and informed decision-making. Our results highlight the potential of incorporating agentic frameworks to improve fairness and transparency in judicial processes. We hope that this contributes to the broader application of NLP in legal discourse analysis and the development of robust tools to support fairness in legal decision-making. Our code and data are available at https://github.com/Disha1001/CLAIM.

arXiv information

Authors	Disha Sheshanarayana, Tanishka Magar, Ayushi Mittal, Neelam Chaplot
Published	2025-06-04 16:22:59+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CL, cs.LG

Plant Bioelectric Early Warning Systems: A Five-Year Investigation into Human-Plant Electromagnetic Communication

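Posted: June 5, 2025 | Author: jarxiv

Summary

We present a comprehensive investigation into plant bioelectric responses to human presence and emotional states, building on five years of systematic research.
Using custom-built plant sensors and machine learning classification, we demonstrate that plants generate distinct bioelectric signals that correlate with human proximity, emotional states, and physiological conditions.
A deep learning model based on the ResNet50 architecture achieved 97% accuracy in classifying human emotional states from plant voltage spectrograms, while control models with shuffled labels achieved only 30% accuracy.
This study synthesizes findings from multiple experiments spanning 2020-2025, including individual recognition (66% accuracy), eurythmic gesture detection, stress prediction, and responses to human voice and movement.
We propose that these phenomena represent evolved anti-herbivory early warning systems, in which plants detect approaching animals through bioelectric field changes before physical contact.
Our results challenge the conventional understanding of plant sensory capabilities and suggest practical applications in agriculture, healthcare, and human-plant interaction research.

Abstract (original)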

We present a comprehensive investigation into plant bioelectric responses to human presence and emotional states, building on five years of systematic research. Using custom-built plant sensors and machine learning classification, we demonstrate that plants generate distinct bioelectric signals correlating with human proximity, emotional states, and physiological conditions. A deep learning model based on ResNet50 architecture achieved 97% accuracy in classifying human emotional states through plant voltage spectrograms, while control models with shuffled labels achieved only 30% accuracy. This study synthesizes findings from multiple experiments spanning 2020-2025, including individual recognition (66% accuracy), eurythmic gesture detection, stress prediction, and responses to human voice and movement. We propose that these phenomena represent evolved anti-herbivory early warning systems, where plants detect approaching animals through bioelectric field changes before physical contact. Our results challenge conventional understanding of plant sensory capabilities and suggest practical applications in agriculture, healthcare, and human-plant interaction research.

arXiv information

Author	Peter A. Gloor
Published	2025-06-04 16:23:06+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, q-bio.OT

TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems

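Posted: June 5, 2025 | Author: jarxiv

Summary

Agentic AI systems, built on large language models (LLMs) and deployed in multi-agent configurations, are redefining intelligent autonomy, collaboration, and decision-making across enterprise and societal domains.
This review presents a structured analysis of Trust, Risk, and Security Management (TRiSM) in the context of LLM-based agentic multi-agent systems (AMAS).
We begin by examining the conceptual foundations of agentic AI, its architectural differences from traditional AI agents, and the emerging system designs that enable scalable, tool-using autonomy.
TRiSM in the agentic AI framework is then detailed through four pillars (governance, explainability, ModelOps, and privacy/security), each contextualized for agentic LLMs.
We identify unique threat vectors and introduce a comprehensive risk taxonomy for agentic AI applications, supported by case studies that illustrate real-world vulnerabilities.
The paper also surveys trust-building mechanisms, transparency and oversight techniques, and state-of-the-art explainability strategies in distributed LLM agent systems.
In addition, metrics for evaluating trust, interpretability, and human-centered performance are reviewed alongside open benchmarking challenges.
Security and privacy are addressed through encryption, adversarial defense, and compliance with evolving AI regulations.
The paper concludes with a roadmap for responsible agentic AI, proposing research directions to align emerging multi-agent systems with robust TRiSM principles for safe, accountable, and transparent deployment.

Abstract (original)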

Agentic AI systems, built on large language models (LLMs) and deployed in multi-agent configurations, are redefining intelligent autonomy, collaboration and decision-making across enterprise and societal domains. This review presents a structured analysis of Trust, Risk, and Security Management (TRiSM) in the context of LLM-based agentic multi-agent systems (AMAS). We begin by examining the conceptual foundations of agentic AI, its architectural differences from traditional AI agents, and the emerging system designs that enable scalable, tool-using autonomy. The TRiSM in the agentic AI framework is then detailed through four pillars governance, explainability, ModelOps, and privacy/security each contextualized for agentic LLMs. We identify unique threat vectors and introduce a comprehensive risk taxonomy for the agentic AI applications, supported by case studies illustrating real-world vulnerabilities. Furthermore, the paper also surveys trust-building mechanisms, transparency and oversight techniques, and state-of-the-art explainability strategies in distributed LLM agent systems. Additionally, metrics for evaluating trust, interpretability, and human-centered performance are reviewed alongside open benchmarking challenges. Security and privacy are addressed through encryption, adversarial defense, and compliance with evolving AI regulations. The paper concludes with a roadmap for responsible agentic AI, proposing research directions to align emerging multi-agent systems with robust TRiSM principles for safe, accountable, and transparent deployment.

arXiv information

Authors	Shaina Raza, Ranjan Sapkota, Manoj Karkee, Christos Emmanouilidis
Published	2025-06-04 16:26:11+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI

macOSWorld: A Multilingual Interactive Benchmark for GUI Agents

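Posted: June 5, 2025 | Author: jarxiv

Summary

Graphical user interface (GUI) agents show promising capabilities for automating computer-use tasks and facilitating accessibility, but existing interactive benchmarks are mostly English-only and cover web use or Windows, Linux, and Android environments, not macOS.
macOS is a major OS with distinctive GUI patterns and exclusive applications.
To bridge these gaps, we present macOSWorld, the first comprehensive benchmark for evaluating GUI agents on macOS.
macOSWorld features 202 multilingual interactive tasks across 30 applications (28 of them macOS-exclusive), with task instructions and OS interfaces offered in five languages (English, Chinese, Arabic, Japanese, and Russian).
Because GUI agents have been shown to be vulnerable to deception attacks, macOSWorld also includes a dedicated safety benchmarking subset.
Our evaluation of six GUI agents reveals a dramatic gap: proprietary computer-use agents lead with success rates above 30%, while open-source lightweight research models lag below 2%, highlighting the need for macOS domain adaptation.
The multilingual benchmarks also expose common weaknesses, especially in Arabic, with a 27.5% average degradation compared to English.
Results from the safety benchmark highlight that deception attacks are more general and demand immediate attention.
macOSWorld is available at https://github.com/showlab/macosworld.

Abstract (original)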

Graphical User Interface (GUI) agents show promising capabilities for automating computer-use tasks and facilitating accessibility, but existing interactive benchmarks are mostly English-only, covering web-use or Windows, Linux, and Android environments, but not macOS. macOS is a major OS with distinctive GUI patterns and exclusive applications. To bridge the gaps, we present macOSWorld, the first comprehensive benchmark for evaluating GUI agents on macOS. macOSWorld features 202 multilingual interactive tasks across 30 applications (28 macOS-exclusive), with task instructions and OS interfaces offered in 5 languages (English, Chinese, Arabic, Japanese, and Russian). As GUI agents are shown to be vulnerable to deception attacks, macOSWorld also includes a dedicated safety benchmarking subset. Our evaluation on six GUI agents reveals a dramatic gap: proprietary computer-use agents lead at above 30% success rate, while open-source lightweight research models lag at below 2%, highlighting the need for macOS domain adaptation. Multilingual benchmarks also expose common weaknesses, especially in Arabic, with a 27.5% average degradation compared to English. Results from safety benchmarking also highlight that deception attacks are more general and demand immediate attention. macOSWorld is available at https://github.com/showlab/macosworld.

arXiv information

Authors	Pei Yang, Hai Ci, Mike Zheng Shou
Published	2025-06-04 16:26:56+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI

Rethinking the Role of Prompting Strategies in LLM Test-Time Scaling: A Perspective of Probability Theory

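Posted: June 5, 2025 | Author: jarxiv

Summary

Scaling test-time compute for large language models (LLMs) has recently attracted wide attention.
However, there has been limited investigation of how different reasoning prompting strategies perform as compute is scaled.
In this paper, we focus on a standard and realistic scaling setting: majority voting.
We systematically conduct experiments on 6 LLMs $\times$ 8 prompting strategies $\times$ 6 benchmarks.
The results consistently show that, as the number of samples and the computational overhead increase, complicated prompting strategies with superior initial performance gradually fall behind simple Chain-of-Thought.
We analyze this phenomenon and provide theoretical proofs.
We also propose a probabilistic method that efficiently predicts scaling performance and identifies the best prompting strategy at large sample counts, eliminating the need for resource-intensive inference in practical applications.
Furthermore, we introduce two methods derived from our theoretical analysis that significantly improve scaling performance.
We hope our research prompts a re-examination of the role of complicated prompting, unlocks the potential of simple prompting strategies, and provides new insights for enhancing test-time scaling performance.
Code is available at https://github.com/MraDonkey/rethinking_prompting.

Abstract (original)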

Recently, scaling test-time compute on Large Language Models (LLM) has garnered wide attention. However, there has been limited investigation of how various reasoning prompting strategies perform as scaling. In this paper, we focus on a standard and realistic scaling setting: majority voting. We systematically conduct experiments on 6 LLMs $\times$ 8 prompting strategies $\times$ 6 benchmarks. Experiment results consistently show that as the sampling time and computational overhead increase, complicated prompting strategies with superior initial performance gradually fall behind simple Chain-of-Thought. We analyze this phenomenon and provide theoretical proofs. Additionally, we propose a probabilistic method to efficiently predict scaling performance and identify the best prompting strategy under large sampling times, eliminating the need for resource-intensive inference processes in practical applications. Furthermore, we introduce two ways derived from our theoretical analysis to significantly improve the scaling performance. We hope that our research can promote to re-examine the role of complicated prompting, unleash the potential of simple prompting strategies, and provide new insights for enhancing test-time scaling performance. Code is available at https://github.com/MraDonkey/rethinking_prompting.
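The crossover described in the abstract can be illustrated with a toy calculation. The sketch below is an assumption-laden illustration, not the authors' probabilistic method: it assumes each question has only two possible answers, that samples are independent, and that the per-question probabilities of a correct sample (`complex_strategy`, `simple_strategy`) are made-up numbers.

```python
from math import comb


def majority_accuracy(p_correct, n_samples):
    """Probability that a strict majority of n independent samples is correct.

    Toy binary setting: every sample is either the correct answer (with
    probability p_correct) or one fixed wrong answer. Use odd n_samples
    so that ties cannot occur.
    """
    threshold = n_samples // 2 + 1
    return sum(
        comb(n_samples, k) * p_correct**k * (1 - p_correct) ** (n_samples - k)
        for k in range(threshold, n_samples + 1)
    )


def benchmark_accuracy(per_question_p, n_samples):
    """Average majority-vote accuracy over a list of per-question probabilities."""
    return sum(majority_accuracy(p, n_samples) for p in per_question_p) / len(per_question_p)


# Hypothetical per-question probabilities of sampling the correct answer.
complex_strategy = [0.95, 0.95, 0.40]  # higher single-sample mean, one question below 0.5
simple_strategy = [0.60, 0.60, 0.60]   # lower single-sample mean, every question above 0.5

for n in (1, 11, 101):
    print(
        n,
        round(benchmark_accuracy(complex_strategy, n), 3),
        round(benchmark_accuracy(simple_strategy, n), 3),
    )
```

In this toy model, voting pushes every question with a per-sample probability above 0.5 toward certainty and every question below 0.5 toward failure, so a strategy that starts ahead on average can end up behind once the number of samples grows.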

arXiv information

Authors	Yexiang Liu, Zekun Li, Zhi Fang, Nan Xu, Ran He, Tieniu Tan
Published	2025-06-04 16:27:57+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CL, cs.LG

SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL

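Posted: June 5, 2025 | Author: jarxiv

Summary

Building capable household and industrial robots requires mastering the control of versatile, high-degree-of-freedom (DoF) systems such as mobile manipulators.
Reinforcement learning (RL) holds promise for autonomously acquiring robot control policies, but scaling it to high-DoF embodiments remains challenging.
Direct RL in the real world demands both safe exploration and high sample efficiency, which are difficult to achieve in practice.
Sim-to-real RL, on the other hand, is often brittle because of the reality gap.
This paper introduces SLAC, a method that makes real-world RL feasible for complex embodiments by using a low-fidelity simulator to pretrain a task-agnostic latent action space.
SLAC trains this latent action space with a customized unsupervised skill discovery method designed to promote temporal abstraction, disentanglement, and safety, thereby facilitating efficient downstream learning.
Once the latent action space is learned, SLAC uses it as the action interface for a novel off-policy RL algorithm that autonomously learns downstream tasks through real-world interaction.
We evaluate SLAC against existing methods on a suite of bimanual mobile manipulation tasks, where it achieves state-of-the-art performance.
Notably, SLAC learns contact-rich whole-body tasks in under an hour of real-world interaction, without relying on any demonstrations or hand-crafted behavior priors.
More information, code, and videos are available at robo-rl.github.io.

Abstract (original)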

Building capable household and industrial robots requires mastering the control of versatile, high-degree-of-freedom (DoF) systems such as mobile manipulators. While reinforcement learning (RL) holds promise for autonomously acquiring robot control policies, scaling it to high-DoF embodiments remains challenging. Direct RL in the real world demands both safe exploration and high sample efficiency, which are difficult to achieve in practice. Sim-to-real RL, on the other hand, is often brittle due to the reality gap. This paper introduces SLAC, a method that renders real-world RL feasible for complex embodiments by leveraging a low-fidelity simulator to pretrain a task-agnostic latent action space. SLAC trains this latent action space via a customized unsupervised skill discovery method designed to promote temporal abstraction, disentanglement, and safety, thereby facilitating efficient downstream learning. Once a latent action space is learned, SLAC uses it as the action interface for a novel off-policy RL algorithm to autonomously learn downstream tasks through real-world interactions. We evaluate SLAC against existing methods on a suite of bimanual mobile manipulation tasks, where it achieves state-of-the-art performance. Notably, SLAC learns contact-rich whole-body tasks in under an hour of real-world interactions, without relying on any demonstrations or hand-crafted behavior priors. More information, code, and videos at robo-rl.github.io

arXiv information

Authors	Jiaheng Hu, Peter Stone, Roberto Martín-Martín
Published	2025-06-04 16:41:55+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.LG, cs.RO

Recover Experimental Data with Selection Bias using Counterfactual Logic

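Posted: June 5, 2025 | Author: jarxiv

Summary

Selection bias, which arises from the systematic inclusion or exclusion of certain samples, poses a significant challenge to the validity of causal inference.
Bareinboim et al. introduced methods for recovering unbiased observational and interventional distributions from biased data using partial external information, but the complexity of the backdoor adjustment and the method's strong reliance on observational data limit its applicability in many practical settings.
In this paper, we formally establish the recoverability of $P(Y^*_{x^*})$ under selection bias with experimental data.
By explicitly constructing counterfactual worlds via Structural Causal Models (SCMs), we analyze how selection mechanisms in the observational world propagate to the counterfactual domain.
We derive a complete set of graphical and theoretical criteria for determining that the experimental distribution remains unaffected by selection bias.
Furthermore, we propose principled methods for leveraging partially unbiased observational data to recover $P(Y^*_{x^*})$ from biased experimental datasets.
Simulation studies replicating realistic research scenarios demonstrate the practical utility of our approach and offer concrete guidance for mitigating selection bias in applied causal inference.

Abstract (original)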

Selection bias, arising from the systematic inclusion or exclusion of certain samples, poses a significant challenge to the validity of causal inference. While Bareinboim et al. introduced methods for recovering unbiased observational and interventional distributions from biased data using partial external information, the complexity of the backdoor adjustment and the method’s strong reliance on observational data limit its applicability in many practical settings. In this paper, we formally discover the recoverability of $P(Y^*_{x^*})$ under selection bias with experimental data. By explicitly constructing counterfactual worlds via Structural Causal Models (SCMs), we analyze how selection mechanisms in the observational world propagate to the counterfactual domain. We derive a complete set of graphical and theoretical criteria to determine that the experimental distribution remain unaffected by selection bias. Furthermore, we propose principled methods for leveraging partially unbiased observational data to recover $P(Y^*_{x^*})$ from biased experimental datasets. Simulation studies replicating realistic research scenarios demonstrate the practical utility of our approach, offering concrete guidance for mitigating selection bias in applied causal inference.

arXiv information

Authors	Jingyang He, Shuai Wang, Ang Li
Published	2025-06-04 17:00:31+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, stat.ME

Horizon Reduction Makes RL Scalable

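Posted: June 5, 2025 | Author: jarxiv

Summary

In this work, we study the scalability of offline reinforcement learning (RL) algorithms.
In principle, a truly scalable offline RL algorithm should be able to solve any given problem, regardless of its complexity, given sufficient data, compute, and model capacity.
We investigate whether and how current offline RL algorithms live up to this promise on diverse, challenging, previously unsolved tasks, using datasets up to 1000x larger than typical offline RL datasets.
We observe that, despite scaling up data, many existing offline RL algorithms exhibit poor scaling behavior and saturate well below the maximum performance.
We hypothesize that the horizon is the main cause of the poor scaling of offline RL.
We verify this hypothesis empirically through several analysis experiments, showing that long horizons indeed present a fundamental barrier to scaling up offline RL.
We then show that various horizon reduction techniques substantially improve scalability on challenging tasks.
Based on these insights, we also introduce a minimal yet scalable method named SHARSA that effectively reduces the horizon.
SHARSA achieves the best asymptotic performance and scaling behavior among the methods we evaluate, showing that explicitly reducing the horizon unlocks the scalability of offline RL.
Code: https://github.com/seohongpark/horizon-reduction

Abstract (original)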

In this work, we study the scalability of offline reinforcement learning (RL) algorithms. In principle, a truly scalable offline RL algorithm should be able to solve any given problem, regardless of its complexity, given sufficient data, compute, and model capacity. We investigate if and how current offline RL algorithms match up to this promise on diverse, challenging, previously unsolved tasks, using datasets up to 1000x larger than typical offline RL datasets. We observe that despite scaling up data, many existing offline RL algorithms exhibit poor scaling behavior, saturating well below the maximum performance. We hypothesize that the horizon is the main cause behind the poor scaling of offline RL. We empirically verify this hypothesis through several analysis experiments, showing that long horizons indeed present a fundamental barrier to scaling up offline RL. We then show that various horizon reduction techniques substantially enhance scalability on challenging tasks. Based on our insights, we also introduce a minimal yet scalable method named SHARSA that effectively reduces the horizon. SHARSA achieves the best asymptotic performance and scaling behavior among our evaluation methods, showing that explicitly reducing the horizon unlocks the scalability of offline RL. Code: https://github.com/seohongpark/horizon-reduction
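The abstract does not name the horizon reduction techniques it evaluates. As a hedged illustration only, the sketch below shows one standard way to shorten the value-learning horizon, the n-step return, which bootstraps once every n steps instead of once per step. Whether this matches the techniques used in the paper or in SHARSA is an assumption; `n_step_target` and `bootstrap_steps` are hypothetical helper names.

```python
from math import ceil


def n_step_target(rewards, bootstrap_value, gamma):
    """Discounted sum of the next n rewards plus a bootstrapped tail value.

    rewards holds r_t, ..., r_{t+n-1}; bootstrap_value is the value
    estimate at step t+n.
    """
    target = 0.0
    for k, reward in enumerate(rewards):
        target += gamma**k * reward
    return target + gamma ** len(rewards) * bootstrap_value


def bootstrap_steps(horizon, n):
    """Number of bootstrapped backups needed to span a task of the given length."""
    return ceil(horizon / n)


target = n_step_target([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5)
one_step_chain = bootstrap_steps(1000, 1)
fifty_step_chain = bootstrap_steps(1000, 50)
```

With these made-up numbers, a 1000-step task needs 1000 chained backups under one-step targets but only 20 under 50-step targets, so errors in the value estimate have far fewer opportunities to compound.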

arXiv information

Authors	Seohong Park, Kevin Frans, Deepinder Mann, Benjamin Eysenbach, Aviral Kumar, Sergey Levine
Published	2025-06-04 17:06:54+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.LG
