jarxiv | Japanese arxiv | Page 1116

Graph Neural Network-Based Predictive Modeling for Robotic Plaster Printing

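Posted: April 1, 2025 | Author: jarxiv

Summary

This work proposes a graph neural network (GNN) modeling approach to predict the surface resulting from a particle-based fabrication process.
The process consists of spray-based printing of cementitious plaster on a wall, carried out with a robotic arm.
Predictions are computed from robotic arm trajectory features, such as position, velocity, and direction, together with the printing process parameters.
The proposed approach, based on a particle representation of the wall domain and the end effector, allows a graph-based solution to be adopted.
The GNN model has an encoder-processor-decoder architecture and is trained on data from laboratory tests, with hyperparameters optimized by a Bayesian scheme.
The aim of the model is to act as a simulator of the printing process and, ultimately, to be used for generating robotic arm trajectories and optimizing printing parameters.
The performance of the proposed model is assessed in terms of prediction error against unseen ground-truth data, which shows its generality across varied scenarios, and in comparison with an existing benchmark model.
The results show a significant improvement over the benchmark model, with notably better performance and improved error scaling across prediction steps.

Summary (original)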

This work proposes a Graph Neural Network (GNN) modeling approach to predict the resulting surface from a particle based fabrication process. The latter consists of spray-based printing of cementitious plaster on a wall and is facilitated with the use of a robotic arm. The predictions are computed using the robotic arm trajectory features, such as position, velocity and direction, as well as the printing process parameters. The proposed approach, based on a particle representation of the wall domain and the end effector, allows for the adoption of a graph-based solution. The GNN model consists of an encoder-processor-decoder architecture and is trained using data from laboratory tests, while the hyperparameters are optimized by means of a Bayesian scheme. The aim of this model is to act as a simulator of the printing process, and ultimately used for the generation of the robotic arm trajectory and the optimization of the printing parameters, towards the materialization of an autonomous plastering process. The performance of the proposed model is assessed in terms of the prediction error against unseen ground truth data, which shows its generality in varied scenarios, as well as in comparison with the performance of an existing benchmark model. The results demonstrate a significant improvement over the benchmark model, with notably better performance and enhanced error scaling across prediction steps.
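The encoder-processor-decoder structure named in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and not taken from the paper: the three input features per particle, the hand-picked weights, the mean-neighbor message passing rule, and the scalar readout.

```python
from typing import Dict, List

Vector = List[float]


def linear(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Affine map y = W x + b, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def encode(features: List[Vector]) -> List[Vector]:
    """Encoder: lift each particle's raw features to a 2-d latent (ReLU)."""
    weights = [[0.5, 0.1, 0.0], [0.0, 0.2, 0.3]]  # made-up values, not learned
    bias = [0.0, 0.1]
    return [[max(0.0, v) for v in linear(f, weights, bias)] for f in features]


def process(latents: List[Vector], neighbors: Dict[int, List[int]], steps: int = 2) -> List[Vector]:
    """Processor: each step adds the mean neighbor latent to every node."""
    for _ in range(steps):
        updated = []
        for i, h in enumerate(latents):
            nbrs = neighbors.get(i, [])
            if nbrs:
                message = [sum(latents[j][k] for j in nbrs) / len(nbrs) for k in range(len(h))]
            else:
                message = [0.0] * len(h)
            updated.append([a + m for a, m in zip(h, message)])
        latents = updated
    return latents


def decode(latents: List[Vector]) -> List[float]:
    """Decoder: read out one scalar per particle (hypothetical surface change)."""
    return [linear(h, [[1.0, 1.0]], [0.0])[0] for h in latents]


def predict(features: List[Vector], neighbors: Dict[int, List[int]]) -> List[float]:
    return decode(process(encode(features), neighbors))


# Node 0 stands in for the end effector, nodes 1 and 2 for wall particles.
features = [[0.0, 1.0, 0.8], [0.1, 0.0, 0.0], [0.2, 0.0, 0.0]]
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(predict(features, neighbors))
```

In a trained model the three stages would be learned networks; the sketch only shows how information from the end-effector node reaches wall particles through the graph edges.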

arXiv information

Authors	Diego Machain Rivera,Selen Ercan Jenny,Ping Hsun Tsai,Ena Lloret-Fritschi,Luis Salamanca,Fernando Perez-Cruz,Konstantinos E. Tatsis
Published	2025-03-31 14:15:00+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI, cs.CE, cs.LG, cs.RO

Backdoor Graph Condensation

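Posted: April 1, 2025 | Author: jarxiv

Summary

Graph condensation has recently emerged as a prevalent technique for improving the training efficiency of graph neural networks (GNNs).
It condenses a large graph into a small one so that a GNN trained on the small synthetic graph achieves performance comparable to a GNN trained on the large graph.
However, existing graph condensation studies focus mainly on the best trade-off between graph size and GNN performance (model utility) and overlook the security issues of graph condensation.
To bridge this gap, we first explore backdoor attacks against GNNs trained on condensed graphs.
We introduce BGC, an effective backdoor attack against graph condensation.
The attack aims to (1) preserve the quality of the condensed graph despite trigger injection and (2) keep the trigger effective throughout the condensation process, achieving a high attack success rate.
Specifically, BGC consistently updates triggers during condensation and targets representative nodes for poisoning.
Extensive experiments demonstrate the effectiveness of our attack.
BGC achieves a high attack success rate (close to 1.0) and good model utility in all cases.
Furthermore, results against multiple defense methods demonstrate BGC's resilience under those defenses.
Finally, we analyze the key hyperparameters that influence attack performance.
Our code is available at https://github.com/JiahaoWuGit/BGC.

Summary (original)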

Graph condensation has recently emerged as a prevalent technique to improve the training efficiency for graph neural networks (GNNs). It condenses a large graph into a small one such that a GNN trained on this small synthetic graph can achieve comparable performance to a GNN trained on the large graph. However, while existing graph condensation studies mainly focus on the best trade-off between graph size and the GNNs’ performance (model utility), they overlook the security issues of graph condensation. To bridge this gap, we first explore backdoor attack against the GNNs trained on the condensed graphs. We introduce an effective backdoor attack against graph condensation, termed BGC. This attack aims to (1) preserve the condensed graph quality despite trigger injection, and (2) ensure trigger efficacy through the condensation process, achieving a high attack success rate. Specifically, BGC consistently updates triggers during condensation and targets representative nodes for poisoning. Extensive experiments demonstrate the effectiveness of our attack. BGC achieves a high attack success rate (close to 1.0) and good model utility in all cases. Furthermore, the results against multiple defense methods demonstrate BGC’s resilience under their defenses. Finally, we analyze the key hyperparameters that influence the attack performance. Our code is available at: https://github.com/JiahaoWuGit/BGC.
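The two quantities the abstract reports, attack success rate and model utility, can be sketched as simple evaluation metrics. The definitions below are common conventions assumed for illustration; the abstract does not spell out the exact formulas, and the function names are hypothetical.

```python
from typing import Sequence


def attack_success_rate(triggered_predictions: Sequence[int], target_class: int) -> float:
    """Fraction of trigger-carrying test nodes classified as the attacker's target class."""
    if not triggered_predictions:
        raise ValueError("need at least one triggered prediction")
    hits = sum(1 for p in triggered_predictions if p == target_class)
    return hits / len(triggered_predictions)


def clean_accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Model utility: accuracy on clean (trigger-free) test nodes."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal length")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Made-up predictions, purely to show the call pattern.
print(attack_success_rate([2, 2, 2, 1], target_class=2))
print(clean_accuracy([0, 1, 1, 2], [0, 1, 0, 2]))
```

An evaluation of this kind looks at both numbers together: the first measures how reliably the trigger works, the second whether the model still behaves normally on clean inputs.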

arXiv information

Authors	Jiahao Wu,Ning Lu,Zeiyu Dai,Kun Wang,Wenqi Fan,Shengcai Liu,Qing Li,Ke Tang
Published	2025-03-31 14:19:20+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CR, cs.LG

PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models

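Posted: April 1, 2025 | Author: jarxiv

Summary

Existing benchmarks for frontier models often test specialized, "PhD-level" knowledge.
In contrast, we present a benchmark of 594 problems based on the NPR Sunday Puzzle Challenge that requires only general knowledge.
Our benchmark is challenging for both humans and models.
However, correct solutions are easy to verify, and models' mistakes are easy to spot.
As LLMs are deployed more widely in society, we believe it is useful to develop benchmarks for frontier models that humans can understand without deep domain expertise.
Our work reveals capability gaps that are not evident in existing benchmarks: OpenAI o1 significantly outperforms other reasoning models on our benchmark, despite being on par with them on benchmarks that test specialized knowledge.
Furthermore, our analysis of reasoning outputs uncovers new kinds of failures.
For example, DeepSeek R1 often concedes with "I give up" before providing an answer that it knows is wrong.
R1 can also be remarkably "uncertain" in its output, and in rare cases it does not "finish thinking," which suggests the need for techniques to "wrap up" before the context window limit is reached.
We also quantify the effectiveness of reasoning longer to identify the point beyond which more reasoning is unlikely to improve accuracy on our benchmark.

Summary (original)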

Existing benchmarks for frontier models often test specialized, ‘PhD-level’ knowledge that is difficult for non-experts to grasp. In contrast, we present a benchmark with 594 problems based on the NPR Sunday Puzzle Challenge that requires only general knowledge. Our benchmark is challenging for both humans and models; however correct solutions are easy to verify, and models’ mistakes are easy to spot. As LLMs are more widely deployed in society, we believe it is useful to develop benchmarks for frontier models that humans can understand without the need for deep domain expertise. Our work reveals capability gaps that are not evident in existing benchmarks: OpenAI o1 significantly outperforms other reasoning models on our benchmark, despite being on par with other models when tested on benchmarks that test specialized knowledge. Furthermore, our analysis of reasoning outputs uncovers new kinds of failures. DeepSeek R1, for instance, often concedes with ‘I give up’ before providing an answer that it knows is wrong. R1 can also be remarkably ‘uncertain’ in its output and in rare cases, it does not ‘finish thinking,’ which suggests the need for techniques to ‘wrap up’ before the context window limit is reached. We also quantify the effectiveness of reasoning longer to identify the point beyond which more reasoning is unlikely to improve accuracy on our benchmark.

arXiv information

Authors	Zixuan Wu,Francesca Lucchetti,Aleksander Boruch-Gruszecki,Jingmiao Zhao,Carolyn Jane Anderson,Joydeep Biswas,Federico Cassano,Molly Q Feldman,Arjun Guha
Published	2025-03-31 14:21:49+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.LG

Resonance: Drawing from Memories to Imagine Positive Futures through AI-Augmented Journaling

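Posted: April 1, 2025 | Author: jarxiv

Summary

People inherently draw on past experiences when imagining their future, a capability that plays a crucial role in mental health.
Resonance is an AI-powered journaling tool designed to augment this ability by offering AI-generated, action-oriented suggestions for future activities based on the user's own past memories.
Suggestions are offered when a new memory is logged and are followed by a prompt for the user to imagine carrying out the suggestion.
In a two-week randomized controlled study (N=55), using Resonance significantly improved mental health outcomes, reducing users' PHQ8 scores, a measure of current depression, and increasing their daily positive affect, particularly when they were likely to act on the suggestion.
Notably, the suggestions were more effective when they were personal, novel, and referenced the user's logged memories.
Finally, through open-ended feedback, we discuss the factors that encouraged or hindered use of the tool.

Summary (original)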

People inherently use experiences of their past while imagining their future, a capability that plays a crucial role in mental health. Resonance is an AI-powered journaling tool designed to augment this ability by offering AI-generated, action-oriented suggestions for future activities based on the user’s own past memories. Suggestions are offered when a new memory is logged and are followed by a prompt for the user to imagine carrying out the suggestion. In a two-week randomized controlled study (N=55), we found that using Resonance significantly improved mental health outcomes, reducing the users’ PHQ8 scores, a measure of current depression, and increasing their daily positive affect, particularly when they would likely act on the suggestion. Notably, the effectiveness of the suggestions was higher when they were personal, novel, and referenced the user’s logged memories. Finally, through open-ended feedback, we discuss the factors that encouraged or hindered the use of the tool.

arXiv information

Authors	Wazeer Zulfikar,Treyden Chiaravalloti,Jocelyn Shen,Rosalind Picard,Pattie Maes
Published	2025-03-31 14:30:47+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.HC

Learning a Canonical Basis of Human Preferences from Binary Ratings

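Posted: April 1, 2025 | Author: jarxiv

Summary

Recent advances in generative AI have been driven by alignment techniques such as reinforcement learning from human feedback (RLHF).
RLHF and related techniques typically involve constructing a dataset of binary or ranked-choice human preferences and then fine-tuning models to align with those preferences.
This paper shifts the focus to understanding the preferences encoded in such datasets and identifying common human preferences.
A small subset of 21 preference categories (selected from a set of nearly 5,000 distinct preferences) captures more than 89% of preference variation across individuals.
This small set of preferences is analogous to a canonical basis of human preferences, similar to established findings that characterize human variation in psychology or facial recognition studies.
Through both synthetic and empirical evaluations, we confirm that the low-rank, canonical set of human preferences generalizes across the entire dataset and within specific topics.
We further demonstrate the utility of the preference basis in model evaluation, where the preference categories offer deeper insight into model alignment, and in model training, where fine-tuning on preference-defined subsets successfully aligns the model accordingly.

Summary (original)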

Recent advances in generative AI have been driven by alignment techniques such as reinforcement learning from human feedback (RLHF). RLHF and related techniques typically involve constructing a dataset of binary or ranked choice human preferences and subsequently fine-tuning models to align with these preferences. This paper shifts the focus to understanding the preferences encoded in such datasets and identifying common human preferences. We find that a small subset of 21 preference categories (selected from a set of nearly 5,000 distinct preferences) captures >89% of preference variation across individuals. This small set of preferences is analogous to a canonical basis of human preferences, similar to established findings that characterize human variation in psychology or facial recognition studies. Through both synthetic and empirical evaluations, we confirm that our low-rank, canonical set of human preferences generalizes across the entire dataset and within specific topics. We further demonstrate our preference basis’ utility in model evaluation, where our preference categories offer deeper insights into model alignment, and in model training, where we show that fine-tuning on preference-defined subsets successfully aligns the model accordingly.
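The idea that a small basis "captures >89% of preference variation" can be illustrated with a cumulative explained-variance calculation. This is only a sketch of that general idea under assumed inputs: the per-component variances are made up, and the abstract does not say how the authors select or score their 21 categories.

```python
from typing import Sequence


def explained_fraction(variances: Sequence[float], k: int) -> float:
    """Share of total variance captured by the k largest components."""
    ordered = sorted(variances, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("total variance must be positive")
    return sum(ordered[:k]) / total


def components_needed(variances: Sequence[float], threshold: float) -> int:
    """Smallest k whose cumulative explained fraction reaches the threshold."""
    for k in range(1, len(variances) + 1):
        if explained_fraction(variances, k) >= threshold:
            return k
    raise ValueError("threshold is not reachable")


# Hypothetical variances for four components.
variances = [5.0, 3.0, 1.0, 1.0]
print(explained_fraction(variances, 2))
print(components_needed(variances, 0.75))
```

A low-rank structure shows up as a steep curve: a few components account for most of the total, and the remaining ones add little.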

arXiv information

Authors	Kailas Vodrahalli,Wei Wei,James Zou
Published	2025-03-31 14:35:48+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.HC, cs.LG

Concept Navigation and Classification via Open-Source Large Language Model Processing

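Posted: April 1, 2025 | Author: jarxiv

Summary

This paper presents a novel methodological framework for detecting and classifying latent constructs, such as frames, narratives, and topics, in textual data using open-source large language models (LLMs).
The proposed hybrid approach combines automated summarization with human-in-the-loop validation to improve the accuracy and interpretability of construct identification.
By employing iterative sampling coupled with expert refinement, the framework guarantees methodological robustness and ensures conceptual precision.
Applied to diverse datasets, including AI policy debates, newspaper articles on encryption, and the 20 Newsgroups dataset, the approach demonstrates its versatility in systematically analyzing complex political discourse, media framing, and topic classification tasks.

Summary (original)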

This paper presents a novel methodological framework for detecting and classifying latent constructs, including frames, narratives, and topics, from textual data using Open-Source Large Language Models (LLMs). The proposed hybrid approach combines automated summarization with human-in-the-loop validation to enhance the accuracy and interpretability of construct identification. By employing iterative sampling coupled with expert refinement, the framework guarantees methodological robustness and ensures conceptual precision. Applied to diverse data sets, including AI policy debates, newspaper articles on encryption, and the 20 Newsgroups data set, this approach demonstrates its versatility in systematically analyzing complex political discourses, media framing, and topic classification tasks.

arXiv information

Authors	Maël Kubli
Published	2025-03-31 14:37:40+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CL, cs.LG, I.2.7

ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery

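Posted: April 1, 2025 | Author: jarxiv

Summary

Advances in large language models (LLMs) have piqued growing interest in developing LLM-based language agents to automate scientific discovery end-to-end, sparking both excitement and skepticism about their true capabilities.
In this work, we call for rigorous assessment of agents on individual tasks in a scientific workflow before making bold claims about end-to-end automation.
To this end, we present ScienceAgentBench, a new benchmark for evaluating language agents for data-driven scientific discovery.
To ensure the scientific authenticity and real-world relevance of the benchmark, we extract 102 tasks from 44 peer-reviewed publications in four disciplines and engage nine subject matter experts to validate them.
We unify the target output of every task into a self-contained Python program file and employ an array of evaluation metrics to examine the generated programs, execution results, and costs.
Each task goes through multiple rounds of manual validation by annotators and subject matter experts to ensure its annotation quality and scientific plausibility.
We also propose two effective strategies to mitigate data contamination concerns.
Using ScienceAgentBench, we evaluate five open-weight and proprietary LLMs, each with three frameworks.
Given three attempts per task, the best-performing agent solves only 32.4% of the tasks independently and 34.3% with expert-provided knowledge.
In addition, we evaluate OpenAI o1-preview with direct prompting and self-debug, which boosts performance to 42.2%, demonstrating the effectiveness of increased inference-time compute, but at more than 10 times the cost of other LLMs.
Still, our results underscore the limitations of current language agents in generating code for data-driven discovery, let alone end-to-end automation of scientific research.

Summary (original)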

The advancements of large language models (LLMs) have piqued growing interest in developing LLM-based language agents to automate scientific discovery end-to-end, which has sparked both excitement and skepticism about their true capabilities. In this work, we call for rigorous assessment of agents on individual tasks in a scientific workflow before making bold claims on end-to-end automation. To this end, we present ScienceAgentBench, a new benchmark for evaluating language agents for data-driven scientific discovery. To ensure the scientific authenticity and real-world relevance of our benchmark, we extract 102 tasks from 44 peer-reviewed publications in four disciplines and engage nine subject matter experts to validate them. We unify the target output for every task to a self-contained Python program file and employ an array of evaluation metrics to examine the generated programs, execution results, and costs. Each task goes through multiple rounds of manual validation by annotators and subject matter experts to ensure its annotation quality and scientific plausibility. We also propose two effective strategies to mitigate data contamination concerns. Using ScienceAgentBench, we evaluate five open-weight and proprietary LLMs, each with three frameworks: direct prompting, OpenHands CodeAct, and self-debug. Given three attempts for each task, the best-performing agent can only solve 32.4% of the tasks independently and 34.3% with expert-provided knowledge. In addition, we evaluate OpenAI o1-preview with direct prompting and self-debug, which can boost the performance to 42.2%, demonstrating the effectiveness of increasing inference-time compute but with more than 10 times the cost of other LLMs. Still, our results underscore the limitations of current language agents in generating code for data-driven discovery, let alone end-to-end automation for scientific research.
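A success rate "given three attempts for each task" can be computed as in the sketch below. The aggregation rule used here (a task counts as solved if any attempt succeeds) is an assumption for illustration; the abstract does not state how the three attempts are combined, and the function name is hypothetical.

```python
from typing import Sequence


def success_rate(attempts_per_task: Sequence[Sequence[bool]]) -> float:
    """Fraction of tasks with at least one successful attempt (assumed rule)."""
    if not attempts_per_task:
        raise ValueError("need at least one task")
    solved = sum(1 for attempts in attempts_per_task if any(attempts))
    return solved / len(attempts_per_task)


# Four made-up tasks with three attempts each.
outcomes = [
    [False, False, True],
    [False, False, False],
    [True, True, True],
    [False, False, False],
]
print(success_rate(outcomes))

# Percentages for hypothetical counts of solved tasks out of 102.
for solved in (33, 35, 43):
    print(solved, round(100 * solved / 102, 1))
```

The reported 32.4%, 34.3%, and 42.2% are arithmetically consistent with 33, 35, and 43 solved tasks out of 102, although the abstract itself gives only the percentages.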

arXiv information

Authors	Ziru Chen,Shijie Chen,Yuting Ning,Qianheng Zhang,Boshi Wang,Botao Yu,Yifei Li,Zeyi Liao,Chen Wei,Zitong Lu,Vishal Dey,Mingyi Xue,Frazier N. Baker,Benjamin Burns,Daniel Adu-Ampratwum,Xuhui Huang,Xia Ning,Song Gao,Yu Su,Huan Sun
Published	2025-03-31 14:39:44+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CL, cs.LG

Predicting Targeted Therapy Resistance in Non-Small Cell Lung Cancer Using Multimodal Machine Learning

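Posted: April 1, 2025 | Author: jarxiv

Summary

Lung cancer is the leading cause of cancer death worldwide, with non-small cell lung cancer (NSCLC) as its most prevalent subtype.
Among NSCLC patients, approximately 32.3% have mutations in the epidermal growth factor receptor (EGFR) gene.
Osimertinib, a third-generation EGFR tyrosine kinase inhibitor (TKI), has demonstrated remarkable efficacy in treating NSCLC patients with activating and T790M resistance EGFR mutations.
Despite its established efficacy, drug resistance poses a significant challenge to patients fully benefiting from osimertinib.
The absence of a standard tool for accurately predicting TKI resistance, including resistance to osimertinib, remains a critical obstacle.
To bridge this gap, this study developed an interpretable multimodal machine learning model designed to predict resistance to osimertinib among late-stage NSCLC patients with activating EGFR mutations, achieving a c-index of 0.82 on a multi-institutional dataset.
The model harnesses readily available data routinely collected during patient visits and medical assessments to facilitate precision lung cancer management and informed treatment decisions.
By integrating various data types, such as histology images, next-generation sequencing (NGS) data, demographic data, and clinical records, the multimodal model can generate well-informed recommendations.
The experimental results also demonstrated the superior performance of the multimodal model over single-modality models (c-index 0.82 compared with 0.75 and 0.77), underscoring the benefit of combining multiple modalities in patient outcome prediction.

Summary (original)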

Lung cancer is the primary cause of cancer death globally, with non-small cell lung cancer (NSCLC) emerging as its most prevalent subtype. Among NSCLC patients, approximately 32.3% have mutations in the epidermal growth factor receptor (EGFR) gene. Osimertinib, a third-generation EGFR-tyrosine kinase inhibitor (TKI), has demonstrated remarkable efficacy in the treatment of NSCLC patients with activating and T790M resistance EGFR mutations. Despite its established efficacy, drug resistance poses a significant challenge for patients to fully benefit from osimertinib. The absence of a standard tool to accurately predict TKI resistance, including that of osimertinib, remains a critical obstacle. To bridge this gap, in this study, we developed an interpretable multimodal machine learning model designed to predict patient resistance to osimertinib among late-stage NSCLC patients with activating EGFR mutations, achieving a c-index of 0.82 on a multi-institutional dataset. This machine learning model harnesses readily available data routinely collected during patient visits and medical assessments to facilitate precision lung cancer management and informed treatment decisions. By integrating various data types such as histology images, next generation sequencing (NGS) data, demographics data, and clinical records, our multimodal model can generate well-informed recommendations. Our experiment results also demonstrated the superior performance of the multimodal model over single modality models (c-index 0.82 compared with 0.75 and 0.77), thus underscoring the benefit of combining multiple modalities in patient outcome prediction.
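The c-index quoted in the abstract is a ranking metric for time-to-event predictions. The sketch below implements one common form (Harrell's concordance index with ties in risk scored as one half); whether the paper uses exactly this variant is an assumption, and the sample data are invented.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float], risk_scores: Sequence[float], events: Sequence[bool]
) -> float:
    """Share of comparable patient pairs that the risk scores order correctly.

    A pair (i, j) is comparable when patient i has an observed event
    strictly before patient j's event or censoring time. It is concordant
    when patient i, who progressed earlier, also has the higher risk score.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented follow-up times (months), model risk scores, and event flags.
times = [1.0, 2.0, 3.0, 4.0]
risks = [4.0, 2.0, 3.0, 1.0]
events = [True, True, True, True]
print(concordance_index(times, risks, events))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which 0.82 versus 0.75 and 0.77 should be read.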

arXiv information

Authors	Peiying Hua,Andrea Olofson,Faraz Farhadi,Liesbeth Hondelink,Gregory Tsongalis,Konstantin Dragnev,Dagmar Hoegemann Savellano,Arief Suriawinata,Laura Tafe,Saeed Hassanpour
Published	2025-03-31 14:47:02+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.LG

Studying the Interplay Between the Actor and Critic Representations in Reinforcement Learning

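Posted: April 1, 2025 | Author: jarxiv

Summary

Extracting relevant information from a stream of high-dimensional observations is a central challenge for deep reinforcement learning agents.
Actor-critic algorithms add further complexity to this challenge, as it is often unclear whether the same information is relevant to both the actor and the critic.
To this end, we explore the principles that underlie effective representations for the actor and for the critic in on-policy algorithms.
We focus on understanding whether the actor and critic benefit from separate, rather than shared, representations.
Our primary finding is that, when separated, the actor and critic representations systematically specialize in extracting different types of information from the environment: the actor's representation tends to focus on action-relevant information, while the critic's representation specializes in encoding value and dynamics information.
We conduct a rigorous empirical study to understand how different representation learning approaches affect the actor's and critic's specializations and their downstream performance, in terms of sample efficiency and generation capabilities.
Finally, we find that a separated critic plays an important role in exploration and data collection during training.
Our code, trained models, and data are accessible at https://github.com/francelico/deac-rep.

Summary (original)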

Extracting relevant information from a stream of high-dimensional observations is a central challenge for deep reinforcement learning agents. Actor-critic algorithms add further complexity to this challenge, as it is often unclear whether the same information will be relevant to both the actor and the critic. To this end, we here explore the principles that underlie effective representations for the actor and for the critic in on-policy algorithms. We focus our study on understanding whether the actor and critic will benefit from separate, rather than shared, representations. Our primary finding is that when separated, the representations for the actor and critic systematically specialise in extracting different types of information from the environment: the actor’s representation tends to focus on action-relevant information, while the critic’s representation specialises in encoding value and dynamics information. We conduct a rigorous empirical study to understand how different representation learning approaches affect the actor and critic’s specialisations and their downstream performance, in terms of sample efficiency and generation capabilities. Finally, we discover that a separated critic plays an important role in exploration and data collection during training. Our code, trained models and data are accessible at https://github.com/francelico/deac-rep.
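The difference between shared and separate actor and critic representations can be shown structurally. This is a minimal sketch under assumed choices (single linear encoders with ReLU, random untrained weights, hypothetical class and attribute names); it is not the authors' implementation, which is available at the repository linked above.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def make_layer(in_dim: int, out_dim: int, rng: random.Random) -> Matrix:
    """Randomly initialised weight matrix with out_dim rows."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def matvec(layer: Matrix, x: List[float]) -> List[float]:
    return [sum(w * v for w, v in zip(row, x)) for row in layer]


class ActorCritic:
    """Toy actor-critic with either one shared encoder or two separate ones."""

    def __init__(self, obs_dim: int, latent_dim: int, n_actions: int,
                 shared: bool, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.actor_encoder = make_layer(obs_dim, latent_dim, rng)
        # Shared: the critic reuses the actor's encoder object.
        # Separate: the critic gets its own parameters.
        self.critic_encoder = (
            self.actor_encoder if shared else make_layer(obs_dim, latent_dim, rng)
        )
        self.policy_head = make_layer(latent_dim, n_actions, rng)
        self.value_head = make_layer(latent_dim, 1, rng)

    def forward(self, obs: List[float]) -> Tuple[List[float], float]:
        actor_latent = [max(0.0, v) for v in matvec(self.actor_encoder, obs)]
        critic_latent = [max(0.0, v) for v in matvec(self.critic_encoder, obs)]
        logits = matvec(self.policy_head, actor_latent)
        value = matvec(self.value_head, critic_latent)[0]
        return logits, value


coupled = ActorCritic(obs_dim=4, latent_dim=3, n_actions=2, shared=True)
decoupled = ActorCritic(obs_dim=4, latent_dim=3, n_actions=2, shared=False)
print(decoupled.forward([0.1, 0.2, 0.3, 0.4]))
```

With separate encoders, gradients from the policy loss and the value loss would update different parameters, which is the setting in which the abstract reports that the two representations specialise.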

arXiv information

Authors	Samuel Garcin,Trevor McInroe,Pablo Samuel Castro,Prakash Panangaden,Christopher G. Lucas,David Abel,Stefano V. Albrecht
Published	2025-03-31 14:56:08+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.LG

Output Constraints as Attack Surface: Exploiting Structured Generation to Bypass LLM Safety Mechanisms

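Posted: April 1, 2025 | Author: jarxiv

Summary

Content warning: This paper may contain unsafe or harmful content generated by LLMs that may be offensive to readers.
Large language models (LLMs) are widely used as tooling platforms through structured output APIs, which ensure syntax compliance so that robust integration with existing software, such as agent systems, can be achieved.
However, the feature that enables grammar-guided structured output introduces significant security vulnerabilities.
In this work, we reveal a critical control-plane attack surface orthogonal to traditional data-plane vulnerabilities.
We introduce the Constrained Decoding Attack (CDA), a novel jailbreak class that weaponizes structured output constraints to bypass safety mechanisms.
Unlike prior attacks that focus on input prompts, CDA operates by embedding malicious intent in schema-level grammar rules (control plane) while keeping surface prompts benign (data plane).
We instantiate this with a proof-of-concept Chain Enum Attack, which achieves a 96.2% attack success rate across proprietary and open-weight LLMs, including GPT-4o and Gemini-2.0-flash, on five safety benchmarks with a single query.
Our findings identify a critical security blind spot in current LLM architectures and urge a paradigm shift in LLM safety to address control-plane vulnerabilities, as current mechanisms focused solely on data-plane threats leave critical systems exposed.

Summary (original)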

Content Warning: This paper may contain unsafe or harmful content generated by LLMs that may be offensive to readers. Large Language Models (LLMs) are extensively used as tooling platforms through structured output APIs to ensure syntax compliance so that robust integration with existing software, such as agent systems, can be achieved. However, the feature enabling functionality of grammar-guided structured output presents significant security vulnerabilities. In this work, we reveal a critical control-plane attack surface orthogonal to traditional data-plane vulnerabilities. We introduce Constrained Decoding Attack (CDA), a novel jailbreak class that weaponizes structured output constraints to bypass safety mechanisms. Unlike prior attacks focused on input prompts, CDA operates by embedding malicious intent in schema-level grammar rules (control-plane) while maintaining benign surface prompts (data-plane). We instantiate this with a proof-of-concept Chain Enum Attack, which achieves 96.2% attack success rates across proprietary and open-weight LLMs on five safety benchmarks with a single query, including GPT-4o and Gemini-2.0-flash. Our findings identify a critical security blind spot in current LLM architectures and urge a paradigm shift in LLM safety to address control-plane vulnerabilities, as current mechanisms focused solely on data-plane threats leave critical systems exposed.

arXiv information

Authors	Shuoming Zhang,Jiacheng Zhao,Ruiyuan Xu,Xiaobing Feng,Huimin Cui
Published	2025-03-31 15:08:06+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CR
