jarxiv | Japanese arxiv

Prefix-Tuning+: Modernizing Prefix-Tuning through Attention Independent Prefix Data

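Posted: June 17, 2025 | Author: jarxiv

Summary

Parameter-efficient fine-tuning (PEFT) methods have become essential for quickly adapting large language models (LLMs) to downstream tasks.
Prefix-Tuning, an early and effective PEFT technique, showed that performance comparable to full fine-tuning can be reached with far less computational and memory overhead.
Despite that earlier success, however, its effectiveness in training modern state-of-the-art LLMs has been very limited.
In this work, we show empirically that Prefix-Tuning underperforms on LLMs because of an inherent tradeoff between the significance of the input and that of the prefix within the attention head.
This motivates Prefix-Tuning+, a novel architecture that generalizes the principles of Prefix-Tuning while addressing its shortcomings by moving the prefix module out of the attention head itself.
We also give an overview of our construction process to guide future users in building their own context-based methods.
Our experiments show that Prefix-Tuning+ consistently outperforms existing Prefix-Tuning methods across a diverse set of benchmarks.
Notably, it performs on par with the widely adopted LoRA method on several general benchmarks, highlighting the potential of a modern extension of Prefix-Tuning approaches.
Our findings suggest that, by overcoming its inherent limitations, Prefix-Tuning can remain a competitive and relevant research direction in parameter-efficient LLM adaptation.

Summary (original)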

Parameter-Efficient Fine-Tuning (PEFT) methods have become crucial for rapidly adapting large language models (LLMs) to downstream tasks. Prefix-Tuning, an early and effective PEFT technique, demonstrated the ability to achieve performance comparable to full fine-tuning with significantly reduced computational and memory overhead. However, despite its earlier success, its effectiveness in training modern state-of-the-art LLMs has been very limited. In this work, we demonstrate empirically that Prefix-Tuning underperforms on LLMs because of an inherent tradeoff between input and prefix significance within the attention head. This motivates us to introduce Prefix-Tuning+, a novel architecture that generalizes the principles of Prefix-Tuning while addressing its shortcomings by shifting the prefix module out of the attention head itself. We further provide an overview of our construction process to guide future users when constructing their own context-based methods. Our experiments show that, across a diverse set of benchmarks, Prefix-Tuning+ consistently outperforms existing Prefix-Tuning methods. Notably, it achieves performance on par with the widely adopted LoRA method on several general benchmarks, highlighting the potential modern extension of Prefix-Tuning approaches. Our findings suggest that by overcoming its inherent limitations, Prefix-Tuning can remain a competitive and relevant research direction in the landscape of parameter-efficient LLM adaptation.
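
The tradeoff named in the abstract can be made concrete for standard Prefix-Tuning, where learned prefix key/value vectors are prepended to a head's keys and values. Because one softmax normalizes over prefix and input positions together, the head's output is a convex combination of a prefix-only term and an input-only term, so any attention mass given to the prefix is taken from the input. The sketch below checks that identity for a single query in pure Python. The function names and toy vectors are illustrative assumptions, and the code does not reproduce the Prefix-Tuning+ architecture, which the abstract describes only at a high level.

```python
import math

def dot_scores(query, keys):
    """Scaled dot-product scores of one query against a list of keys."""
    scale = math.sqrt(len(query))
    return [sum(q * k for q, k in zip(query, key)) / scale for key in keys]

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Single-query attention output over lists of key and value vectors."""
    weights = softmax(dot_scores(query, keys))
    return [sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))]

def prefix_mass(query, prefix_keys, input_keys):
    """Share of the joint softmax mass (alpha) that lands on prefix positions."""
    weights = softmax(dot_scores(query, prefix_keys + input_keys))
    return sum(weights[:len(prefix_keys)])

# Toy numbers, chosen only for illustration.
query = [1.0, 0.5]
prefix_keys = [[0.2, 0.1], [1.0, -0.3]]
prefix_values = [[1.0, 0.0], [0.0, 1.0]]
input_keys = [[0.5, 0.5], [-0.4, 0.9], [0.3, 0.0]]
input_values = [[2.0, 1.0], [0.5, -1.0], [1.5, 0.5]]

# Attention over the concatenated prefix + input sequence.
joint = attend(query, prefix_keys + input_keys, prefix_values + input_values)

# The same output, written as alpha * prefix-only + (1 - alpha) * input-only.
alpha = prefix_mass(query, prefix_keys, input_keys)
only_prefix = attend(query, prefix_keys, prefix_values)
only_input = attend(query, input_keys, input_values)
mixed = [alpha * p + (1 - alpha) * x for p, x in zip(only_prefix, only_input)]
```

Since `alpha` and `1 - alpha` sum to one, raising the prefix's influence inside the head necessarily lowers the input's, which is the coupling the abstract identifies.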

arXiv information

Authors	Haonan Wang, Brian Chen, Li Siquan, Liang Xinhe, Tianyang Hu, Hwee Kuan Lee, Kenji Kawaguchi
Published	2025-06-16 16:30:26+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL

Meta-learning how to Share Credit among Macro-Actions

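Posted: June 17, 2025 | Author: jarxiv

Summary

One proposed mechanism for improving exploration in reinforcement learning is the use of macro-actions.
Paradoxically, however, in many scenarios naively adding macro-actions does not lead to better exploration but to the opposite.
This has been attributed to the addition of non-useful macros, and several works have focused on mechanisms for effectively discovering useful, environment-specific macros.
In this work, we take a slightly different perspective.
We argue that the difficulty stems from the trade-off between reducing the average number of decisions per episode and increasing the size of the action space.
Namely, each potential macro-action is typically treated as independent and atomic, which strictly enlarges the search space and makes typical exploration strategies inefficient.
To address this problem, we propose a novel regularization term that exploits the relationship between actions and macro-actions to improve the credit assignment mechanism by reducing the effective dimension of the action space, thereby improving exploration.
The term relies on a similarity matrix that is meta-learned jointly with the desired policy.
We empirically validate our strategy on macro-actions in Atari games and the StreetFighter II environment.
Our results show significant improvements over the Rainbow-DQN baseline in all environments.
Additionally, we show that the macro-action similarity is transferable to related environments.
We believe this work is a small but important step toward understanding how the similarity-imposed geometry on the action space can be exploited to improve credit assignment and exploration, and thus make learning more effective.

Summary (original)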

One proposed mechanism to improve exploration in reinforcement learning is through the use of macro-actions. Paradoxically though, in many scenarios the naive addition of macro-actions does not lead to better exploration, but rather the opposite. It has been argued that this was caused by adding non-useful macros and multiple works have focused on mechanisms to discover effectively environment-specific useful macros. In this work, we take a slightly different perspective. We argue that the difficulty stems from the trade-offs between reducing the average number of decisions per episode versus increasing the size of the action space. Namely, one typically treats each potential macro-action as independent and atomic, hence strictly increasing the search space and making typical exploration strategies inefficient. To address this problem we propose a novel regularization term that exploits the relationship between actions and macro-actions to improve the credit assignment mechanism by reducing the effective dimension of the action space and, therefore, improving exploration. The term relies on a similarity matrix that is meta-learned jointly with learning the desired policy. We empirically validate our strategy looking at macro-actions in Atari games, and the StreetFighter II environment. Our results show significant improvements over the Rainbow-DQN baseline in all environments. Additionally, we show that the macro-action similarity is transferable to related environments. We believe this work is a small but important step towards understanding how the similarity-imposed geometry on the action space can be exploited to improve credit assignment and exploration, therefore making learning more effective.
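
The trade-off between fewer decisions per episode and a larger action space can be illustrated with simple counting. The sketch below assumes the worst case in which every sequence of primitive actions up to a maximum length is added as its own atomic action; the helper names and the figure of 18 primitive actions are illustrative assumptions, and practical macro sets are usually much smaller subsets.

```python
import math

def augmented_action_count(n_primitive, max_macro_len):
    """Number of choices when every action sequence of length 1..max_macro_len
    is treated as an independent, atomic action."""
    return sum(n_primitive ** length for length in range(1, max_macro_len + 1))

def min_decisions(episode_steps, max_macro_len):
    """Fewest decisions needed to cover an episode if every decision
    could use the longest available macro-action."""
    return math.ceil(episode_steps / max_macro_len)

# Illustrative setting: 18 primitive actions, a 1000-step episode.
for max_len in (1, 2, 3):
    print(max_len,
          augmented_action_count(18, max_len),
          min_decisions(1000, max_len))
```

In this toy setting the number of decisions shrinks linearly with the macro length while the number of choices per decision grows geometrically, which is why treating each macro as unrelated to its constituent actions can make exploration harder.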

arXiv information

Authors	Ionel-Alexandru Hosu, Traian Rebedea, Razvan Pascanu
Published	2025-06-16 16:52:49+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.LG

Balancing Knowledge Delivery and Emotional Comfort in Healthcare Conversational Systems

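Posted: June 17, 2025 | Author: jarxiv

Summary

With advances in large language models, many dialogue systems can now provide reasonable and informative responses about patients' medical conditions.
However, when patients consult a doctor, they may experience negative emotions because of the severity and urgency of their situation.
If a model can offer appropriate comfort and empathy based on the patient's negative emotions while answering medical questions, it is likely to provide a more reassuring experience during the consultation.
To address this issue, our paper explores the balance between knowledge sharing and emotional support in healthcare dialogue.
We use a large language model to rewrite a real-world interactive medical dialogue dataset, generating patient queries that carry negative emotions and corresponding medical responses that aim to soothe those emotions while addressing the patient's concerns.
The modified data is used to refine recent large language models with various fine-tuning methods, enabling them to answer patients' questions with sentences that accurately combine emotional reassurance and constructive suggestions.
Compared with the original LLM, our experimental results show that our methodology significantly improves the model's ability to generate emotional responses while maintaining its original ability to give accurate, knowledge-based answers.

Summary (original)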

With the advancement of large language models, many dialogue systems are now capable of providing reasonable and informative responses to patients’ medical conditions. However, when patients consult their doctor, they may experience negative emotions due to the severity and urgency of their situation. If the model can provide appropriate comfort and empathy based on the patient’s negative emotions while answering medical questions, it will likely offer a more reassuring experience during the medical consultation process. To address this issue, our paper explores the balance between knowledge sharing and emotional support in the healthcare dialogue process. We utilize a large language model to rewrite a real-world interactive medical dialogue dataset, generating patient queries with negative emotions and corresponding medical responses aimed at soothing the patient’s emotions while addressing their concerns. The modified data serves to refine the latest large language models with various fine-tuning methods, enabling them to accurately provide sentences with both emotional reassurance and constructive suggestions in response to patients’ questions. Compared to the original LLM model, our experimental results demonstrate that our methodology significantly enhances the model’s ability to generate emotional responses while maintaining its original capability to provide accurate knowledge-based answers.

arXiv information

Authors	Shang-Chi Tsai, Yun-Nung Chen
Published	2025-06-16 16:54:03+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL

FoMoH: A clinically meaningful foundation model evaluation for structured electronic health records

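Posted: June 17, 2025 | Author: jarxiv

Summary

Foundation models hold significant promise in healthcare, given their capacity to extract meaningful representations independently of downstream tasks.
This property has enabled state-of-the-art performance in several clinical applications trained on structured electronic health record (EHR) data, even in settings with limited labeled data, a common challenge in healthcare.
However, there is little consensus on these models' potential clinical utility, because desiderata for comprehensive and meaningful tasks are lacking, as are evaluations diverse enough to characterize the benefit over conventional supervised learning.
To address this gap, we propose a suite of clinically meaningful tasks spanning patient outcomes and early prediction of acute and chronic conditions, together with desiderata for robust evaluation.
We evaluate state-of-the-art foundation models on EHR data from 5 million patients at Columbia University Irving Medical Center (CUMC), a large urban academic medical center in New York City, across 14 clinically relevant tasks.
We measure overall accuracy, calibration, and subpopulation performance to surface tradeoffs that depend on the choice of pre-training, tokenization, and data representation strategies.
Our study aims to advance the empirical evaluation of structured EHR foundation models and to guide the development of future healthcare foundation models.

Summary (original)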

Foundation models hold significant promise in healthcare, given their capacity to extract meaningful representations independent of downstream tasks. This property has enabled state-of-the-art performance across several clinical applications trained on structured electronic health record (EHR) data, even in settings with limited labeled data, a prevalent challenge in healthcare. However, there is little consensus on these models’ potential for clinical utility due to the lack of desiderata of comprehensive and meaningful tasks and sufficiently diverse evaluations to characterize the benefit over conventional supervised learning. To address this gap, we propose a suite of clinically meaningful tasks spanning patient outcomes, early prediction of acute and chronic conditions, including desiderata for robust evaluations. We evaluate state-of-the-art foundation models on EHR data consisting of 5 million patients from Columbia University Irving Medical Center (CUMC), a large urban academic medical center in New York City, across 14 clinically relevant tasks. We measure overall accuracy, calibration, and subpopulation performance to surface tradeoffs based on the choice of pre-training, tokenization, and data representation strategies. Our study aims to advance the empirical evaluation of structured EHR foundation models and guide the development of future healthcare foundation models.

arXiv information

Authors	Chao Pang, Vincent Jeanselme, Young Sang Choi, Xinzhuo Jiang, Zilin Jing, Aparajita Kashyap, Yuta Kobayashi, Yanwei Li, Florent Pollet, Karthik Natarajan, Shalmali Joshi
Published	2025-06-16 17:03:07+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.LG

Distinguishing Autonomous AI Agents from Collaborative Agentic Systems: A Comprehensive Framework for Understanding Modern Intelligent Architectures

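Posted: June 17, 2025 | Author: jarxiv

Summary

The emergence of large language models has catalyzed two distinct yet interconnected paradigms in artificial intelligence: standalone AI Agents and collaborative Agentic AI ecosystems.
This comprehensive study establishes a definitive framework for distinguishing these architectures through a systematic analysis of their operational principles, structural compositions, and deployment methodologies.
We characterize AI Agents as specialized, tool-enhanced systems that leverage foundation models for targeted automation within constrained environments.
Conversely, Agentic AI represents sophisticated multi-entity frameworks in which distributed agents exhibit emergent collective intelligence through coordinated interaction protocols.
Our investigation traces the evolutionary trajectory from traditional rule-based systems, through generative AI foundations, to contemporary agent architectures.
We present detailed architectural comparisons that examine planning mechanisms, memory systems, coordination protocols, and decision-making processes.
The study categorizes application landscapes, contrasting single-agent implementations in customer service and content management with multi-agent deployments in research automation and complex decision support.
We identify critical challenges, including reliability issues, coordination complexities, and scalability constraints, and propose solutions based on enhanced reasoning frameworks, robust memory architectures, and improved coordination mechanisms.
This framework provides essential guidance for practitioners selecting appropriate agentic approaches and establishes foundational principles for next-generation intelligent system development.

Summary (original)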

The emergence of large language models has catalyzed two distinct yet interconnected paradigms in artificial intelligence: standalone AI Agents and collaborative Agentic AI ecosystems. This comprehensive study establishes a definitive framework for distinguishing these architectures through systematic analysis of their operational principles, structural compositions, and deployment methodologies. We characterize AI Agents as specialized, tool-enhanced systems leveraging foundation models for targeted automation within constrained environments. Conversely, Agentic AI represents sophisticated multi-entity frameworks where distributed agents exhibit emergent collective intelligence through coordinated interaction protocols. Our investigation traces the evolutionary trajectory from traditional rule-based systems through generative AI foundations to contemporary agent architectures. We present detailed architectural comparisons examining planning mechanisms, memory systems, coordination protocols, and decision-making processes. The study categorizes application landscapes, contrasting single-agent implementations in customer service and content management with multi-agent deployments in research automation and complex decision support. We identify critical challenges including reliability issues, coordination complexities, and scalability constraints, while proposing innovative solutions through enhanced reasoning frameworks, robust memory architectures, and improved coordination mechanisms. This framework provides essential guidance for practitioners selecting appropriate agentic approaches and establishes foundational principles for next-generation intelligent system development.

arXiv information

Authors	Prashik Buddhaghosh Bansod
Published	2025-06-16 17:03:41+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI

Value-Free Policy Optimization via Reward Partitioning

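Posted: June 17, 2025 | Author: jarxiv

Summary

Single-trajectory reinforcement learning (RL) methods aim to optimize policies from datasets of (prompt, response, reward) triplets in which scalar rewards are directly available.
This supervision format is highly practical because it mirrors real-world human feedback, such as thumbs-up/down signals, and avoids the need for structured preference annotations.
In contrast, pairwise preference-based methods such as Direct Preference Optimization (DPO) rely on datasets with both preferred and dispreferred responses, which are harder to construct and less natural to collect.
Among single-trajectory approaches, Direct Reward Optimization (DRO) has shown strong empirical performance thanks to its simplicity and stability.
However, DRO requires approximating a value function, which introduces several limitations: high off-policy variance, coupling between policy and value learning, and a lack of absolute supervision on the policy itself.
We introduce Reward Partitioning Optimization (RPO), a new method that resolves these limitations by removing the need to model the value function.
Instead, RPO normalizes observed rewards using a partitioning approach estimated directly from data.
This leads to a straightforward supervised learning objective on the policy, with no auxiliary models and no joint optimization.
RPO provides direct and stable supervision on the policy, making it robust and easy to implement in practice.
We validate RPO on scalar-feedback language modeling tasks using Flan-T5 encoder-decoder models.
Our results show that RPO outperforms existing single-trajectory baselines such as DRO and Kahneman-Tversky Optimization (KTO).
These findings confirm that RPO is a simple, effective, and theoretically grounded method for single-trajectory policy optimization.

Summary (original)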

Single-trajectory reinforcement learning (RL) methods aim to optimize policies from datasets consisting of (prompt, response, reward) triplets, where scalar rewards are directly available. This supervision format is highly practical, as it mirrors real-world human feedback, such as thumbs-up/down signals, and avoids the need for structured preference annotations. In contrast, pairwise preference-based methods like Direct Preference Optimization (DPO) rely on datasets with both preferred and dispreferred responses, which are harder to construct and less natural to collect. Among single-trajectory approaches, Direct Reward Optimization (DRO) has shown strong empirical performance due to its simplicity and stability. However, DRO requires approximating a value function, which introduces several limitations: high off-policy variance, coupling between policy and value learning, and a lack of absolute supervision on the policy itself. We introduce Reward Partitioning Optimization (RPO), a new method that resolves these limitations by removing the need to model the value function. Instead, RPO normalizes observed rewards using a partitioning approach estimated directly from data. This leads to a straightforward supervised learning objective on the policy, with no auxiliary models and no joint optimization. RPO provides direct and stable supervision on the policy, making it robust and easy to implement in practice. We validate RPO on scalar-feedback language modeling tasks using Flan-T5 encoder-decoder models. Our results demonstrate that RPO outperforms existing single-trajectory baselines such as DRO and Kahneman-Tversky Optimization (KTO). These findings confirm that RPO is a simple, effective, and theoretically grounded method for single-trajectory policy optimization.

arXiv information

Authors	Bilal Faye, Hanane Azzag, Mustapha Lebbah
Published	2025-06-16 17:06:27+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.LG

TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning

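Posted: June 17, 2025 | Author: jarxiv

Summary

Time-series reasoning remains a significant challenge for multimodal large language models (MLLMs) because of dynamic temporal patterns, ambiguous semantics, and a lack of temporal priors.
In this work, we introduce TimeMaster, a reinforcement learning (RL)-based method that enables time-series MLLMs to perform structured, interpretable reasoning directly over visualized time-series inputs and task prompts.
TimeMaster adopts a three-part structured output format (reasoning, classification, and domain-specific extension) and is optimized via a composite reward function that aligns format adherence, prediction accuracy, and open-ended insight quality.
The model is trained with a two-stage pipeline: supervised fine-tuning (SFT) is applied first to establish a good initialization, followed by Group Relative Policy Optimization (GRPO) at the token level to enable stable, targeted, reward-driven improvement in time-series reasoning.
We evaluate TimeMaster, built on Qwen2.5-VL-3B-Instruct, on the TimerBed benchmark across six real-world classification tasks.
TimeMaster achieves state-of-the-art performance, outperforming classical time-series models and few-shot GPT-4o by over 14.6% and 7.3%, respectively.
Notably, TimeMaster goes beyond time-series classification: it also exhibits expert-like reasoning behavior, generates context-aware explanations, and delivers domain-aligned insights.
Our results highlight that reward-driven RL can be a scalable and promising path toward integrating temporal understanding into time-series MLLMs.

Summary (original)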

Time-series reasoning remains a significant challenge in multimodal large language models (MLLMs) due to the dynamic temporal patterns, ambiguous semantics, and lack of temporal priors. In this work, we introduce TimeMaster, a reinforcement learning (RL)-based method that enables time-series MLLMs to perform structured, interpretable reasoning directly over visualized time-series inputs and task prompts. TimeMaster adopts a three-part structured output format, reasoning, classification, and domain-specific extension, and is optimized via a composite reward function that aligns format adherence, prediction accuracy, and open-ended insight quality. The model is trained using a two-stage pipeline: we first apply supervised fine-tuning (SFT) to establish a good initialization, followed by Group Relative Policy Optimization (GRPO) at the token level to enable stable and targeted reward-driven improvement in time-series reasoning. We evaluate TimeMaster on the TimerBed benchmark across six real-world classification tasks based on Qwen2.5-VL-3B-Instruct. TimeMaster achieves state-of-the-art performance, outperforming both classical time-series models and few-shot GPT-4o by over 14.6% and 7.3% performance gain, respectively. Notably, TimeMaster goes beyond time-series classification: it also exhibits expert-like reasoning behavior, generates context-aware explanations, and delivers domain-aligned insights. Our results highlight that reward-driven RL can be a scalable and promising path toward integrating temporal understanding into time-series MLLMs.
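
The reward-driven stage can be sketched in two steps: score each sampled response with a composite reward, then convert the scores of a group of responses to the same prompt into group-relative advantages, which is the normalization commonly used in GRPO. The abstract does not say how the three reward components are combined, so the weighted sum, the weights, and the function names below are assumptions made for illustration only.

```python
import statistics

def composite_reward(format_ok, correct, insight_score, weights=(0.2, 0.6, 0.2)):
    """Hypothetical weighted sum of format adherence, prediction accuracy,
    and an insight-quality score in [0, 1]. The weights are assumptions."""
    w_format, w_accuracy, w_insight = weights
    return (w_format * float(format_ok)
            + w_accuracy * float(correct)
            + w_insight * insight_score)

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of responses to the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; implementations vary
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled responses to one prompt (toy values).
group = [
    composite_reward(True, True, 0.8),
    composite_reward(True, False, 0.4),
    composite_reward(False, False, 0.0),
    composite_reward(True, True, 0.5),
]
advantages = group_relative_advantages(group)
```

Responses that score above the group mean receive positive advantages and are reinforced; those below the mean are discouraged, without a separately learned value function.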

arXiv information

Authors	Junru Zhang, Lang Feng, Xu Guo, Yuhan Wu, Yabo Dong, Duanqing Xu
Published	2025-06-16 17:12:26+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.LG

Contrastive Self-Supervised Learning As Neural Manifold Packing

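Posted: June 17, 2025 | Author: jarxiv

Summary

Contrastive self-supervised learning based on point-wise comparisons has been widely studied for vision tasks.
In the visual cortex of the brain, neuronal responses to distinct stimulus classes are organized into geometric structures known as neural manifolds.
Accurate classification of stimuli can be achieved by effectively separating these manifolds, which is akin to solving a packing problem.
We introduce Contrastive Learning As Manifold Packing (CLAMP), a self-supervised framework that recasts representation learning as a manifold packing problem.
CLAMP introduces a loss function inspired by the potential energy of short-range repulsive particle systems, such as those encountered in the physics of simple liquids and jammed packings.
In this framework, each class consists of sub-manifolds that embed multiple augmented views of a single image.
The sizes and positions of the sub-manifolds are dynamically optimized by following the gradient of a packing loss.
This approach yields interpretable dynamics in the embedding space that parallel jamming physics, and it introduces geometrically meaningful hyperparameters into the loss function.
Under the standard linear evaluation protocol, which freezes the backbone and trains only a linear classifier, CLAMP achieves performance competitive with state-of-the-art self-supervised models.
Furthermore, our analysis reveals that neural manifolds corresponding to different categories emerge naturally and are effectively separated in the learned representation space, highlighting CLAMP's potential to bridge insights from physics, neuroscience, and machine learning.

Summary (original)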

Contrastive self-supervised learning based on point-wise comparisons has been widely studied for vision tasks. In the visual cortex of the brain, neuronal responses to distinct stimulus classes are organized into geometric structures known as neural manifolds. Accurate classification of stimuli can be achieved by effectively separating these manifolds, akin to solving a packing problem. We introduce Contrastive Learning As Manifold Packing (CLAMP), a self-supervised framework that recasts representation learning as a manifold packing problem. CLAMP introduces a loss function inspired by the potential energy of short-range repulsive particle systems, such as those encountered in the physics of simple liquids and jammed packings. In this framework, each class consists of sub-manifolds embedding multiple augmented views of a single image. The sizes and positions of the sub-manifolds are dynamically optimized by following the gradient of a packing loss. This approach yields interpretable dynamics in the embedding space that parallel jamming physics, and introduces geometrically meaningful hyperparameters within the loss function. Under the standard linear evaluation protocol, which freezes the backbone and trains only a linear classifier, CLAMP achieves competitive performance with state-of-the-art self-supervised models. Furthermore, our analysis reveals that neural manifolds corresponding to different categories emerge naturally and are effectively separated in the learned representation space, highlighting the potential of CLAMP to bridge insights from physics, neural science, and machine learning.
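
A short-range repulsive energy of the kind the abstract alludes to can be sketched with the harmonic soft-sphere potential used in jamming studies: two particles interact only when they overlap, and the energy grows with the overlap. The sketch below approximates each sub-manifold (the augmented views of one image) by a ball with a centroid and a root-mean-square radius. That approximation, the function names, and the toy embeddings are assumptions; the actual CLAMP loss may differ.

```python
import math

def centroid(points):
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def rms_radius(points, center):
    """Root-mean-square distance of the points from their centroid."""
    return math.sqrt(sum(math.dist(p, center) ** 2 for p in points) / len(points))

def packing_energy(clusters):
    """Sum of harmonic soft-sphere repulsions over all overlapping pairs.
    Each cluster holds the embeddings of augmented views of one image."""
    balls = []
    for points in clusters:
        center = centroid(points)
        balls.append((center, rms_radius(points, center)))
    energy = 0.0
    for i in range(len(balls)):
        for j in range(i + 1, len(balls)):
            (c_i, r_i), (c_j, r_j) = balls[i], balls[j]
            contact = r_i + r_j
            gap = math.dist(c_i, c_j)
            if gap < contact:  # short-range: only overlapping balls repel
                energy += 0.5 * (1.0 - gap / contact) ** 2
    return energy

# Toy 2-D embeddings: a and c overlap, b is far from both.
a = [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2]]
b = [[5.0, 5.0], [5.2, 5.0], [5.0, 5.2]]
c = [[0.1, 0.1], [0.3, 0.1], [0.1, 0.3]]

separated = packing_energy([a, b])
overlapping = packing_energy([a, c])
```

Minimizing such an energy by gradient descent pushes overlapping balls apart and leaves well-separated ones untouched, which is the qualitative behavior of a packing loss.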

arXiv information

Authors	Guanming Zhang, David J. Heeger, Stefano Martiniani
Published	2025-06-16 17:24:31+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.LG, q-bio.NC, stat.ML

Weakest Link in the Chain: Security Vulnerabilities in Advanced Reasoning Models

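Posted: June 17, 2025 | Author: jarxiv

Summary

The introduction of advanced reasoning capabilities has improved the problem-solving performance of large language models, particularly on math and coding benchmarks.
However, it remains unclear whether these reasoning models are more or less vulnerable to adversarial prompt attacks than their non-reasoning counterparts.
In this work, we present a systematic evaluation of weaknesses in advanced reasoning models compared with similar non-reasoning models across a diverse set of prompt-based attack categories.
Using experimental data, we find that on average the reasoning-augmented models are slightly more robust than non-reasoning models (42.51% vs 45.53% attack success rate; lower is better).
However, this overall trend masks significant category-specific differences: for certain attack types the reasoning models are substantially more vulnerable (e.g., up to 32 percentage points worse on a tree-of-attacks prompt), while for others they are markedly more robust (e.g., 29.8 points better on cross-site scripting injection).
Our findings highlight the nuanced security implications of advanced reasoning in language models and emphasize the importance of stress-testing safety across diverse adversarial techniques.

Summary (original)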

The introduction of advanced reasoning capabilities has improved the problem-solving performance of large language models, particularly on math and coding benchmarks. However, it remains unclear whether these reasoning models are more or less vulnerable to adversarial prompt attacks than their non-reasoning counterparts. In this work, we present a systematic evaluation of weaknesses in advanced reasoning models compared to similar non-reasoning models across a diverse set of prompt-based attack categories. Using experimental data, we find that on average the reasoning-augmented models are slightly more robust than non-reasoning models (42.51% vs 45.53% attack success rate, lower is better). However, this overall trend masks significant category-specific differences: for certain attack types the reasoning models are substantially more vulnerable (e.g., up to 32 percentage points worse on a tree-of-attacks prompt), while for others they are markedly more robust (e.g., 29.8 points better on cross-site scripting injection). Our findings highlight the nuanced security implications of advanced reasoning in language models and emphasize the importance of stress-testing safety across diverse adversarial techniques.

arXiv information

Authors	Arjun Krishna, Aaditya Rastogi, Erick Galinkin
Published	2025-06-16 17:32:18+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.CR, cs.LG

Attribution-guided Pruning for Compression, Circuit Discovery, and Targeted Correction in LLMs

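Posted: June 17, 2025 | Author: jarxiv

Summary

Large Language Models (LLMs) are central to many contemporary AI applications, yet their extensive parameter counts pose significant challenges for deployment in memory- and compute-constrained environments.
Recent work in eXplainable AI (XAI), particularly on attribution methods, suggests that interpretability can also enable model compression by identifying and removing components irrelevant to inference.
In this paper, we leverage Layer-wise Relevance Propagation (LRP) to perform attribution-guided pruning of LLMs.
While LRP has shown promise for structured pruning of vision models, we extend it to unstructured pruning in LLMs and show that it can substantially reduce model size with minimal performance loss.
Our method is especially effective at extracting task-relevant subgraphs (so-called "circuits") that can represent core functions (e.g., indirect object identification).
Building on this, we introduce a technique for model correction that selectively removes circuits responsible for spurious behaviors (e.g., toxic outputs).
Overall, we gather these techniques into a uniform, holistic framework and showcase its effectiveness and limitations through extensive experiments on compression, circuit discovery, and model correction with Llama and OPT models, highlighting its potential for improving both model efficiency and safety.
Our code is publicly available at https://github.com/erfanhatefi/SparC3.

Summary (original)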

Large Language Models (LLMs) are central to many contemporary AI applications, yet their extensive parameter counts pose significant challenges for deployment in memory- and compute-constrained environments. Recent works in eXplainable AI (XAI), particularly on attribution methods, suggest that interpretability can also enable model compression by identifying and removing components irrelevant to inference. In this paper, we leverage Layer-wise Relevance Propagation (LRP) to perform attribution-guided pruning of LLMs. While LRP has shown promise in structured pruning for vision models, we extend it to unstructured pruning in LLMs and demonstrate that it can substantially reduce model size with minimal performance loss. Our method is especially effective in extracting task-relevant subgraphs — so-called “circuits” — which can represent core functions (e.g., indirect object identification). Building on this, we introduce a technique for model correction, by selectively removing circuits responsible for spurious behaviors (e.g., toxic outputs). All in all, we gather these techniques as a uniform holistic framework and showcase its effectiveness and limitations through extensive experiments for compression, circuit discovery and model correction on Llama and OPT models, highlighting its potential for improving both model efficiency and safety. Our code is publicly available at https://github.com/erfanhatefi/SparC3.
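
Attribution-guided unstructured pruning can be sketched as ranking individual weights by an attribution score and zeroing the lowest-ranked fraction. In the sketch below the relevance values are hand-written placeholders standing in for scores that an attribution method such as LRP would produce; the function name, the toy matrices, and the choice to rank by absolute relevance are assumptions, not the authors' implementation.

```python
def prune_by_relevance(weights, relevance, sparsity):
    """Return a copy of `weights` in which the fraction `sparsity` of entries
    with the lowest absolute relevance is set to zero."""
    ranked = sorted(
        (abs(relevance[i][j]), i, j)
        for i, row in enumerate(weights)
        for j in range(len(row))
    )
    n_prune = int(len(ranked) * sparsity)
    pruned = [list(row) for row in weights]  # leave the input untouched
    for _, i, j in ranked[:n_prune]:
        pruned[i][j] = 0.0
    return pruned

# Toy 2x3 weight matrix with placeholder relevance scores.
weights = [[0.5, -1.2, 0.3],
           [2.0, -0.1, 0.7]]
relevance = [[0.01, 0.40, 0.02],
             [0.90, 0.00, 0.30]]

pruned = prune_by_relevance(weights, relevance, sparsity=0.5)
```

Note that the weight 0.5 is removed despite its moderate magnitude because its relevance is low, which is the difference between attribution-guided and magnitude-based pruning. Computing relevance on inputs from a single task instead of a general corpus would, under the same scheme, leave a task-specific subgraph.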

arXiv information

Authors	Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer, Reduan Achtibat, Patrick Kahardipraja, Thomas Wiegand, Wojciech Samek, Sebastian Lapuschkin
Published	2025-06-16 17:38:36+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, cs.LG
