jarxiv | Japanese arxiv

Intrinsic and Extrinsic Organized Attention: Softmax Invariance and Network Sparsity

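Posted: June 19, 2025 | Author: jarxiv

Summary

We examine the intrinsic (within an attention head) and extrinsic (among the attention heads) structure of the self-attention mechanism in transformers.
Theoretical evidence that the self-attention mechanism is invariant to the softmax activation is obtained by appealing to paradifferential calculus, which relies on the intrinsic organization of the attention heads, and is supported by computational examples.
Furthermore, we use an existing methodology for the hierarchical organization of tensors to examine network structure, constructing hierarchical partition trees with respect to the query, key, and head axes of the network 3-tensors.
Such an organization is consequential because it allows common signal processing tasks to be carried out profitably on a geometry where the organized network 3-tensors exhibit regularity.
We illustrate this qualitatively by visualizing the hierarchical organization of the tree built from the attention heads together with the diffusion map embeddings, and quantitatively by investigating network sparsity through the expansion coefficients of individual attention heads and of the entire network with respect to the bi-Haar and tri-Haar bases (respectively) on the space of queries, keys, and heads of the network.
To showcase the utility of the theoretical and methodological findings, we provide computational examples using vision and language transformers.
The ramifications of these findings are two-fold: (1) a subsequent step in interpretability analysis is theoretically admitted and can be exploited empirically for downstream interpretability tasks; (2) the network 3-tensor organization can be used for empirical network applications such as model pruning (by virtue of network sparsity) and network architecture comparison.

Summary (original)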

We examine the intrinsic (within the attention head) and extrinsic (amongst the attention heads) structure of the self-attention mechanism in transformers. Theoretical evidence for invariance of the self-attention mechanism to softmax activation is obtained by appealing to paradifferential calculus (and is supported by computational examples), which relies on the intrinsic organization of the attention heads. Furthermore, we use an existing methodology for hierarchical organization of tensors to examine network structure by constructing hierarchical partition trees with respect to the query, key, and head axes of network 3-tensors. Such an organization is consequential since it allows one to profitably execute common signal processing tasks on a geometry where the organized network 3-tensors exhibit regularity. We exemplify this qualitatively, by visualizing the hierarchical organization of the tree comprised of attention heads and the diffusion map embeddings, and quantitatively by investigating network sparsity with the expansion coefficients of individual attention heads and the entire network with respect to the bi-Haar and tri-Haar bases (respectively) on the space of queries, keys, and heads of the network. To showcase the utility of our theoretical and methodological findings, we provide computational examples using vision and language transformers. The ramifications of these findings are two-fold: (1) a subsequent step in interpretability analysis is theoretically admitted, and can be exploited empirically for downstream interpretability tasks; (2) one can use the network 3-tensor organization for empirical network applications such as model pruning (by virtue of network sparsity) and network architecture comparison.
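
As a rough illustration of the "network 3-tensor" with query, key, and head axes, the sketch below stacks the softmax attention matrices of several heads into a heads x queries x keys array. The function names (`softmax`, `attention_matrix`, `attention_tensor`), the toy sizes, the random inputs, and the scaled dot-product form are all assumptions made for illustration; the abstract does not say how the authors build their tensor.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_matrix(queries, keys):
    """Scaled dot-product attention weights: one row per query, one column per key."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    return [
        softmax([scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys])
        for q in queries
    ]


def attention_tensor(heads):
    """Stack per-head attention matrices into a heads x queries x keys nested list.

    `heads` is a list of (queries, keys) pairs, one per head (hypothetical layout).
    """
    return [attention_matrix(q, k) for q, k in heads]


def random_vectors(rng, n, d):
    """Draw n random d-dimensional vectors as stand-ins for learned projections."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]


rng = random.Random(0)
n_heads, n_tokens, d_head = 3, 4, 8  # toy sizes, chosen arbitrarily
heads = [
    (random_vectors(rng, n_tokens, d_head), random_vectors(rng, n_tokens, d_head))
    for _ in range(n_heads)
]
tensor = attention_tensor(heads)
shape = (len(tensor), len(tensor[0]), len(tensor[0][0]))  # (heads, queries, keys)
```

Each slice `tensor[h]` is one head's attention matrix, so an analysis along the head axis compares whole matrices while an analysis along the query or key axis works within them.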

arXiv information

Authors	Oluwadamilola Fasina, Ruben V. C. Pohle, Pei-Chun Su, Ronald R. Coifman
Published	2025-06-18 15:14:56+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI, cs.NA, math.NA

Learning Algorithms in the Limit

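Posted: June 19, 2025 | Author: jarxiv

Summary

This paper studies the problem of learning computable functions in the limit by extending Gold's inductive inference framework to incorporate computational observations and restricted input sources.
Complementing the traditional input-output observations, we introduce time-bound observations and policy-trajectory observations to study the learnability of general recursive functions under more realistic constraints.
Input-output observations alone do not suffice for learning the class of general recursive functions in the limit, but we overcome this learning barrier by imposing computational complexity constraints or by supplementing with approximate time-bound observations.
Further, we build a formal framework around observations of computational agents and show that learning computable functions from policy trajectories reduces to learning rational functions from input and output, revealing interesting connections to finite-state transducer inference.
On the negative side, we show that computable or polynomial-mass characteristic sets cannot exist for the class of linear-time computable functions, even with policy-trajectory observations.

Summary (original)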

This paper studies the problem of learning computable functions in the limit by extending Gold’s inductive inference framework to incorporate computational observations and restricted input sources. Complementary to the traditional Input-Output Observations, we introduce Time-Bound Observations and Policy-Trajectory Observations to study the learnability of general recursive functions under more realistic constraints. While input-output observations do not suffice for learning the class of general recursive functions in the limit, we overcome this learning barrier by imposing computational complexity constraints or supplementing with approximate time-bound observations. Further, we build a formal framework around observations of computational agents and show that learning computable functions from policy trajectories reduces to learning rational functions from input and output, thereby revealing interesting connections to finite-state transducer inference. On the negative side, we show that computable or polynomial-mass characteristic sets cannot exist for the class of linear-time computable functions even for policy-trajectory observations.
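
To make "learning in the limit" concrete, here is a minimal sketch of identification by enumeration, the classic strategy in Gold-style inductive inference, over a small hypothetical hypothesis class. The class `HYPOTHESES`, the target function, and the name `learn_in_the_limit` are illustrative assumptions and not the paper's construction. After each input-output observation the learner conjectures the first hypothesis consistent with everything seen so far; it succeeds if the conjectures eventually stabilize on a correct one.

```python
from typing import Callable, Iterable, List, Tuple

# A hypothetical, enumerable class of total functions on the naturals.
HYPOTHESES: List[Tuple[str, Callable[[int], int]]] = [
    ("zero", lambda n: 0),
    ("identity", lambda n: n),
    ("double", lambda n: 2 * n),
    ("square", lambda n: n * n),
]


def learn_in_the_limit(observations: Iterable[Tuple[int, int]]) -> List[str]:
    """Return the learner's conjecture after each input-output observation."""
    seen: List[Tuple[int, int]] = []
    conjectures: List[str] = []
    for x, y in observations:
        seen.append((x, y))
        # Conjecture the first hypothesis consistent with all data so far.
        guess = next(
            (name for name, f in HYPOTHESES if all(f(a) == b for a, b in seen)),
            "no consistent hypothesis",
        )
        conjectures.append(guess)
    return conjectures


def target(n: int) -> int:
    """The function being observed (illustrative choice)."""
    return n * n


# Observe the target on inputs 0, 1, 2, 3 and record each conjecture.
history = learn_in_the_limit((n, target(n)) for n in range(4))
```

The learner changes its mind only when an observation refutes the current conjecture. This works here because the class is a finite list of total functions; the abstract's point is that input-output observations alone are not enough for the full class of general recursive functions.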

arXiv information

Authors	Hristo Papazov, Nicolas Flammarion
Published	2025-06-18 15:17:03+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.DS, cs.FL, cs.LG

DAILOC: Domain-Incremental Learning for Indoor Localization using Smartphones

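Posted: June 19, 2025 | Author: jarxiv

Summary

Wi-Fi fingerprinting-based indoor localization faces significant challenges in real-world deployments because of domain shifts arising from device heterogeneity and temporal variations within indoor environments.
Existing approaches often address these issues independently, resulting in poor generalization and susceptibility to catastrophic forgetting over time.
In this work, we propose DAILOC, a novel domain-incremental learning framework that jointly addresses both temporal and device-induced domain shifts.
DAILOC introduces a novel disentanglement strategy that uses a multi-level variational autoencoder to separate domain shifts from location-relevant features.
In addition, we introduce a novel memory-guided class latent alignment mechanism to address the effects of catastrophic forgetting.
Experiments across multiple smartphones, buildings, and time instances show that DAILOC significantly outperforms state-of-the-art methods, achieving up to 2.74x lower average error and 4.6x lower worst-case error.

Summary (original)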

Wi-Fi fingerprinting-based indoor localization faces significant challenges in real-world deployments due to domain shifts arising from device heterogeneity and temporal variations within indoor environments. Existing approaches often address these issues independently, resulting in poor generalization and susceptibility to catastrophic forgetting over time. In this work, we propose DAILOC, a novel domain-incremental learning framework that jointly addresses both temporal and device-induced domain shifts. DAILOC introduces a novel disentanglement strategy that separates domain shifts from location-relevant features using a multi-level variational autoencoder. Additionally, we introduce a novel memory-guided class latent alignment mechanism to address the effects of catastrophic forgetting over time. Experiments across multiple smartphones, buildings, and time instances demonstrate that DAILOC significantly outperforms state-of-the-art methods, achieving up to 2.74x lower average error and 4.6x lower worst-case error.

arXiv information

Authors	Akhil Singampalli, Danish Gufran, Sudeep Pasricha
Published	2025-06-18 15:27:40+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.LG

Towards Explainable Indoor Localization: Interpreting Neural Network Learning on Wi-Fi Fingerprints Using Logic Gates

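Posted: June 19, 2025 | Author: jarxiv

Summary

Indoor localization using deep learning (DL) has demonstrated strong accuracy in mapping Wi-Fi RSS fingerprints to physical locations.
However, most existing DL frameworks function as black-box models, offering limited insight into how predictions are made or how models respond to real-world noise over time.
This lack of interpretability hampers the ability to understand the impact of temporal variations caused by environmental dynamics and to adapt models for long-term reliability.
To address this, we introduce LogNet, a novel logic gate-based framework designed to interpret and enhance DL-based indoor localization.
LogNet enables transparent reasoning by identifying which access points (APs) are most influential for each reference point (RP), and reveals how environmental noise disrupts DL-driven localization decisions.
This interpretability makes it possible to trace and diagnose model failures and to adapt DL systems for more stable long-term deployments.
Evaluations across multiple real-world building floorplans and over two years of temporal variation show that LogNet not only interprets the internal behavior of DL models but also improves performance, achieving up to 1.1x to 2.8x lower localization error, 3.4x to 43.3x smaller model size, and 1.5x to 3.6x lower latency compared to prior DL-based models.

Summary (original)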

Indoor localization using deep learning (DL) has demonstrated strong accuracy in mapping Wi-Fi RSS fingerprints to physical locations; however, most existing DL frameworks function as black-box models, offering limited insight into how predictions are made or how models respond to real-world noise over time. This lack of interpretability hampers our ability to understand the impact of temporal variations – caused by environmental dynamics – and to adapt models for long-term reliability. To address this, we introduce LogNet, a novel logic gate-based framework designed to interpret and enhance DL-based indoor localization. LogNet enables transparent reasoning by identifying which access points (APs) are most influential for each reference point (RP) and reveals how environmental noise disrupts DL-driven localization decisions. This interpretability allows us to trace and diagnose model failures and adapt DL systems for more stable long-term deployments. Evaluations across multiple real-world building floorplans and over two years of temporal variation show that LogNet not only interprets the internal behavior of DL models but also improves performance, achieving up to 1.1x to 2.8x lower localization error, 3.4x to 43.3x smaller model size, and 1.5x to 3.6x lower latency compared to prior DL-based models.

arXiv information

Authors	Danish Gufran, Sudeep Pasricha
Published	2025-06-18 15:34:41+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.LG

Fractured Chain-of-Thought Reasoning

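Posted: June 19, 2025 | Author: jarxiv

Summary

Inference-time scaling techniques have significantly strengthened the reasoning capabilities of large language models (LLMs) by harnessing additional computational effort at inference without retraining.
Similarly, Chain-of-Thought (CoT) prompting and its extension, Long CoT, improve accuracy by generating rich intermediate reasoning trajectories, but these approaches incur substantial token costs that impede deployment in latency-sensitive settings.
In this work, we first show that truncated CoT, which stops reasoning before completion and directly generates the final answer, often matches full CoT sampling while using dramatically fewer tokens.
Building on this insight, we introduce Fractured Sampling, a unified inference-time strategy that interpolates between full CoT and solution-only sampling along three orthogonal axes: (1) the number of reasoning trajectories, (2) the number of final solutions per trajectory, and (3) the depth at which reasoning traces are truncated.
Through extensive experiments on five diverse reasoning benchmarks and several model scales, we demonstrate that Fractured Sampling consistently achieves superior accuracy-cost trade-offs, yielding steep log-linear scaling gains in Pass@k versus token budget.
Our analysis reveals how to allocate computation across these dimensions to maximize performance, paving the way for more efficient and scalable LLM reasoning.
Code is available at https://github.com/BaohaoLiao/frac-cot.

Summary (original)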

Inference-time scaling techniques have significantly bolstered the reasoning capabilities of large language models (LLMs) by harnessing additional computational effort at inference without retraining. Similarly, Chain-of-Thought (CoT) prompting and its extension, Long CoT, improve accuracy by generating rich intermediate reasoning trajectories, but these approaches incur substantial token costs that impede their deployment in latency-sensitive settings. In this work, we first show that truncated CoT, which stops reasoning before completion and directly generates the final answer, often matches full CoT sampling while using dramatically fewer tokens. Building on this insight, we introduce Fractured Sampling, a unified inference-time strategy that interpolates between full CoT and solution-only sampling along three orthogonal axes: (1) the number of reasoning trajectories, (2) the number of final solutions per trajectory, and (3) the depth at which reasoning traces are truncated. Through extensive experiments on five diverse reasoning benchmarks and several model scales, we demonstrate that Fractured Sampling consistently achieves superior accuracy-cost trade-offs, yielding steep log-linear scaling gains in Pass@k versus token budget. Our analysis reveals how to allocate computation across these dimensions to maximize performance, paving the way for more efficient and scalable LLM reasoning. Code is available at https://github.com/BaohaoLiao/frac-cot.
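
The three axes named in the abstract can be pictured as a sampling grid. The sketch below enumerates that grid and scores a batch of answers with the standard unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), commonly used in code-generation evaluation. The function names, the use of fractional truncation depths, and the example counts are assumptions for illustration; the abstract does not state how the authors parameterize depth or compute Pass@k.

```python
from itertools import product
from math import comb


def fractured_sample_plan(n_trajectories, truncation_depths, n_solutions):
    """Enumerate (trajectory, depth, solution) triples along the three axes."""
    return list(product(range(n_trajectories), truncation_depths, range(n_solutions)))


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n samples, of which c are correct."""
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical budget: 2 trajectories, 3 truncation depths, 2 answers each.
plan = fractured_sample_plan(
    n_trajectories=2, truncation_depths=[0.25, 0.5, 1.0], n_solutions=2
)
total_samples = len(plan)  # 2 * 3 * 2 = 12 final answers to score
estimate = pass_at_k(n=total_samples, c=3, k=4)  # assume 3 of the 12 are correct
```

Setting `truncation_depths=[1.0]` and `n_solutions=1` would correspond to plain full-CoT sampling in this picture, so the grid makes it easy to compare how a fixed token budget is spread across the three axes.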

arXiv information

Authors	Baohao Liao, Hanze Dong, Yuhui Xu, Doyen Sahoo, Christof Monz, Junnan Li, Caiming Xiong
Published	2025-06-18 15:41:14+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CL, cs.LG, stat.ML

Managing Complex Failure Analysis Workflows with LLM-based Reasoning and Acting Agents

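Posted: June 19, 2025 | Author: jarxiv

Summary

Failure Analysis (FA) is a highly intricate and knowledge-intensive process.
Integrating AI components into the computational infrastructure of FA labs has the potential to automate a variety of tasks, including the detection of non-conformities in images, the retrieval of analogous cases from diverse data sources, and the generation of reports from annotated images.
However, as the number of deployed AI models increases, the challenge lies in orchestrating these components into cohesive and efficient workflows that integrate seamlessly with the FA process.
This paper investigates the design and implementation of a Large Language Model (LLM)-based Planning Agent (LPA) to assist FA engineers in solving their analysis cases.
The LPA integrates LLMs with advanced planning capabilities and external tool utilization, enabling autonomous processing of complex queries, retrieval of relevant data from external systems, and generation of human-readable responses.
Evaluation results demonstrate the agent's operational effectiveness and reliability in supporting FA tasks.

Summary (original)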

Failure Analysis (FA) is a highly intricate and knowledge-intensive process. The integration of AI components within the computational infrastructure of FA labs has the potential to automate a variety of tasks, including the detection of non-conformities in images, the retrieval of analogous cases from diverse data sources, and the generation of reports from annotated images. However, as the number of deployed AI models increases, the challenge lies in orchestrating these components into cohesive and efficient workflows that seamlessly integrate with the FA process. This paper investigates the design and implementation of a Large Language Model (LLM)-based Planning Agent (LPA) to assist FA engineers in solving their analysis cases. The LPA integrates LLMs with advanced planning capabilities and external tool utilization, enabling autonomous processing of complex queries, retrieval of relevant data from external systems, and generation of human-readable responses. Evaluation results demonstrate the agent’s operational effectiveness and reliability in supporting FA tasks.

arXiv information

Authors	Aline Dobrovsky, Konstantin Schekotihin, Christian Burmer
Published	2025-06-18 15:43:10+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.LG

‘Generate’ the Future of Work through AI: Empirical Evidence from Online Labor Markets

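Posted: June 19, 2025 | Author: jarxiv

Summary

Large Language Model (LLM)-based generative AI systems, such as ChatGPT, demonstrate zero-shot learning capabilities across a wide range of downstream tasks.
Owing to their general-purpose nature and their potential to augment or even automate job functions, these systems are poised to reshape labor market dynamics.
However, predicting their precise impact a priori is challenging, given AI's simultaneous effects on both demand and supply and the strategic responses of market participants.
Leveraging an extensive dataset from a leading online labor platform, we document a pronounced displacement effect and an overall contraction in submarkets where the required skills closely align with core LLM functionalities.
Although demand and supply both decline, the reduction in supply is comparatively smaller, which intensifies competition among freelancers.
Notably, further analysis shows that this heightened competition is especially pronounced in programming-intensive submarkets.
This pattern is attributed to skill-transition effects: by lowering the human-capital barrier to programming, ChatGPT enables incumbent freelancers to enter programming tasks.
Moreover, these transitions are not homogeneous, with high-skilled freelancers contributing disproportionately to the shift.
Our findings illuminate the multifaceted impacts of general-purpose AI on labor markets, highlighting not only the displacement of certain occupations but also the inducement of skill transitions within the labor supply.
These insights offer practical implications for policymakers, platform operators, and workers.

Summary (original)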

Large Language Model (LLM)-based generative AI systems, such as ChatGPT, demonstrate zero-shot learning capabilities across a wide range of downstream tasks. Owing to their general-purpose nature and potential to augment or even automate job functions, these systems are poised to reshape labor market dynamics. However, predicting their precise impact a priori is challenging, given AI’s simultaneous effects on both demand and supply, as well as the strategic responses of market participants. Leveraging an extensive dataset from a leading online labor platform, we document a pronounced displacement effect and an overall contraction in submarkets where required skills closely align with core LLM functionalities. Although demand and supply both decline, the reduction in supply is comparatively smaller, thereby intensifying competition among freelancers. Notably, further analysis shows that this heightened competition is especially pronounced in programming-intensive submarkets. This pattern is attributed to skill-transition effects: by lowering the human-capital barrier to programming, ChatGPT enables incumbent freelancers to enter programming tasks. Moreover, these transitions are not homogeneous, with high-skilled freelancers contributing disproportionately to the shift. Our findings illuminate the multifaceted impacts of general-purpose AI on labor markets, highlighting not only the displacement of certain occupations but also the inducement of skill transitions within the labor supply. These insights offer practical implications for policymakers, platform operators, and workers.

arXiv information

Authors	Jin Liu, Xingchen Xu, Xi Nan, Yongjun Li, Yong Tan
Published	2025-06-18 16:05:10+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.HC, econ.GN, J.4, q-fin.EC

WikiMixQA: A Multimodal Benchmark for Question Answering over Tables and Charts

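Posted: June 19, 2025 | Author: jarxiv

Summary

Documents are fundamental to preserving and disseminating information, and they often incorporate complex layouts, tables, and charts that pose significant challenges for automatic document understanding (DU).
While vision-language large models (VLLMs) have demonstrated improvements across various tasks, their effectiveness in processing long-context vision inputs remains unclear.
This paper introduces WikiMixQA, a benchmark comprising 1,000 multiple-choice questions (MCQs) designed to evaluate cross-modal reasoning over tables and charts extracted from 4,000 Wikipedia pages spanning seven distinct topics.
Unlike existing benchmarks, WikiMixQA emphasizes complex reasoning by requiring models to synthesize information from multiple modalities.
We evaluate 12 state-of-the-art vision-language models and find that while proprietary models achieve about 70% accuracy when given direct context, their performance deteriorates significantly when retrieval from long documents is required.
Among these, GPT-4-o is the only model exceeding 50% accuracy in this setting, whereas open-source models perform considerably worse, with a maximum accuracy of 27%.
These findings underscore the challenges of long-context, multi-modal reasoning and establish WikiMixQA as a crucial benchmark for advancing document understanding research.

Summary (original)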

Documents are fundamental to preserving and disseminating information, often incorporating complex layouts, tables, and charts that pose significant challenges for automatic document understanding (DU). While vision-language large models (VLLMs) have demonstrated improvements across various tasks, their effectiveness in processing long-context vision inputs remains unclear. This paper introduces WikiMixQA, a benchmark comprising 1,000 multiple-choice questions (MCQs) designed to evaluate cross-modal reasoning over tables and charts extracted from 4,000 Wikipedia pages spanning seven distinct topics. Unlike existing benchmarks, WikiMixQA emphasizes complex reasoning by requiring models to synthesize information from multiple modalities. We evaluate 12 state-of-the-art vision-language models, revealing that while proprietary models achieve ~70% accuracy when provided with direct context, their performance deteriorates significantly when retrieval from long documents is required. Among these, GPT-4-o is the only model exceeding 50% accuracy in this setting, whereas open-source models perform considerably worse, with a maximum accuracy of 27%. These findings underscore the challenges of long-context, multi-modal reasoning and establish WikiMixQA as a crucial benchmark for advancing document understanding research.

arXiv information

Authors	Negar Foroutan, Angelika Romanou, Matin Ansaripour, Julian Martin Eisenschlos, Karl Aberer, Rémi Lebret
Published	2025-06-18 16:09:18+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CL, cs.LG

From Model to Classroom: Evaluating Generated MCQs for Portuguese with Narrative and Difficulty Concerns

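Posted: June 19, 2025 | Author: jarxiv

Summary

While MCQs are valuable for learning and evaluation, manually creating them with varying difficulty levels and targeted reading skills remains a time-consuming and costly task.
Recent advances in generative AI provide an opportunity to automate MCQ generation efficiently.
However, assessing the actual quality and reliability of generated MCQs has received limited attention, particularly regarding cases where generation fails.
This aspect becomes especially important when the generated MCQs are meant to be applied in real-world settings.
Additionally, most MCQ generation studies focus on English, leaving other languages underexplored.
This paper investigates the capabilities of current generative models in producing MCQs for reading comprehension in Portuguese, a morphologically rich language.
Our study focuses on generating MCQs that align with curriculum-relevant narrative elements and span different difficulty levels.
We evaluate these MCQs through expert review and by analyzing the psychometric properties extracted from student responses to assess their suitability for elementary school students.
Our results show that current models can generate MCQs of comparable quality to human-authored ones.
However, we identify issues related to semantic clarity and answerability.
Challenges also remain in generating distractors that engage students and meet established criteria for high-quality MCQ option design.

Summary (original)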

While MCQs are valuable for learning and evaluation, manually creating them with varying difficulty levels and targeted reading skills remains a time-consuming and costly task. Recent advances in generative AI provide an opportunity to automate MCQ generation efficiently. However, assessing the actual quality and reliability of generated MCQs has received limited attention — particularly regarding cases where generation fails. This aspect becomes particularly important when the generated MCQs are meant to be applied in real-world settings. Additionally, most MCQ generation studies focus on English, leaving other languages underexplored. This paper investigates the capabilities of current generative models in producing MCQs for reading comprehension in Portuguese, a morphologically rich language. Our study focuses on generating MCQs that align with curriculum-relevant narrative elements and span different difficulty levels. We evaluate these MCQs through expert review and by analyzing the psychometric properties extracted from student responses to assess their suitability for elementary school students. Our results show that current models can generate MCQs of comparable quality to human-authored ones. However, we identify issues related to semantic clarity and answerability. Also, challenges remain in generating distractors that engage students and meet established criteria for high-quality MCQ option design.

arXiv information

Authors	Bernardo Leite, Henrique Lopes Cardoso, Pedro Pinto, Abel Ferreira, Luís Abreu, Isabel Rangel, Sandra Monteiro
Published	2025-06-18 16:19:46+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CL

LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning

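Posted: June 19, 2025 | Author: jarxiv

Summary

Large Language Models (LLMs) have become indispensable in real-world applications.
However, their widespread adoption raises significant safety concerns, particularly in responding to socially harmful questions.
Despite substantial efforts to improve model safety through alignment, aligned models can still have their safety protections undermined by subsequent fine-tuning, even when the additional training data appears benign.
In this paper, we empirically demonstrate that this vulnerability stems from the sensitivity of safety-critical low-rank subspaces in LLM parameters to fine-tuning.
Building on this insight, we propose a novel training-free method, termed Low-Rank Extrapolation (LoX), to enhance safety robustness by extrapolating the safety subspace of an aligned LLM.
Our experimental results confirm the effectiveness of LoX, demonstrating significant improvements in robustness against both benign and malicious fine-tuning attacks while preserving the model's adaptability to new tasks.
For instance, LoX leads to 11% to 54% absolute reductions in attack success rates (ASR) under benign or malicious fine-tuning attacks.
By investigating the ASR landscape of the parameters, we attribute the success of LoX to the extrapolation moving LLM parameters into a flatter zone, making them less sensitive to perturbations.
The code is available at github.com/VITA-Group/LoX.

Summary (original)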

Large Language Models (LLMs) have become indispensable in real-world applications. However, their widespread adoption raises significant safety concerns, particularly in responding to socially harmful questions. Despite substantial efforts to improve model safety through alignment, aligned models can still have their safety protections undermined by subsequent fine-tuning – even when the additional training data appears benign. In this paper, we empirically demonstrate that this vulnerability stems from the sensitivity of safety-critical low-rank subspaces in LLM parameters to fine-tuning. Building on this insight, we propose a novel training-free method, termed Low-Rank Extrapolation (LoX), to enhance safety robustness by extrapolating the safety subspace of an aligned LLM. Our experimental results confirm the effectiveness of LoX, demonstrating significant improvements in robustness against both benign and malicious fine-tuning attacks while preserving the model’s adaptability to new tasks. For instance, LoX leads to 11% to 54% absolute reductions in attack success rates (ASR) facing benign or malicious fine-tuning attacks. By investigating the ASR landscape of parameters, we attribute the success of LoX to the fact that the extrapolation moves LLM parameters to a flatter zone, making them less sensitive to perturbations. The code is available at github.com/VITA-Group/LoX.

arXiv information

Authors	Gabrel J. Perin, Runjin Chen, Xuxi Chen, Nina S. T. Hirata, Zhangyang Wang, Junyuan Hong
Published	2025-06-18 16:30:02+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CL, cs.LG
