jarxiv | Japanese arxiv

Mitigating Traffic Oscillations in Mixed Traffic Flow with Scalable Deep Koopman Predictive Control

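Posted: April 23, 2025 by jarxiv

Summary

The use of connected automated vehicles (CAVs) is advocated to mitigate traffic oscillations in mixed traffic flow consisting of CAVs and human-driven vehicles (HDVs).
This study proposes an adaptive deep Koopman predictive control framework (AdapKoopPC) for regulating mixed traffic flow.
First, a Koopman theory-based adaptive trajectory prediction deep network (AdapKoopnet) is designed to model HDV car-following behavior.
AdapKoopnet can represent HDV behavior with a linear model in a high-dimensional space.
Second, model predictive control is used to smooth the mixed traffic flow; the combination of the CAVs' linear dynamic model and the linear prediction blocks from AdapKoopnet is embedded into AdapKoopPC as the predictive model.
Finally, the predictive performance of the proposed AdapKoopnet is verified using the highD naturalistic driving dataset.
In addition, the control performance of AdapKoopPC is validated through numerical simulations.
The results show that AdapKoopnet predicts HDV trajectories more accurately than the baseline nonlinear models.
Moreover, the proposed AdapKoopPC shows more effective control performance than the baselines in mitigating traffic oscillations, especially at low CAV penetration rates.
The code for the proposed AdapKoopPC is open source.

Summary (original)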

The use of connected automated vehicles (CAVs) is advocated to mitigate traffic oscillations in mixed traffic flow consisting of CAVs and human-driven vehicles (HDVs). This study proposes an adaptive deep Koopman predictive control framework (AdapKoopPC) for regulating mixed traffic flow. Firstly, a Koopman theory-based adaptive trajectory prediction deep network (AdapKoopnet) is designed for modeling HDVs' car-following behavior. AdapKoopnet enables the representation of HDV behavior by a linear model in a high-dimensional space. Secondly, model predictive control is employed to smooth the mixed traffic flow, where the combination of the linear dynamic model of CAVs and linear prediction blocks from AdapKoopnet is embedded as the predictive model into AdapKoopPC. Finally, the predictive performance of the proposed AdapKoopnet is verified using the highD naturalistic driving dataset. Furthermore, the control performance of AdapKoopPC is validated by numerical simulations. Results demonstrate that AdapKoopnet provides more accurate predicted HDV trajectories than the baseline nonlinear models. Moreover, the proposed AdapKoopPC exhibits more effective control performance with less computation cost compared with baselines in mitigating traffic oscillations, especially at low CAV penetration rates. The code of the proposed AdapKoopPC is open source.
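The central idea, that nonlinear dynamics become linear after lifting the state into a higher-dimensional space of observables, can be illustrated with extended dynamic mode decomposition (EDMD). This is a minimal sketch, not AdapKoopnet: a fixed, hand-chosen dictionary of observables stands in for the learned deep encoder, and the toy system and helper names (`lift`, `fit_koopman`, `step`) are hypothetical choices that make the lifted dynamics exactly linear.

```python
import numpy as np

def lift(x: np.ndarray) -> np.ndarray:
    """Map states (x1, x2) to observables (x1, x2, x1**2).

    Hand-picked dictionary for this toy system (an assumption); a deep
    Koopman model would learn this mapping with a neural encoder instead.
    """
    x = np.atleast_2d(x)
    return np.column_stack([x[:, 0], x[:, 1], x[:, 0] ** 2])

def fit_koopman(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Least-squares estimate of K such that lift(Y) ~= lift(X) @ K.T."""
    K_T, *_ = np.linalg.lstsq(lift(X), lift(Y), rcond=None)
    return K_T.T

def step(x: np.ndarray, a: float = 0.9, b: float = 0.5, c: float = 0.3) -> np.ndarray:
    """Toy nonlinear system: x1' = a*x1, x2' = b*x2 + c*x1**2."""
    x = np.atleast_2d(x)
    return np.column_stack([a * x[:, 0], b * x[:, 1] + c * x[:, 0] ** 2])

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))   # sampled states
Y = step(X)                                  # their one-step successors
K = fit_koopman(X, Y)

# Multi-step prediction is purely linear in the lifted space: z_{t+k} = K^k z_t,
# and the state is read back from the first two coordinates.
x0 = np.array([[0.8, -0.2]])
z = lift(x0)[0]
x_true = x0
for _ in range(10):
    z = K @ z
    x_true = step(x_true)
print(np.allclose(z[:2], x_true[0]))  # True
```

Because the lifted predictor is linear, combining it with linear CAV dynamics and a quadratic cost keeps an MPC problem a quadratic program, which is one general reason such predictors suit real-time control; this is not a statement about AdapKoopPC's exact formulation.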

arXiv information

Authors	Hao Lyu, Yanyong Guo, Pan Liu, Nan Zheng, Ting Wang, Quansheng Yue
Published	2025-04-22 15:15:14+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.SY, eess.SY

Bug Destiny Prediction in Large Open-Source Software Repositories through Sentiment Analysis and BERT Topic Modeling

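Posted: April 23, 2025 by jarxiv

Summary

This study explores a new approach to predicting key bug-related outcomes, including time to resolution, time to fix, and the final status of a bug, using data from the Bugzilla Eclipse Project.
Specifically, we leverage features available before a bug is resolved to improve predictive accuracy.
Our methodology incorporates sentiment analysis to derive both an emotionality score and a sentiment classification (positive or negative).
In addition, we integrate the bug's priority level and its topic, extracted with a BERTopic model, as features for a Convolutional Neural Network (CNN) and a Multilayer Perceptron (MLP).
Our findings indicate that combining BERTopic and sentiment analysis can improve certain model performance metrics.
We also find that balancing the model inputs improves practical applicability, although in most cases at the cost of a significant reduction in accuracy.
To address our primary objectives of predicting time to resolution, time to fix, and bug destiny, we employ both binary classification and exact time-value prediction, allowing a comparative evaluation of their predictive effectiveness.
The results show that sentiment analysis serves as a valuable predictor of a bug's eventual outcome, particularly of whether it will be fixed.
However, its usefulness is less pronounced when classifying bugs into more complex or unconventional outcome categories.

Summary (original)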

This study explores a novel approach to predicting key bug-related outcomes, including the time to resolution, time to fix, and ultimate status of a bug, using data from the Bugzilla Eclipse Project. Specifically, we leverage features available before a bug is resolved to enhance predictive accuracy. Our methodology incorporates sentiment analysis to derive both an emotionality score and a sentiment classification (positive or negative). Additionally, we integrate the bug’s priority level and its topic, extracted using a BERTopic model, as features for a Convolutional Neural Network (CNN) and a Multilayer Perceptron (MLP). Our findings indicate that the combination of BERTopic and sentiment analysis can improve certain model performance metrics. Furthermore, we observe that balancing model inputs enhances practical applicability, albeit at the cost of a significant reduction in accuracy in most cases. To address our primary objectives, predicting time-to-resolution, time-to-fix, and bug destiny, we employ both binary classification and exact time value predictions, allowing for a comparative evaluation of their predictive effectiveness. Results demonstrate that sentiment analysis serves as a valuable predictor of a bug’s eventual outcome, particularly in determining whether it will be fixed. However, its utility is less pronounced when classifying bugs into more complex or unconventional outcome categories.
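The abstract reports that balancing model inputs improves practical applicability at the cost of accuracy, but does not say how the balancing was done. One common option is random undersampling of the majority class, sketched below; the function name, the feature layout (sentiment score, polarity flag, priority, topic id), and the toy labels are all hypothetical.

```python
import numpy as np

def undersample(X: np.ndarray, y: np.ndarray, seed: int = 0):
    """Randomly drop rows of larger classes so every class has the same count."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    keep.sort()  # keep the surviving rows in their original order
    return X[keep], y[keep]

# Hypothetical rows: [sentiment score, is_negative, priority, topic id]
X = np.array([[0.1, 0, 3, 2], [0.8, 1, 1, 0], [0.4, 0, 2, 2],
              [0.9, 1, 1, 1], [0.2, 0, 3, 0], [0.3, 0, 2, 1]], dtype=float)
y = np.array([1, 0, 1, 1, 1, 1])   # toy labels: 1 = fixed, 0 = not fixed
Xb, yb = undersample(X, y)
print(np.bincount(yb))  # [1 1]
```

One plausible reason for the accuracy drop is that a model trained on balanced data no longer profits from favoring the majority class, which inflates accuracy on imbalanced test sets.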

arXiv information

Authors	Sophie C. Pope, Andrew Barovic, Armin Moin
Published	2025-04-22 15:18:14+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.SE

A Common Pitfall of Margin-based Language Model Alignment: Gradient Entanglement

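Posted: April 23, 2025 by jarxiv

Summary

Reinforcement Learning from Human Feedback (RLHF) has become the predominant approach to language model (LM) alignment.
At its core, RLHF uses a margin-based loss for preference optimization, specifying ideal LM behavior only through the difference between preferred and dispreferred responses.
In this paper, we identify a common pitfall of margin-based methods, namely the under-specification of ideal LM behavior on preferred and dispreferred responses individually, which leads to two unintended consequences as the margin increases:
(1) the probability of dispreferred (e.g., unsafe) responses may increase, potentially causing safety alignment failures;
(2) the probability of preferred responses may decrease, even when those responses are ideal.
We explain the reasons behind these problematic behaviors: margin-based losses couple the change in the preferred probability to the gradient of the dispreferred one, and vice versa, often preventing the preferred probability from increasing while the dispreferred one decreases, and thus causing a synchronized increase or decrease in both probabilities.
We term this effect, inherent in margin-based objectives, gradient entanglement.
Formally, we derive conditions for general margin-based alignment objectives under which gradient entanglement becomes concerning: the inner product of the gradients of the preferred and dispreferred log-probabilities is large relative to the individual gradient norms.
We theoretically investigate why such inner products can be large when aligning language models and validate our findings empirically.
The empirical implications of our framework extend to explaining important differences in the training dynamics of various preference optimization algorithms and to suggesting potential algorithm designs that mitigate the under-specification issue of margin-based methods, thereby improving language model alignment.

Summary (original)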

Reinforcement Learning from Human Feedback (RLHF) has become the predominant approach for language model (LM) alignment. At its core, RLHF uses a margin-based loss for preference optimization, specifying ideal LM behavior only by the difference between preferred and dispreferred responses. In this paper, we identify a common pitfall of margin-based methods — the under-specification of ideal LM behavior on preferred and dispreferred responses individually, which leads to two unintended consequences as the margin increases: (1) The probability of dispreferred (e.g., unsafe) responses may increase, resulting in potential safety alignment failures. (2) The probability of preferred responses may decrease, even when those responses are ideal. We demystify the reasons behind these problematic behaviors: margin-based losses couple the change in the preferred probability to the gradient of the dispreferred one, and vice versa, often preventing the preferred probability from increasing while the dispreferred one decreases, and thus causing a synchronized increase or decrease in both probabilities. We term this effect, inherent in margin-based objectives, gradient entanglement. Formally, we derive conditions for general margin-based alignment objectives under which gradient entanglement becomes concerning: the inner product of the gradients of preferred and dispreferred log-probabilities is large relative to the individual gradient norms. We theoretically investigate why such inner products can be large when aligning language models and empirically validate our findings. Empirical implications of our framework extend to explaining important differences in the training dynamics of various preference optimization algorithms, and suggesting potential algorithm designs to mitigate the under-specification issue of margin-based methods and thereby improving language model alignment.
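A first-order calculation shows why the inner product matters. Write g_w and g_l for the gradients of the preferred and dispreferred log-probabilities and take one gradient-ascent step of size η on the plain margin m = log π(y_w) - log π(y_l), so the parameters move by η(g_w - g_l). Then Δ log π(y_w) ≈ η(‖g_w‖² - ⟨g_w, g_l⟩) and Δ log π(y_l) ≈ η(⟨g_w, g_l⟩ - ‖g_l‖²): the preferred log-probability falls whenever ⟨g_w, g_l⟩ > ‖g_w‖², and the dispreferred one rises whenever ⟨g_w, g_l⟩ > ‖g_l‖². This is an illustrative derivation for the simplest margin, not the paper's general condition; for DPO-style losses the update is a positive multiple of the margin gradient, so the signs carry over. The hypothetical helper below just evaluates these two expressions on hand-picked gradient vectors.

```python
import numpy as np

def first_order_changes(g_w, g_l, lr: float):
    """Predicted changes in (log p_w, log p_l) after one gradient-ascent step
    of size lr on the margin log p_w - log p_l (first-order Taylor estimate)."""
    g_w, g_l = np.asarray(g_w, dtype=float), np.asarray(g_l, dtype=float)
    update = lr * (g_w - g_l)                 # scaled gradient of the margin
    return float(g_w @ update), float(g_l @ update)

# Orthogonal gradients: preferred goes up, dispreferred goes down.
print(first_order_changes([1.0, 0.0], [0.0, 1.0], lr=0.1))   # (0.1, -0.1)
# <g_w, g_l> = 2 > ||g_w||^2 = 1: the preferred log-prob decreases too.
print(first_order_changes([1.0, 0.0], [2.0, 0.5], lr=0.1))
# <g_w, g_l> = 2 > ||g_l||^2 = 1: the dispreferred log-prob increases too.
print(first_order_changes([2.0, 0.5], [1.0, 0.0], lr=0.1))
```

In both entangled cases the margin still grows (by about 0.125), so a margin-only objective registers progress even as both probabilities move in the same direction.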

arXiv information

Authors	Hui Yuan, Yifan Zeng, Yue Wu, Huazheng Wang, Mengdi Wang, Liu Leqi
Published	2025-04-22 15:20:34+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, cs.LG

Multimodal Laryngoscopic Video Analysis for Assisted Diagnosis of Vocal Fold Paralysis

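Posted: April 23, 2025 by jarxiv

Summary

This paper presents the Multimodal Laryngoscopic Video Analyzing System (MLVAS), which leverages both audio and video data to automatically extract key video segments and metrics from raw laryngeal videostroboscopic videos for assisted clinical assessment.
The system integrates video-based glottis detection with an audio keyword spotting method to analyze both video and audio data, identifying patient vocalizations and refining video highlights to ensure optimal inspection of vocal fold movements.
Beyond key video segment extraction from raw laryngeal videos, MLVAS can generate effective audio and visual features for Vocal Fold Paralysis (VFP) detection.
Pre-trained audio encoders are used to encode the patient's voice and obtain audio features.
Visual features are generated by measuring the angle deviation of the left and right vocal folds relative to the estimated glottal midline on the segmented glottis masks.
To obtain better masks, we introduce a diffusion-based refinement that follows conventional U-Net segmentation to reduce false positives.
We conducted several ablation studies to demonstrate the effectiveness of each module and modality in the proposed MLVAS.
Experimental results on a public segmentation dataset show the effectiveness of the proposed segmentation module.
In addition, unilateral VFP classification results on a real-world clinic dataset demonstrate MLVAS's ability to provide reliable, objective metrics and visualizations to assist clinical diagnosis.

Summary (original)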

This paper presents the Multimodal Laryngoscopic Video Analyzing System (MLVAS), a novel system that leverages both audio and video data to automatically extract key video segments and metrics from raw laryngeal videostroboscopic videos for assisted clinical assessment. The system integrates video-based glottis detection with an audio keyword spotting method to analyze both video and audio data, identifying patient vocalizations and refining video highlights to ensure optimal inspection of vocal fold movements. Beyond key video segment extraction from the raw laryngeal videos, MLVAS is able to generate effective audio and visual features for Vocal Fold Paralysis (VFP) detection. Pre-trained audio encoders are utilized to encode the patient voice to get the audio features. Visual features are generated by measuring the angle deviation of both the left and right vocal folds relative to the estimated glottal midline on the segmented glottis masks. To get better masks, we introduce a diffusion-based refinement that follows traditional U-Net segmentation to reduce false positives. We conducted several ablation studies to demonstrate the effectiveness of each module and modality in the proposed MLVAS. The experimental results on a public segmentation dataset show the effectiveness of our proposed segmentation module. In addition, unilateral VFP classification results on a real-world clinic dataset demonstrate MLVAS's ability to provide reliable and objective metrics as well as visualization for assisted clinical diagnosis.
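The visual feature described above, the angle between each vocal fold and the glottal midline, reduces to an angle between two lines once the fold edges and the midline have been fitted on the mask. The sketch below assumes each of those has already been summarized as a 2D direction vector; the direction values and the left/right asymmetry cue are hypothetical illustrations, not the paper's feature definition.

```python
import numpy as np

def line_angle_deg(u, v) -> float:
    """Angle in degrees (0 to 90) between two lines given by 2D direction vectors.

    A line has no preferred orientation (u and -u describe the same line),
    so angles above 90 degrees are folded back.
    """
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    cross = u[0] * v[1] - u[1] * v[0]
    angle = float(np.degrees(np.arctan2(abs(cross), u @ v)))
    return min(angle, 180.0 - angle)

# Hypothetical directions, e.g. from lines fitted to the glottis mask edges.
midline = (0.0, 1.0)
left_fold = (-0.20, 1.0)
right_fold = (0.05, 1.0)

left_dev = line_angle_deg(midline, left_fold)     # about 11.3 degrees
right_dev = line_angle_deg(midline, right_fold)   # about 2.9 degrees
asymmetry = abs(left_dev - right_dev)             # hypothetical paralysis cue
```

Using `arctan2` on the cross and dot products avoids the precision loss of `arccos` for nearly parallel lines.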

arXiv information

Authors	Yucong Zhang, Xin Zou, Jinshan Yang, Wenjun Chen, Juan Liu, Faya Liang, Ming Li
Published	2025-04-22 15:32:41+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.SD, eess.AS

W-PCA Based Gradient-Free Proxy for Efficient Search of Lightweight Language Models

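Posted: April 23, 2025 by jarxiv

Summary

The demand for efficient natural language processing (NLP) systems has led to the development of lightweight language models.
Previous work in this area has focused mainly on manual design or training-based neural architecture search (NAS) methods.
Recently, zero-shot NAS methods have been proposed for evaluating language models without training.
However, prevailing approaches to zero-shot NAS often face challenges such as biased evaluation metrics and computational inefficiency.
In this paper, we introduce weight-weighted PCA (W-PCA), a new zero-shot NAS method tailored specifically to lightweight language models.
Our approach uses two evaluation proxies: the parameter count, and the number of principal components whose cumulative contribution exceeds $\eta$ in the feed-forward neural (FFN) layer.
In addition, by eliminating the need for gradient computations, we shorten evaluation time and thus improve the efficiency of designing and evaluating lightweight language models.
We conduct a comparative analysis on the GLUE and SQuAD datasets to evaluate our approach.
The results show that our method significantly reduces training time compared with one-shot NAS methods and achieves higher scores in the testing phase than previous state-of-the-art training-based methods.
Furthermore, we perform ranking evaluations on a dataset sampled from the FlexiBERT search space.
Our approach shows superior ranking correlation and further reduces solving time compared with other zero-shot NAS methods that require gradient computation.

Summary (original)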

The demand for efficient natural language processing (NLP) systems has led to the development of lightweight language models. Previous work in this area has primarily focused on manual design or training-based neural architecture search (NAS) methods. Recently, zero-shot NAS methods have been proposed for evaluating language models without the need for training. However, prevailing approaches to zero-shot NAS often face challenges such as biased evaluation metrics and computational inefficiencies. In this paper, we introduce weight-weighted PCA (W-PCA), a novel zero-shot NAS method specifically tailored for lightweight language models. Our approach utilizes two evaluation proxies: the parameter count and the number of principal components with cumulative contribution exceeding $\eta$ in the feed-forward neural (FFN) layer. Additionally, by eliminating the need for gradient computations, we optimize the evaluation time, thus enhancing the efficiency of designing and evaluating lightweight language models. We conduct a comparative analysis on the GLUE and SQuAD datasets to evaluate our approach. The results demonstrate that our method significantly reduces training time compared to one-shot NAS methods and achieves higher scores in the testing phase compared to previous state-of-the-art training-based methods. Furthermore, we perform ranking evaluations on a dataset sampled from the FlexiBERT search space. Our approach exhibits superior ranking correlation and further reduces solving time compared to other zero-shot NAS methods that require gradient computation.
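The second proxy, the number of principal components whose cumulative contribution exceeds η, needs only forward activations from an FFN layer. The sketch below assumes rows of the input are token positions and columns are hidden units, and reads "exceeding η" as the smallest k whose cumulative explained-variance ratio reaches η; the abstract does not say how W-PCA combines this count with the parameter count, so no combined score is shown.

```python
import numpy as np

def pca_component_count(acts: np.ndarray, eta: float) -> int:
    """Smallest k whose top-k principal components explain at least eta of the variance.

    acts: (num_tokens, hidden_dim) activations collected from one FFN layer.
    Only forward activations are used, so no gradients are computed.
    """
    centered = acts - acts.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (acts.shape[0] - 1)
    eigvals = np.clip(np.linalg.eigvalsh(cov)[::-1], 0.0, None)   # descending
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cumulative, eta)) + 1
    return min(k, len(eigvals))   # guard against round-off when eta is near 1

# Toy activations: variance 8/3 along unit 0, 2/3 along unit 1, none along unit 2.
acts = np.array([[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]])
print(pca_component_count(acts, eta=0.5), pca_component_count(acts, eta=0.9))  # 1 2
```

In a search loop, the count would be computed for every candidate architecture on the same batch of inputs, so the ranking reflects the architectures rather than the data.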

arXiv information

Authors	Shang Wang
Published	2025-04-22 15:33:01+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.CL

SCMPPI: Supervised Contrastive Multimodal Framework for Predicting Protein-Protein Interactions

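Posted: April 23, 2025 by jarxiv

Summary

Protein-protein interaction (PPI) prediction plays a pivotal role in deciphering cellular functions and disease mechanisms.
To address the limitations of traditional experimental methods and existing computational approaches in cross-modal feature fusion and false-negative suppression, we propose SCMPPI, a novel supervised contrastive multimodal framework.
By effectively integrating sequence-based features (AAC, DPC, ESMC-CKSAAP) with network topology (Node2Vec embeddings) and incorporating an enhanced contrastive learning strategy with negative sample filtering, SCMPPI achieves superior prediction performance.
Extensive experiments on eight benchmark datasets demonstrate state-of-the-art accuracy (98.13%) and AUC (99.69%), along with excellent cross-species generalization (AUC > 99%).
Successful applications to CD9 networks, Wnt pathway analysis, and cancer-specific networks further highlight its potential for disease target discovery, establishing SCMPPI as a powerful tool for multimodal biological data analysis.

Summary (original)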

Protein-protein interaction (PPI) prediction plays a pivotal role in deciphering cellular functions and disease mechanisms. To address the limitations of traditional experimental methods and existing computational approaches in cross-modal feature fusion and false-negative suppression, we propose SCMPPI, a novel supervised contrastive multimodal framework. By effectively integrating sequence-based features (AAC, DPC, ESMC-CKSAAP) with network topology (Node2Vec embeddings) and incorporating an enhanced contrastive learning strategy with negative sample filtering, SCMPPI achieves superior prediction performance. Extensive experiments on eight benchmark datasets demonstrate its state-of-the-art accuracy (98.13%) and AUC (99.69%), along with excellent cross-species generalization (AUC > 99%). Successful applications in CD9 networks, Wnt pathway analysis, and cancer-specific networks further highlight its potential for disease target discovery, establishing SCMPPI as a powerful tool for multimodal biological data analysis.
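AAC and DPC are standard sequence descriptors: amino acid composition (AAC) is the fraction of each of the 20 standard residues in a sequence, and dipeptide composition (DPC) is the fraction of each of the 400 ordered pairs of adjacent residues. The sketch below follows those common definitions; the paper's exact normalization, its ESMC-CKSAAP features, and the Node2Vec embeddings are not reproduced, and skipping non-standard letters such as X is an assumption.

```python
from itertools import product

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"                      # 20 standard residues
DIPEPTIDE_INDEX = {"".join(p): i for i, p in enumerate(product(AMINO_ACIDS, repeat=2))}

def aac(seq: str) -> np.ndarray:
    """Amino acid composition: fraction of each standard residue (20 values)."""
    seq = seq.upper()
    counts = np.array([seq.count(a) for a in AMINO_ACIDS], dtype=float)
    total = counts.sum()                                  # non-standard letters ignored
    return counts / total if total else counts

def dpc(seq: str) -> np.ndarray:
    """Dipeptide composition: fraction of each adjacent residue pair (400 values)."""
    seq = seq.upper()
    counts = np.zeros(len(DIPEPTIDE_INDEX))
    for i in range(len(seq) - 1):
        idx = DIPEPTIDE_INDEX.get(seq[i:i + 2])
        if idx is not None:                               # skip pairs containing e.g. 'X'
            counts[idx] += 1
    total = counts.sum()
    return counts / total if total else counts

def sequence_features(seq: str) -> np.ndarray:
    """Concatenate AAC and DPC into one 420-dimensional descriptor."""
    return np.concatenate([aac(seq), dpc(seq)])
```

For a protein pair, one hypothetical option is to concatenate the two 420-dimensional vectors before passing them to a classifier or an encoder.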

arXiv information

Authors	Shengrui XU, Tianchi Lu, Zikun Wang, Jixiu Zhai, Jingwan Wang
Published	2025-04-22 15:48:58+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: 68T07, 92C40, cs.AI, cs.LG, I.2.6, q-bio.QM

OPUS-VFL: Incentivizing Optimal Privacy-Utility Tradeoffs in Vertical Federated Learning

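Posted: April 23, 2025 by jarxiv

Summary

Vertical Federated Learning (VFL) enables organizations with disjoint feature spaces but shared user bases to train models collaboratively without sharing raw data.
However, existing VFL systems face critical limitations: they often lack effective incentive mechanisms, struggle to balance privacy-utility tradeoffs, and fail to accommodate clients with heterogeneous resource capabilities.
These challenges hinder meaningful participation, degrade model performance, and limit practical deployment.
To address these issues, we propose OPUS-VFL, an Optimal Privacy-Utility tradeoff Strategy for VFL.
OPUS-VFL introduces a novel privacy-aware incentive mechanism that rewards clients based on a principled combination of model contribution, privacy preservation, and resource investment.
It employs a lightweight leave-one-out (LOO) strategy to quantify feature importance per client and integrates an adaptive differential privacy mechanism that lets clients dynamically calibrate noise levels to optimize their individual utility.
Our framework is designed to be scalable, budget-balanced, and robust to inference and poisoning attacks.
Extensive experiments on benchmark datasets (MNIST, CIFAR-10, and CIFAR-100) show that OPUS-VFL significantly outperforms state-of-the-art VFL baselines in both efficiency and robustness.
It reduces label inference attack success rates by up to 20%, increases feature inference reconstruction error (MSE) by over 30%, and achieves up to 25% higher incentives for clients that contribute meaningfully while respecting privacy and cost constraints.
These results highlight the practicality and innovation of OPUS-VFL as a secure, fair, and performance-driven solution for real-world VFL.

Summary (original)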

Vertical Federated Learning (VFL) enables organizations with disjoint feature spaces but shared user bases to collaboratively train models without sharing raw data. However, existing VFL systems face critical limitations: they often lack effective incentive mechanisms, struggle to balance privacy-utility tradeoffs, and fail to accommodate clients with heterogeneous resource capabilities. These challenges hinder meaningful participation, degrade model performance, and limit practical deployment. To address these issues, we propose OPUS-VFL, an Optimal Privacy-Utility tradeoff Strategy for VFL. OPUS-VFL introduces a novel, privacy-aware incentive mechanism that rewards clients based on a principled combination of model contribution, privacy preservation, and resource investment. It employs a lightweight leave-one-out (LOO) strategy to quantify feature importance per client, and integrates an adaptive differential privacy mechanism that enables clients to dynamically calibrate noise levels to optimize their individual utility. Our framework is designed to be scalable, budget-balanced, and robust to inference and poisoning attacks. Extensive experiments on benchmark datasets (MNIST, CIFAR-10, and CIFAR-100) demonstrate that OPUS-VFL significantly outperforms state-of-the-art VFL baselines in both efficiency and robustness. It reduces label inference attack success rates by up to 20%, increases feature inference reconstruction error (MSE) by over 30%, and achieves up to 25% higher incentives for clients that contribute meaningfully while respecting privacy and cost constraints. These results highlight the practicality and innovation of OPUS-VFL as a secure, fair, and performance-driven solution for real-world VFL.
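Leave-one-out (LOO) contribution scores each client by how much utility is lost when that client's features are removed: utility(all) - utility(all without client). The sketch below assumes a generic `utility` callable over sets of client ids (a hypothetical interface; in practice it would be, for example, validation accuracy of a model trained on those clients' features) and uses a made-up additive utility. The `reward_shares` normalization is likewise hypothetical, not OPUS-VFL's incentive formula.

```python
from typing import Callable, Dict, FrozenSet, Iterable

def loo_contributions(
    clients: Iterable[str],
    utility: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Leave-one-out contribution of each client: U(all) - U(all minus client)."""
    everyone = frozenset(clients)
    full = utility(everyone)
    return {c: full - utility(everyone - {c}) for c in everyone}

def reward_shares(contrib: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical normalization of non-negative contributions into reward shares."""
    positive = {c: max(v, 0.0) for c, v in contrib.items()}
    total = sum(positive.values())
    return {c: (v / total if total else 0.0) for c, v in positive.items()}

# Made-up utility: accuracy reached with a given set of feature owners.
GAIN = {"bank": 0.20, "retailer": 0.10, "telco": 0.0}

def toy_utility(subset: FrozenSet[str]) -> float:
    return 0.5 + sum(GAIN[c] for c in subset)

contrib = loo_contributions(GAIN, toy_utility)
shares = reward_shares(contrib)   # roughly bank 2/3, retailer 1/3, telco 0
```

LOO needs only N + 1 utility evaluations for N clients, whereas exact Shapley values need evaluations over all 2^N subsets, which is one reason LOO is considered lightweight; the trade-off is that LOO can undervalue clients whose features are redundant with another client's.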

arXiv information

Authors	Sindhuja Madabushi, Ahmad Faraz Khan, Haider Ali, Jin-Hee Cho
Published	2025-04-22 16:00:11+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.LG

How Private is Your Attention? Bridging Privacy with In-Context Learning

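Posted: April 23, 2025 by jarxiv

Summary

In-context learning (ICL), the ability of transformer-based models to perform new tasks from examples provided at inference time, has emerged as a hallmark of modern language models.
While recent work has investigated the mechanisms underlying ICL, its feasibility under formal privacy constraints remains largely unexplored.
In this paper, we propose a differentially private pretraining algorithm for linear attention heads and present the first theoretical analysis of the privacy-accuracy trade-off for ICL in linear regression.
Our results characterize the fundamental tension between optimization and privacy-induced noise, formally capturing behaviors observed in private training via iterative methods.
We also show that, unlike standard ridge regression, our method is robust to adversarial perturbations of training prompts.
All theoretical findings are supported by extensive simulations across diverse settings.

Summary (original)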

In-context learning (ICL), the ability of transformer-based models to perform new tasks from examples provided at inference time, has emerged as a hallmark of modern language models. While recent works have investigated the mechanisms underlying ICL, its feasibility under formal privacy constraints remains largely unexplored. In this paper, we propose a differentially private pretraining algorithm for linear attention heads and present the first theoretical analysis of the privacy-accuracy trade-off for ICL in linear regression. Our results characterize the fundamental tension between optimization and privacy-induced noise, formally capturing behaviors observed in private training via iterative methods. Additionally, we show that our method is robust to adversarial perturbations of training prompts, unlike standard ridge regression. All theoretical findings are supported by extensive simulations across diverse settings.
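The abstract does not spell out the algorithm, so the sketch below only illustrates the general setting under stated assumptions: it uses the reduced linear-attention predictor common in ICL theory, ŷ = x_qᵀ Γ (1/n) Σ y_i x_i, and trains Γ (`G` in the code) with clipped, noised full-batch gradient descent in the style of DP-SGD. All function names and hyperparameters are hypothetical, and no privacy accounting is done, so the code certifies no (ε, δ) guarantee.

```python
import numpy as np

def make_prompts(num, n, d, rng):
    """Toy ICL data: each prompt has n pairs (x_i, y_i = <w, x_i>) and one query."""
    w = rng.normal(size=(num, d))                 # fresh task vector per prompt
    X = rng.normal(size=(num, n, d))
    y = np.einsum("pnd,pd->pn", X, w)
    xq = rng.normal(size=(num, d))
    yq = np.einsum("pd,pd->p", xq, w)
    return X, y, xq, yq

def predict(G, X, y, xq):
    """Reduced linear-attention predictor: y_hat = xq^T G (1/n) sum_i y_i x_i."""
    h = np.einsum("pn,pnd->pd", y, X) / X.shape[1]
    return np.einsum("pd,de,pe->p", xq, G, h), h

def loss(G, X, y, xq, yq):
    pred, _ = predict(G, X, y, xq)
    return 0.5 * float(np.mean((pred - yq) ** 2))

def dp_step(G, X, y, xq, yq, lr, clip, noise_mult, rng):
    """One private step: clip each prompt's gradient, add Gaussian noise, average."""
    pred, h = predict(G, X, y, xq)
    # Per-prompt gradient of 0.5 * (pred - yq)^2 w.r.t. G is (pred - yq) * xq h^T.
    grads = (pred - yq)[:, None, None] * xq[:, :, None] * h[:, None, :]
    norms = np.linalg.norm(grads.reshape(len(grads), -1), axis=1)
    grads = grads * np.minimum(1.0, clip / np.maximum(norms, 1e-12))[:, None, None]
    noise = rng.normal(scale=noise_mult * clip, size=G.shape)
    return G - lr * (grads.sum(axis=0) + noise) / len(grads)

rng = np.random.default_rng(0)
d = 3
X, y, xq, yq = make_prompts(num=256, n=20, d=d, rng=rng)
G = np.zeros((d, d))
start = loss(G, X, y, xq, yq)
for _ in range(200):
    G = dp_step(G, X, y, xq, yq, lr=0.05, clip=5.0, noise_mult=1.0, rng=rng)
end = loss(G, X, y, xq, yq)
```

Raising `noise_mult` strengthens privacy but leaves more noise in Γ, a toy version of the optimization-versus-privacy-noise tension the abstract describes.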

arXiv information

Authors	Soham Bonnerjee, Zhen Wei, Yeon, Anna Asch, Sagnik Nandy, Promit Ghosal
Published	2025-04-22 16:05:26+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, cs.CR, cs.LG, stat.ML

CAPO: Cost-Aware Prompt Optimization

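Posted: April 23, 2025 by jarxiv

Summary

Large language models (LLMs) have revolutionized natural language processing by solving a wide range of tasks guided simply by a prompt.
However, their performance is highly sensitive to how the prompt is formulated.
Automated prompt optimization addresses this challenge by finding optimal prompts, but current methods require a substantial number of LLM calls and input tokens, which makes prompt optimization expensive.
We introduce CAPO (Cost-Aware Prompt Optimization), an algorithm that improves prompt optimization efficiency by integrating AutoML techniques.
CAPO is an evolutionary approach that uses LLMs as operators, incorporating racing to save evaluations and multi-objective optimization to balance performance against prompt length.
It jointly optimizes instructions and few-shot examples while leveraging task descriptions for improved robustness.
Extensive experiments across diverse datasets and LLMs show that CAPO outperforms state-of-the-art discrete prompt optimization methods in 11 of 15 cases, with improvements of up to 21%p.
Our algorithm achieves better performance even with smaller budgets, saves evaluations through racing, and reduces average prompt length via a length penalty, making it both cost-efficient and cost-aware.
Even without few-shot examples, CAPO outperforms its competitors and generally remains robust to the initial prompt.
CAPO represents an important step toward making prompt optimization more powerful and accessible by improving cost-efficiency.

Summary (original)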

Large language models (LLMs) have revolutionized natural language processing by solving a wide range of tasks simply guided by a prompt. Yet their performance is highly sensitive to prompt formulation. While automated prompt optimization addresses this challenge by finding optimal prompts, current methods require a substantial number of LLM calls and input tokens, making prompt optimization expensive. We introduce CAPO (Cost-Aware Prompt Optimization), an algorithm that enhances prompt optimization efficiency by integrating AutoML techniques. CAPO is an evolutionary approach with LLMs as operators, incorporating racing to save evaluations and multi-objective optimization to balance performance with prompt length. It jointly optimizes instructions and few-shot examples while leveraging task descriptions for improved robustness. Our extensive experiments across diverse datasets and LLMs demonstrate that CAPO outperforms state-of-the-art discrete prompt optimization methods in 11/15 cases with improvements up to 21%p. Our algorithm achieves better performances already with smaller budgets, saves evaluations through racing, and decreases average prompt length via a length penalty, making it both cost-efficient and cost-aware. Even without few-shot examples, CAPO outperforms its competitors and generally remains robust to initial prompts. CAPO represents an important step toward making prompt optimization more powerful and accessible by improving cost-efficiency.
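Racing evaluates candidate prompts on successive blocks of examples and stops spending calls on candidates that are clearly behind. The sketch below is a simplified version under stated assumptions: elimination uses a fixed margin on the running mean rather than a statistical test, the length penalty is a per-word constant, and `fake_evaluate` is a hypothetical deterministic stand-in for an LLM call. It illustrates the idea, not CAPO's implementation.

```python
from typing import Callable, Dict, List, Sequence, Tuple

def race(
    prompts: List[str],
    examples: Sequence[int],
    evaluate: Callable[[str, int], float],
    block_size: int = 10,
    margin: float = 0.1,
    length_penalty: float = 0.001,
) -> Tuple[List[str], int]:
    """Successive-elimination racing over prompts with a length-penalized score.

    Returns the surviving prompts and the number of evaluate() calls spent.
    """
    scores: Dict[str, List[float]] = {p: [] for p in prompts}

    def penalized(p: str) -> float:
        # Mean score so far minus a penalty per whitespace-separated word.
        return sum(scores[p]) / len(scores[p]) - length_penalty * len(p.split())

    alive, calls = list(prompts), 0
    for start in range(0, len(examples), block_size):
        for p in alive:
            for ex in examples[start:start + block_size]:
                scores[p].append(evaluate(p, ex))
                calls += 1
        best = max(penalized(p) for p in alive)
        # Drop prompts that trail the current leader by more than `margin`.
        alive = [p for p in alive if penalized(p) >= best - margin]
        if len(alive) == 1:
            break
    return alive, calls

# Stand-in for an LLM call: each prompt answers a fixed share of examples correctly.
HITS_PER_10 = {"Answer briefly.": 9, "Think step by step and answer.": 6, "Reply.": 3}

def fake_evaluate(prompt: str, example: int) -> float:
    return 1.0 if example % 10 < HITS_PER_10[prompt] else 0.0

survivors, calls = race(list(HITS_PER_10), list(range(100)), fake_evaluate)
print(survivors, calls)  # ['Answer briefly.'] 30
```

Scoring all three prompts on all 100 examples would cost 300 calls; here racing settles after the first block. A fixed margin is the simplest elimination rule, and a paired statistical test on per-example scores is the more usual choice when scores are noisy.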

arXiv information

Authors	Tom Zehle, Moritz Schlager, Timo Heiß, Matthias Feurer
Published	2025-04-22 16:14:31+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, cs.NE, stat.ML

AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models

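Posted: April 23, 2025 by jarxiv

Summary

Large language models (LLMs) often exhibit hallucinations due to incorrect or outdated knowledge.
Model editing methods have therefore emerged to enable targeted knowledge updates.
To achieve this, a prevailing paradigm is the locating-then-editing approach, which first locates influential parameters and then edits them by introducing a perturbation.
While effective, current studies have shown that this perturbation inevitably disrupts the originally preserved knowledge within LLMs, especially in sequential editing scenarios.
To address this, we introduce AlphaEdit, a novel solution that projects the perturbation onto the null space of the preserved knowledge before applying it to the parameters.
We theoretically prove that this projection ensures the output of the post-edited LLM remains unchanged when queried about the preserved knowledge, thereby mitigating the disruption issue.
Extensive experiments on various LLMs, including LLaMA3, GPT2-XL, and GPT-J, show that AlphaEdit boosts the performance of most locating-then-editing methods by an average of 36.7% with only a single additional line of code for the projection.
Our code is available at https://github.com/jianghoucheng/AlphaEdit.

Summary (original)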

Large language models (LLMs) often exhibit hallucinations due to incorrect or outdated knowledge. Hence, model editing methods have emerged to enable targeted knowledge updates. To achieve this, a prevailing paradigm is the locating-then-editing approach, which first locates influential parameters and then edits them by introducing a perturbation. While effective, current studies have demonstrated that this perturbation inevitably disrupts the originally preserved knowledge within LLMs, especially in sequential editing scenarios. To address this, we introduce AlphaEdit, a novel solution that projects perturbation onto the null space of the preserved knowledge before applying it to the parameters. We theoretically prove that this projection ensures the output of post-edited LLMs remains unchanged when queried about the preserved knowledge, thereby mitigating the issue of disruption. Extensive experiments on various LLMs, including LLaMA3, GPT2-XL, and GPT-J, show that AlphaEdit boosts the performance of most locating-then-editing methods by an average of 36.7% with a single additional line of code for the projection. Our code is available at: https://github.com/jianghoucheng/AlphaEdit.
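The projection step can be sketched with plain linear algebra. Assume a linear layer with weights W (d_out × d_in), a matrix K0 whose columns are the key vectors of knowledge to preserve, and an update Δ produced by some locating-then-editing method. If a projector P satisfies P K0 = 0, then (W + ΔP) K0 = W K0, so outputs for the preserved keys cannot change. Below, P is built from the eigenvectors of K0 K0ᵀ with (numerically) zero eigenvalues; the dimensions, tolerance, and random matrices are illustrative assumptions, not values from the paper or its repository.

```python
import numpy as np

def null_space_projector(K0: np.ndarray, tol: float = 1e-8) -> np.ndarray:
    """Projector P with P @ K0 ~= 0, spanned by eigenvectors of K0 K0^T
    whose eigenvalues are numerically zero."""
    eigvals, eigvecs = np.linalg.eigh(K0 @ K0.T)
    null_basis = eigvecs[:, eigvals < tol * max(eigvals.max(), 1.0)]
    return null_basis @ null_basis.T

rng = np.random.default_rng(0)
d_in, d_out, n_keep = 8, 4, 5
W = rng.normal(size=(d_out, d_in))        # original layer weights
K0 = rng.normal(size=(d_in, n_keep))      # keys of knowledge to preserve
delta = rng.normal(size=(d_out, d_in))    # some edit computed elsewhere

P = null_space_projector(K0)
W_edited = W + delta @ P                  # the extra step: project the update

print(np.allclose(W_edited @ K0, W @ K0))  # True: preserved outputs unchanged
```

The projection removes every component of Δ that would alter outputs for K0, so the new fact has to be expressible within the remaining d_in - rank(K0) directions.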

arXiv information

Authors	Junfeng Fang, Houcheng Jiang, Kun Wang, Yunshan Ma, Shi Jie, Xiang Wang, Xiangnan He, Tat-seng Chua
Published	2025-04-22 16:15:47+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.CL
