jarxiv | Japanese arxiv | Page 331

Ontology- and LLM-based Data Harmonization for Federated Learning in Healthcare

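Posted: May 27, 2025 | Author: jarxiv

Summary

The rise of electronic health records (EHRs) has opened new opportunities for medical research, but privacy regulations and data heterogeneity remain major barriers to large-scale machine learning.
Federated learning (FL) enables collaborative modeling without sharing raw data, but it faces challenges in harmonizing diverse clinical datasets.
This paper presents a two-step data alignment strategy that integrates ontologies and large language models (LLMs) to support secure, privacy-preserving FL in healthcare, and demonstrates its effectiveness in a real-world project involving semantic mapping of EHR data.

Summary (original)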

The rise of electronic health records (EHRs) has unlocked new opportunities for medical research, but privacy regulations and data heterogeneity remain key barriers to large-scale machine learning. Federated learning (FL) enables collaborative modeling without sharing raw data, yet faces challenges in harmonizing diverse clinical datasets. This paper presents a two-step data alignment strategy integrating ontologies and large language models (LLMs) to support secure, privacy-preserving FL in healthcare, demonstrating its effectiveness in a real-world project involving semantic mapping of EHR data.

arXiv information

Authors	Natallia Kokash, Lei Wang, Thomas H. Gillespie, Adam Belloum, Paola Grosso, Sara Quinney, Lang Li, Bernard de Bono
Published	2025-05-26 14:09:17+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.LG, cs.SE

Graph Wave Networks

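Posted: May 27, 2025 | Author: jarxiv

Summary

Dynamics modeling has been introduced as a new paradigm for message passing (MP) in graph neural networks (GNNs).
Existing methods treat MP between nodes as a heat diffusion process and use the heat equation to model the temporal evolution of nodes in the embedding space.
However, the heat equation can hardly represent the wave nature of graph signals in graph signal processing.
In addition, the heat equation is essentially a partial differential equation (PDE) involving a first-order partial derivative with respect to time, whose numerical solution tends to have low stability and leads to inefficient model training.
In this paper, we aim to capture more wave details in MP, since graph signals are essentially wave signals that can be viewed as a superposition of a series of waves in the form of eigenvectors.
This motivates us to regard MP as a wave propagation process that captures the temporal evolution of wave signals in space.
Based on the wave equation in physics, we develop a graph wave equation to exploit wave propagation on graphs.
Specifically, we show that the graph wave equation can be connected to traditional spectral GNNs, which facilitates the design of graph wave networks (GWNs) based on various Laplacians and improves the performance of spectral GNNs.
Moreover, the graph wave equation is a PDE involving a second-order partial derivative with respect to time, which is more stable on graphs than the heat equation with its first-order time derivative.
We further prove theoretically that the numerical solution derived from the graph wave equation is always stable, which significantly improves model efficiency while preserving performance.
Extensive experiments show that GWNs achieve state-of-the-art (SOTA) and efficient performance on benchmark datasets, and perform well on challenging graph problems such as over-smoothing and heterophily.

Summary (original)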

Dynamics modeling has been introduced as a novel paradigm in message passing (MP) of graph neural networks (GNNs). Existing methods consider MP between nodes as a heat diffusion process, and leverage heat equation to model the temporal evolution of nodes in the embedding space. However, heat equation can hardly depict the wave nature of graph signals in graph signal processing. Besides, heat equation is essentially a partial differential equation (PDE) involving a first partial derivative of time, whose numerical solution usually has low stability, and leads to inefficient model training. In this paper, we would like to depict more wave details in MP, since graph signals are essentially wave signals that can be seen as a superposition of a series of waves in the form of eigenvector. This motivates us to consider MP as a wave propagation process to capture the temporal evolution of wave signals in the space. Based on wave equation in physics, we innovatively develop a graph wave equation to leverage the wave propagation on graphs. In details, we demonstrate that the graph wave equation can be connected to traditional spectral GNNs, facilitating the design of graph wave networks based on various Laplacians and enhancing the performance of the spectral GNNs. Besides, the graph wave equation is particularly a PDE involving a second partial derivative of time, which has stronger stability on graphs than the heat equation that involves a first partial derivative of time. Additionally, we theoretically prove that the numerical solution derived from the graph wave equation are constantly stable, enabling to significantly enhance model efficiency while ensuring its performance. Extensive experiments show that GWNs achieve SOTA and efficient performance on benchmark datasets, and exhibit outstanding performance in addressing challenging graph problems, such as over-smoothing and heterophily.
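
As a rough illustration of the contrast drawn in the abstract, the sketch below advances a graph signal with a first-order-in-time (heat) update and a second-order-in-time (wave) update on a small path graph. It is not the authors' GWN layer. The combinatorial Laplacian, the explicit finite-difference schemes for dx/dt = -Lx and d²x/dt² = -Lx, the step size `dt`, and all function names are assumptions made for illustration only.

```python
def laplacian(adj):
    """Combinatorial Laplacian L = D - A of an undirected graph (assumed form)."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def matvec(matrix, x):
    """Plain matrix-vector product."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def heat_step(lap, x, dt):
    """Forward-Euler step of dx/dt = -L x (first derivative in time)."""
    lx = matvec(lap, x)
    return [xi - dt * li for xi, li in zip(x, lx)]


def wave_step(lap, x, x_prev, dt):
    """Central-difference step of d^2x/dt^2 = -L x (second derivative in time).

    The update needs the two most recent states, x and x_prev.
    """
    lx = matvec(lap, x)
    return [2.0 * xi - pi - dt * dt * li for xi, pi, li in zip(x, x_prev, lx)]


# Path graph on four nodes: 0 - 1 - 2 - 3.
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
lap = laplacian(adj)
x0 = [1.0, 0.0, 0.0, 0.0]  # unit impulse on node 0

heat = heat_step(lap, x0, dt=0.1)
wave = wave_step(lap, x0, x0, dt=0.1)  # start at rest: x_prev = x0
```

The heat update depends on `dt`, while the wave update depends on `dt * dt` and carries the previous state, which is the structural difference between a first-order and a second-order time derivative. Stability claims in the abstract refer to the authors' own scheme, not to this sketch.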

arXiv information

Authors	Juwei Yue, Haikuo Li, Jiawei Sheng, Yihan Guo, Xinghua Zhang, Chuan Zhou, Tingwen Liu, Li Guo
Published	2025-05-26 14:20:41+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.LG

Beyond Simple Concatenation: Fairly Assessing PLM Architectures for Multi-Chain Protein-Protein Interactions Prediction

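Posted: May 27, 2025 | Author: jarxiv

Summary

Protein-protein interactions (PPIs) are fundamental to many cellular processes, and characterizing them is essential for understanding disease mechanisms and guiding drug discovery.
Protein language models (PLMs) have shown remarkable success in predicting protein structure and function, but their application to sequence-based PPI binding affinity prediction remains relatively underexplored.
This gap is often attributed to the scarcity of high-quality, rigorously refined datasets and to the reliance on simple strategies for concatenating protein representations.
This work addresses these limitations.
First, we introduce a meticulously curated version of the PPB-Affinity dataset with a total of 8,207 unique protein-protein interaction entries, obtained by resolving annotation inconsistencies and duplicate entries for multi-chain protein interactions.
The dataset applies a stringent sequence identity threshold of at most 30% to ensure robust splits into training, validation, and test sets and to minimize data leakage.
Second, we propose and systematically evaluate four architectures for adapting PLMs to PPI binding affinity prediction: embeddings concatenation (EC), sequences concatenation (SC), hierarchical pooling (HP), and pooled attention addition (PAD).
These architectures were evaluated with two training methods: full fine-tuning and a lightweight approach that uses ConvBERT heads over frozen PLM features.
Comprehensive experiments across several leading PLMs (ProtT5, ESM2, Ankh, Ankh2, and ESM3) showed that the HP and PAD architectures consistently outperform conventional concatenation methods, with gains of up to 12% in Spearman correlation.
These results underline the need for sophisticated architectural designs to fully exploit the capabilities of PLMs for nuanced PPI binding affinity prediction.

Summary (original)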

Protein-protein interactions (PPIs) are fundamental to numerous cellular processes, and their characterization is vital for understanding disease mechanisms and guiding drug discovery. While protein language models (PLMs) have demonstrated remarkable success in predicting protein structure and function, their application to sequence-based PPI binding affinity prediction remains relatively underexplored. This gap is often attributed to the scarcity of high-quality, rigorously refined datasets and the reliance on simple strategies for concatenating protein representations. In this work, we address these limitations. First, we introduce a meticulously curated version of the PPB-Affinity dataset of a total of 8,207 unique protein-protein interaction entries, by resolving annotation inconsistencies and duplicate entries for multi-chain protein interactions. This dataset incorporates a stringent, less than or equal to 30%, sequence identity threshold to ensure robust splitting into training, validation, and test sets, minimizing data leakage. Second, we propose and systematically evaluate four architectures for adapting PLMs to PPI binding affinity prediction: embeddings concatenation (EC), sequences concatenation (SC), hierarchical pooling (HP), and pooled attention addition (PAD). These architectures were assessed using two training methods: full fine-tuning and a lightweight approach employing ConvBERT heads over frozen PLM features. Our comprehensive experiments across multiple leading PLMs (ProtT5, ESM2, Ankh, Ankh2, and ESM3) demonstrated that the HP and PAD architectures consistently outperform conventional concatenation methods, achieving up to 12% increase in terms of Spearman correlation. These results highlight the necessity of sophisticated architectural designs to fully exploit the capabilities of PLMs for nuanced PPI binding affinity prediction.
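
The abstract reports its gains in terms of Spearman correlation. As a minimal sketch of how that metric is computed between predicted and measured binding affinities, the code below ranks both lists (averaging ranks for ties) and takes the Pearson correlation of the ranks. The affinity values and all function names are hypothetical and are not taken from the paper or its dataset.

```python
import math


def ranks(values):
    """Return 1-based ranks, assigning the average rank to tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = 0.5 * (i + j) + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(a), ranks(b))


# Hypothetical measured vs. predicted affinities for five complexes.
measured = [41.2, 35.0, 52.7, 47.9, 30.1]
predicted = [40.0, 36.5, 50.3, 51.0, 29.4]
rho = spearman(measured, predicted)
```

Only the ordering of the predictions matters here: swapping the top two complexes, as in this toy data, lowers `rho` even though the predicted values are numerically close to the measurements.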

arXiv information

Authors	Hazem Alsamkary, Mohamed Elshaffei, Mohamed Soudy, Sara Ossman, Abdallah Amr, Nehal Adel Abdelsalam, Mohamed Elkerdawy, Ahmed Elnaggar
Published	2025-05-26 14:23:08+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.LG, q-bio.BM

Controlling Neural Collapse Enhances Out-of-Distribution Detection and Transfer Learning

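Posted: May 27, 2025 | Author: jarxiv

Summary

Out-of-distribution (OOD) detection and OOD generalization are widely studied in deep neural networks (DNNs), yet their relationship is poorly understood.
We show empirically that the degree of neural collapse (NC) in a network layer is inversely related to these objectives: stronger NC improves OOD detection but degrades generalization, while weaker NC enhances generalization at the cost of detection.
This trade-off suggests that a single feature space cannot achieve both tasks at once.
To address this, we develop a theoretical framework linking NC to OOD detection and generalization.
We show that entropy regularization mitigates NC to improve generalization, while a fixed simplex equiangular tight frame (ETF) projector enforces NC for better detection.
Based on these insights, we propose a method to control NC at different DNN layers.
In experiments, our method performs well on both tasks across OOD datasets and DNN architectures.

Summary (original)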

Out-of-distribution (OOD) detection and OOD generalization are widely studied in Deep Neural Networks (DNNs), yet their relationship remains poorly understood. We empirically show that the degree of Neural Collapse (NC) in a network layer is inversely related with these objectives: stronger NC improves OOD detection but degrades generalization, while weaker NC enhances generalization at the cost of detection. This trade-off suggests that a single feature space cannot simultaneously achieve both tasks. To address this, we develop a theoretical framework linking NC to OOD detection and generalization. We show that entropy regularization mitigates NC to improve generalization, while a fixed Simplex Equiangular Tight Frame (ETF) projector enforces NC for better detection. Based on these insights, we propose a method to control NC at different DNN layers. In experiments, our method excels at both tasks across OOD datasets and DNN architectures.
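
For readers unfamiliar with the simplex ETF mentioned above, the sketch below builds the standard construction sqrt(K/(K-1)) (I - (1/K) 1 1^T) for K classes and checks its two defining properties: unit-norm class vectors and a common pairwise cosine of -1/(K-1). How the paper wires such a fixed projector into a network is not shown here, and the function names are assumptions.

```python
import math


def simplex_etf(k):
    """Rows are k class vectors in R^k forming a simplex ETF.

    Standard construction: sqrt(k / (k - 1)) * (I - (1/k) * ones).
    """
    scale = math.sqrt(k / (k - 1))
    return [[scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
            for i in range(k)]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


num_classes = 4
etf = simplex_etf(num_classes)

# Every class vector has unit norm.
norms = [math.sqrt(dot(v, v)) for v in etf]

# Every pair of distinct class vectors has the same inner product,
# -1 / (k - 1), which is the maximal equal separation for k unit vectors.
cosines = [dot(etf[i], etf[j])
           for i in range(num_classes)
           for j in range(i + 1, num_classes)]
```

Because the frame is fixed rather than learned, class means pushed toward these directions end up equally and maximally separated, which is the geometry that neural collapse describes.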

arXiv information

Authors	Md Yousuf Harun, Jhair Gallardo, Christopher Kanan
Published	2025-05-26 14:24:51+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.LG

Synthetic Time Series Forecasting with Transformer Architectures: Extensive Simulation Benchmarks

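Posted: May 27, 2025 | Author: jarxiv

Summary

Time series forecasting plays an important role in domains such as energy, finance, and healthcare.
Transformer-based models have shown success in sequential modeling, but their adoption for time series remains limited by challenges such as noise sensitivity, long-range dependencies, and a lack of inductive bias for temporal structure.
In this work, we present a unified and principled framework for benchmarking three prominent Transformer forecasting architectures (Autoformer, Informer, and PatchTST), each evaluated through three architectural variants.
We conduct over 1500 controlled experiments on a suite of ten synthetic signals, spanning five patch lengths and five forecast horizons under both clean and noisy conditions.
Our analysis reveals consistent patterns across model families.
To advance this landscape further, we introduce Deep Koopformer, a Koopman-enhanced Transformer framework that integrates operator-theoretic latent state modeling to improve stability and interpretability.
We demonstrate its effectiveness on nonlinear and chaotic dynamical systems.
Our results highlight Koopman-based Transformers as a promising hybrid approach for robust, interpretable, and theoretically grounded time series forecasting under noisy and complex real-world conditions.

Summary (original)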

Time series forecasting plays a critical role in domains such as energy, finance, and healthcare, where accurate predictions inform decision-making under uncertainty. Although Transformer-based models have demonstrated success in sequential modeling, their adoption for time series remains limited by challenges such as noise sensitivity, long-range dependencies, and a lack of inductive bias for temporal structure. In this work, we present a unified and principled framework for benchmarking three prominent Transformer forecasting architectures-Autoformer, Informer, and Patchtst-each evaluated through three architectural variants: Minimal, Standard, and Full, representing increasing levels of complexity and modeling capacity. We conduct over 1500 controlled experiments on a suite of ten synthetic signals, spanning five patch lengths and five forecast horizons under both clean and noisy conditions. Our analysis reveals consistent patterns across model families. To advance this landscape further, we introduce the Koopman-enhanced Transformer framework, Deep Koopformer, which integrates operator-theoretic latent state modeling to improve stability and interpretability. We demonstrate its efficacy on nonlinear and chaotic dynamical systems. Our results highlight Koopman based Transformer as a promising hybrid approach for robust, interpretable, and theoretically grounded time series forecasting in noisy and complex real-world conditions.

arXiv information

Authors	Ali Forootani, Mohammad Khosravi
Published	2025-05-26 14:34:05+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.LG, cs.SY, eess.SY

Catoni-Style Change Point Detection for Regret Minimization in Non-Stationary Heavy-Tailed Bandits

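Posted: May 27, 2025 | Author: jarxiv

Summary

Regret minimization in stochastic non-stationary bandits has gained popularity over the last decade because it can model a broad class of real-world problems, from advertising to recommendation systems.
Existing literature relies on various assumptions about the reward-generating process, such as Bernoulli or subgaussian rewards.
However, heavy-tailed distributions arise naturally in settings such as finance and telecommunications.
This work tackles the heavy-tailed piecewise-stationary bandit problem.
Heavy-tailed bandits, introduced by Bubeck et al. (2013), operate under the minimal assumption that the finite absolute centered moments of maximum order $1+\epsilon$ are uniformly bounded by a constant $v<+\infty$, for some $\epsilon \in (0,1]$.
We focus on the piecewise-stationary setting, in which the mean of the reward-generating distributions may change at unknown time steps.
We provide a Catoni-style change-point detection strategy tailored to heavy-tailed distributions and introduce Robust-CPD-UCB, which combines this strategy with optimistic bandit algorithms, together with a regret upper bound and an impossibility result on the minimum attainable regret.
Finally, we validate our approach through numerical experiments on synthetic and real-world datasets.

Summary (original)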

Regret minimization in stochastic non-stationary bandits gained popularity over the last decade, as it can model a broad class of real-world problems, from advertising to recommendation systems. Existing literature relies on various assumptions about the reward-generating process, such as Bernoulli or subgaussian rewards. However, in settings such as finance and telecommunications, heavy-tailed distributions naturally arise. In this work, we tackle the heavy-tailed piecewise-stationary bandit problem. Heavy-tailed bandits, introduced by Bubeck et al., 2013, operate on the minimal assumption that the finite absolute centered moments of maximum order $1+\epsilon$ are uniformly bounded by a constant $v<+\infty$, for some $\epsilon \in (0,1]$. We focus on the most popular non-stationary bandit setting, i.e., the piecewise-stationary setting, in which the mean of reward-generating distributions may change at unknown time steps. We provide a novel Catoni-style change-point detection strategy tailored for heavy-tailed distributions that relies on recent advancements in the theory of sequential estimation, which is of independent interest. We introduce Robust-CPD-UCB, which combines this change-point detection strategy with optimistic algorithms for bandits, providing its regret upper bound and an impossibility result on the minimum attainable regret for any policy. Finally, we validate our approach through numerical experiments on synthetic and real-world datasets.
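
As background for the "Catoni-style" terminology, the sketch below implements the classical Catoni M-estimator of the mean, which solves sum_i psi(alpha * (x_i - theta)) = 0 for theta using a logarithmically growing influence function. This is only the textbook estimator, not the paper's change-point detector or Robust-CPD-UCB. The choice `alpha=1.0`, the bisection solver, the toy data, and all names are assumptions; in theory, `alpha` is tuned from the moment bound, the sample size, and the confidence level, and heavy-tailed variants may use a different influence function.

```python
import math


def psi(x):
    """Catoni's influence function: grows like log|x|, so outliers are damped."""
    if x >= 0:
        return math.log(1.0 + x + 0.5 * x * x)
    return -math.log(1.0 - x + 0.5 * x * x)


def catoni_mean(samples, alpha, iters=100):
    """Solve sum_i psi(alpha * (x_i - theta)) = 0 for theta by bisection.

    The sum is non-increasing in theta, non-negative at min(samples) and
    non-positive at max(samples), so a root lies in that interval.
    """
    lo, hi = min(samples), max(samples)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        score = sum(psi(alpha * (x - mid)) for x in samples)
        if score > 0:
            lo = mid  # estimate is too small
        else:
            hi = mid  # estimate is too large (or exact)
    return 0.5 * (lo + hi)


# Toy rewards: four values near 1.0 and one large outlier.
data = [0.9, 1.0, 1.1, 1.0, 50.0]
sample_mean = sum(data) / len(data)
robust = catoni_mean(data, alpha=1.0)
```

On this toy data the robust estimate stays much closer to the bulk of the rewards than the sample mean does, which is the property that makes such estimators attractive when only low-order moments are bounded.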

arXiv information

Authors	Gianmarco Genalti, Sujay Bhatt, Nicola Gatti, Alberto Maria Metelli
Published	2025-05-26 14:40:47+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.LG

Ankh3: Multi-Task Pretraining with Sequence Denoising and Completion Enhances Protein Representations

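Posted: May 27, 2025 | Author: jarxiv

Summary

Protein language models (PLMs) have emerged as powerful tools for detecting complex patterns in protein sequences.
However, the ability of PLMs to fully capture information in protein sequences may be limited by a focus on a single pre-training task.
Adding data modalities or supervised objectives can improve PLM performance, but pre-training often remains focused on denoising corrupted sequences.
To push the boundaries of PLMs, our research investigated a multi-task pre-training strategy.
We developed Ankh3, a model jointly optimized on two objectives: masked language modeling with multiple masking probabilities, and protein sequence completion that relies only on protein sequences as input.
This multi-task pre-training showed that PLMs can learn richer and more generalizable representations from protein sequences alone.
The results showed improved performance on downstream tasks such as secondary structure prediction, fluorescence, GB1 fitness, and contact prediction.
Integrating multiple tasks gave the model a more comprehensive understanding of protein properties, leading to more robust and accurate predictions.

Summary (original)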

Protein language models (PLMs) have emerged as powerful tools to detect complex patterns of protein sequences. However, the capability of PLMs to fully capture information on protein sequences might be limited by focusing on single pre-training tasks. Although adding data modalities or supervised objectives can improve the performance of PLMs, pre-training often remains focused on denoising corrupted sequences. To push the boundaries of PLMs, our research investigated a multi-task pre-training strategy. We developed Ankh3, a model jointly optimized on two objectives: masked language modeling with multiple masking probabilities and protein sequence completion relying only on protein sequences as input. This multi-task pre-training demonstrated that PLMs can learn richer and more generalizable representations solely from protein sequences. The results demonstrated improved performance in downstream tasks, such as secondary structure prediction, fluorescence, GB1 fitness, and contact prediction. The integration of multiple tasks gave the model a more comprehensive understanding of protein properties, leading to more robust and accurate predictions.

arXiv information

Authors	Hazem Alsamkary, Mohamed Elshaffei, Mohamed Elkerdawy, Ahmed Elnaggar
Published	2025-05-26 14:41:10+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.LG, q-bio.QM

MiniLongBench: The Low-cost Long Context Understanding Benchmark for Large Language Models

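Posted: May 27, 2025 | Author: jarxiv

Summary

Long context understanding (LCU) is an important area of exploration for current large language models (LLMs).
However, because long-text data is inherently lengthy, existing LCU benchmarks for LLMs often incur prohibitively high evaluation costs, such as testing time and inference expenses.
Through extensive experiments, we find that existing LCU benchmarks exhibit significant redundancy, which implies inefficient evaluation.
In this paper, we propose a concise data compression method tailored to long-text data with sparse information characteristics.
By pruning the well-known LCU benchmark LongBench, we create MiniLongBench.
This benchmark contains only 237 test samples across six major task categories and 21 distinct tasks.
In an empirical analysis of over 60 LLMs, MiniLongBench reduces the average evaluation cost to only 4.5% of the original while maintaining an average rank correlation coefficient of 0.97 with LongBench results.
Therefore, as a low-cost benchmark, MiniLongBench has great potential to substantially drive future research on the LCU capabilities of LLMs.
For code, data, and a tutorial, see https://github.com/MilkThink-Lab/MiniLongBench.

Summary (original)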

Long Context Understanding (LCU) is a critical area for exploration in current large language models (LLMs). However, due to the inherently lengthy nature of long-text data, existing LCU benchmarks for LLMs often result in prohibitively high evaluation costs, like testing time and inference expenses. Through extensive experimentation, we discover that existing LCU benchmarks exhibit significant redundancy, which means the inefficiency in evaluation. In this paper, we propose a concise data compression method tailored for long-text data with sparse information characteristics. By pruning the well-known LCU benchmark LongBench, we create MiniLongBench. This benchmark includes only 237 test samples across six major task categories and 21 distinct tasks. Through empirical analysis of over 60 LLMs, MiniLongBench achieves an average evaluation cost reduced to only 4.5% of the original while maintaining an average rank correlation coefficient of 0.97 with LongBench results. Therefore, our MiniLongBench, as a low-cost benchmark, holds great potential to substantially drive future research into the LCU capabilities of LLMs. See https://github.com/MilkThink-Lab/MiniLongBench for our code, data and tutorial.

arXiv information

Authors	Zhongzhan Huang, Guoming Ling, Shanshan Zhong, Hefeng Wu, Liang Lin
Published	2025-05-26 13:21:18+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.CL

Explanatory Summarization with Discourse-Driven Planning

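Posted: May 27, 2025 | Author: jarxiv

Summary

Lay summaries of scientific documents typically include explanations that help readers grasp sophisticated concepts and arguments.
However, current automatic summarization methods do not explicitly model explanations, which makes it difficult to match the proportion of explanatory content in human-written summaries.
This paper presents a plan-based approach that leverages discourse frameworks to organize summary generation and guides explanatory sentences by prompting responses to the plan.
Specifically, we propose two discourse-driven planning strategies, in which the plan is conditioned as part of the input or as part of the output prefix, respectively.
Experiments on three lay summarization datasets show that our approach outperforms existing state-of-the-art methods in summary quality, improves model robustness and controllability, and mitigates hallucination.

Summary (original)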

Lay summaries for scientific documents typically include explanations to help readers grasp sophisticated concepts or arguments. However, current automatic summarization methods do not explicitly model explanations, which makes it difficult to align the proportion of explanatory content with human-written summaries. In this paper, we present a plan-based approach that leverages discourse frameworks to organize summary generation and guide explanatory sentences by prompting responses to the plan. Specifically, we propose two discourse-driven planning strategies, where the plan is conditioned as part of the input or part of the output prefix, respectively. Empirical experiments on three lay summarization datasets show that our approach outperforms existing state-of-the-art methods in terms of summary quality, and it enhances model robustness, controllability, and mitigates hallucination.

arXiv information

Authors	Dongqi Liu, Xi Yu, Vera Demberg, Mirella Lapata
Published	2025-05-26 13:22:33+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, cs.LG

The Limits of Preference Data for Post-Training

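Posted: May 27, 2025 | Author: jarxiv

Summary

Recent progress in strengthening the capabilities of large language models has come from applying reinforcement learning (RL) to domains with automatically verifiable outcomes.
A key question is whether RL can similarly be used to optimize for outcomes in domains where evaluating outcomes inherently requires human feedback.
For example, in tasks such as deep research and trip planning, outcome evaluation is qualitative and there are many possible degrees of success.
One attractive and scalable modality for collecting human feedback is preference data: ordinal rankings (pairwise or $k$-wise) that indicate which of $k$ given outcomes is preferred.
This work studies a critical roadblock: preference data fundamentally and significantly limits outcome-based optimization.
Even with idealized preference data (infinite, noiseless, and online), the use of ordinal feedback can prevent obtaining even approximately optimal solutions.
We formalize this impossibility using voting theory, drawing an analogy between how a model chooses an answer to a query and how voters choose a candidate to elect.
This indicates that grounded human scoring and algorithmic innovations are necessary to extend the success of RL post-training to domains that demand human feedback.
We also explore why these limitations have disproportionately affected RLHF in eliciting reasoning behaviors (e.g., backtracking) compared with situations where RLHF has historically been successful (e.g., instruction-tuning and safety training).

Summary (original)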

Recent progress in strengthening the capabilities of large language models has stemmed from applying reinforcement learning to domains with automatically verifiable outcomes. A key question is whether we can similarly use RL to optimize for outcomes in domains where evaluating outcomes inherently requires human feedback; for example, in tasks like deep research and trip planning, outcome evaluation is qualitative and there are many possible degrees of success. One attractive and scalable modality for collecting human feedback is preference data: ordinal rankings (pairwise or $k$-wise) that indicate, for $k$ given outcomes, which one is preferred. In this work, we study a critical roadblock: preference data fundamentally and significantly limits outcome-based optimization. Even with idealized preference data (infinite, noiseless, and online), the use of ordinal feedback can prevent obtaining even approximately optimal solutions. We formalize this impossibility using voting theory, drawing an analogy between how a model chooses to answer a query with how voters choose a candidate to elect. This indicates that grounded human scoring and algorithmic innovations are necessary for extending the success of RL post-training to domains demanding human feedback. We also explore why these limitations have disproportionately impacted RLHF when it comes to eliciting reasoning behaviors (e.g., backtracking) versus situations where RLHF has been historically successful (e.g., instruction-tuning and safety training), finding that the limitations of preference data primarily suppress RLHF’s ability to elicit robust strategies — a class that encompasses most reasoning behaviors.
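
The toy example below illustrates, under invented numbers, how ordinal feedback can point away from the outcome with the higher average score. It is not the paper's voting-theoretic construction. The three hypothetical raters, their scores, and the function names are assumptions chosen only to show that pairwise comparisons discard the size of each preference.

```python
def majority_prefers(scores_a, scores_b):
    """Return the outcome preferred by more raters in pairwise comparison."""
    wins_a = sum(a > b for a, b in zip(scores_a, scores_b))
    wins_b = sum(b > a for a, b in zip(scores_a, scores_b))
    if wins_a > wins_b:
        return "A"
    if wins_b > wins_a:
        return "B"
    return "tie"


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Hypothetical cardinal scores from three raters for two outcomes.
# Two raters slightly prefer A; one rater strongly prefers B.
scores_a = [0.6, 0.6, 0.0]
scores_b = [0.5, 0.5, 1.0]

# Ordinal (preference) feedback only records who wins each comparison.
ordinal_winner = majority_prefers(scores_a, scores_b)

# Cardinal (score) feedback keeps the magnitude of each judgment.
cardinal_winner = "A" if mean(scores_a) > mean(scores_b) else "B"
```

Here the pairwise majority selects A while the mean score favors B, so an optimizer that sees only the comparisons has no signal that B is better on average. This is consistent with the abstract's call for grounded human scoring, though the paper's formal result is more general than this example.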

arXiv information

Authors	Eric Zhao, Jessica Dai, Pranjal Awasthi
Published	2025-05-26 13:26:15+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

Categories: cs.AI, cs.CL, cs.GT, cs.LG
