jarxiv | Japanese arxiv | Page 319

Scaling External Knowledge Input Beyond Context Windows of LLMs via Multi-Agent Collaboration

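Posted: May 28, 2025 | Author: jarxiv

Summary

With rapid advances in post-training techniques for reasoning and information seeking, large language models (LLMs) can incorporate large amounts of retrieved knowledge to solve complex tasks.
However, the limited context window of LLMs prevents scaling the amount of external knowledge input, which blocks further improvement, especially on tasks that need a substantial amount of external knowledge.
Existing context-window extension methods inevitably lose information.
LLM-based multi-agent methods have emerged as a new paradigm for handling massive inputs in a distributed manner, and we identify two core bottlenecks in their existing knowledge synchronization and reasoning processes.
To overcome these bottlenecks, we develop a multi-agent framework, $\textbf{ExtAgents}$, which enables better scalability of inference-time knowledge integration without longer-context training.
Benchmarked on our enhanced multi-hop question answering test, $\textbf{$\boldsymbol{\infty}$Bench+}$, and on other public test sets including long survey generation, ExtAgents significantly outperforms existing non-training methods given the same amount of external knowledge input, regardless of whether that input falls $\textit{within or exceeds the context window}$.
The method also remains efficient thanks to its high parallelism.
Further study of how to coordinate LLM agents as external knowledge input grows could benefit real-world applications.

Summary (original)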

With the rapid advancement of post-training techniques for reasoning and information seeking, large language models (LLMs) can incorporate a large quantity of retrieved knowledge to solve complex tasks. However, the limited context window of LLMs obstructs scaling the amount of external knowledge input, prohibiting further improvement, especially for tasks requiring significant amount of external knowledge. Existing context window extension methods inevitably cause information loss. LLM-based multi-agent methods emerge as a new paradigm to handle massive input in a distributional manner, where we identify two core bottlenecks in existing knowledge synchronization and reasoning processes. In this work, we develop a multi-agent framework, $\textbf{ExtAgents}$, to overcome the bottlenecks and enable better scalability in inference-time knowledge integration without longer-context training. Benchmarked with our enhanced multi-hop question answering test, $\textbf{$\boldsymbol{\infty}$Bench+}$, and other public test sets including long survey generation, ExtAgents significantly enhances the performance over existing non-training methods with the same amount of external knowledge input, regardless of whether it falls $\textit{within or exceeds the context window}$. Moreover, the method maintains high efficiency due to high parallelism. Further study in the coordination of LLM agents on increasing external knowledge input could benefit real-world applications.
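The abstract does not spell out how ExtAgents synchronizes knowledge between agents, so the following is only a generic sketch of the underlying idea of handling input that exceeds one context window by distributing it across parallel workers. The names `pack_chunks`, `stub_agent`, and `answer_in_parallel`, the word-count budget, and the keyword-overlap "agent" are all illustrative assumptions, not the authors' method.

```python
from concurrent.futures import ThreadPoolExecutor


def pack_chunks(passages, budget):
    """Greedily pack passages into chunks of at most `budget` words each."""
    chunks, current, used = [], [], 0
    for passage in passages:
        n_words = len(passage.split())
        if current and used + n_words > budget:
            chunks.append(current)
            current, used = [], 0
        current.append(passage)
        used += n_words
    if current:
        chunks.append(current)
    return chunks


def stub_agent(question, chunk):
    """Placeholder for an LLM call (assumption): keep passages that share
    at least one word with the question."""
    query_words = set(question.lower().split())
    return [p for p in chunk if query_words & set(p.lower().split())]


def answer_in_parallel(question, passages, budget=8):
    """Run one stub agent per chunk in parallel and merge the results in order."""
    chunks = pack_chunks(passages, budget)
    with ThreadPoolExecutor() as pool:
        partial = list(pool.map(lambda c: stub_agent(question, c), chunks))
    return [p for part in partial for p in part]


passages = [
    "paris is the capital of france",
    "cats sleep a lot",
    "france borders spain",
]
evidence = answer_in_parallel("france capital", passages)
```

Because each chunk is processed independently, the work parallelizes naturally; a real system would replace `stub_agent` with a model call and add a step that reconciles the partial answers.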

arXiv information

Authors	Zijun Liu, Zhennan Wan, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Yang Liu
Published	2025-05-27 17:45:04+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Category: cs.CL

Are Language Models Consequentialist or Deontological Moral Reasoners?

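Posted: May 28, 2025 | Author: jarxiv

Summary

As AI systems increasingly navigate applications in healthcare, law, and governance, it becomes critical to understand how they handle ethically complex scenarios.
Previous work has mainly examined the moral judgments of large language models (LLMs) rather than the underlying moral reasoning process.
In contrast, we focus on a large-scale analysis of the moral reasoning traces that LLMs provide.
Furthermore, unlike earlier studies that tried to draw inferences from only a handful of moral dilemmas, our study uses over 600 distinct trolley problems as probes to reveal the reasoning patterns that emerge within different LLMs.
We introduce and test a taxonomy of moral rationales to systematically classify reasoning traces according to two main normative ethical theories: consequentialism and deontology.
Our analysis shows that LLM chains-of-thought tend to favor deontological principles grounded in moral obligations, whereas post-hoc explanations shift notably toward consequentialist rationales that emphasize utility.
Our framework provides a foundation for understanding how LLMs process and articulate ethical considerations, an important step toward the safe and interpretable deployment of LLMs in high-stakes decision-making settings.
Our code is available at https://github.com/keenansamway/moral-lens.

Summary (original)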

As AI systems increasingly navigate applications in healthcare, law, and governance, understanding how they handle ethically complex scenarios becomes critical. Previous work has mainly examined the moral judgments in large language models (LLMs), rather than their underlying moral reasoning process. In contrast, we focus on a large-scale analysis of the moral reasoning traces provided by LLMs. Furthermore, unlike prior work that attempted to draw inferences from only a handful of moral dilemmas, our study leverages over 600 distinct trolley problems as probes for revealing the reasoning patterns that emerge within different LLMs. We introduce and test a taxonomy of moral rationales to systematically classify reasoning traces according to two main normative ethical theories: consequentialism and deontology. Our analysis reveals that LLM chains-of-thought tend to favor deontological principles based on moral obligations, while post-hoc explanations shift notably toward consequentialist rationales that emphasize utility. Our framework provides a foundation for understanding how LLMs process and articulate ethical considerations, an important step toward safe and interpretable deployment of LLMs in high-stakes decision-making environments. Our code is available at https://github.com/keenansamway/moral-lens .

arXiv information

Authors	Keenan Samway, Max Kleiman-Weiner, David Guzman Piedrahita, Rada Mihalcea, Bernhard Schölkopf, Zhijing Jin
Published	2025-05-27 17:51:18+00:00
arXiv site	arxiv_id(pdf)

Category: cs.CL

Hardware-Efficient Attention for Fast Decoding

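Posted: May 28, 2025 | Author: jarxiv

Summary

LLM decoding is bottlenecked for large batches and long contexts: loading the key-value (KV) cache from high-bandwidth memory inflates per-token latency, and the sequential nature of decoding limits parallelism.
We analyze the interplay among arithmetic intensity, parallelization, and model quality, and ask whether current architectures fully exploit modern hardware.
This work redesigns attention to perform more computation per byte loaded from memory, maximizing hardware efficiency without trading off parallel scalability.
We first propose Grouped-Tied Attention (GTA), a simple variant that combines and reuses key and value states, reducing memory transfers without compromising model quality.
We then introduce Grouped Latent Attention (GLA), a parallel-friendly latent attention paired with low-level optimizations for fast decoding while maintaining high model quality.
Experiments show that GTA matches the quality of Grouped-Query Attention (GQA) while using roughly half the KV cache, and that GLA matches Multi-head Latent Attention (MLA) while being easier to shard.
Our optimized GLA kernel is up to 2$\times$ faster than FlashMLA, for example in a speculative decoding setting where the query length exceeds one.
Furthermore, by fetching a smaller KV cache per device, GLA reduces end-to-end latency and increases throughput in online serving benchmarks by up to 2$\times$.

Summary (original)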

LLM decoding is bottlenecked for large batches and long contexts by loading the key-value (KV) cache from high-bandwidth memory, which inflates per-token latency, while the sequential nature of decoding limits parallelism. We analyze the interplay among arithmetic intensity, parallelization, and model quality and question whether current architectures fully exploit modern hardware. This work redesigns attention to perform more computation per byte loaded from memory to maximize hardware efficiency without trading off parallel scalability. We first propose Grouped-Tied Attention (GTA), a simple variant that combines and reuses key and value states, reducing memory transfers without compromising model quality. We then introduce Grouped Latent Attention (GLA), a parallel-friendly latent attention paired with low-level optimizations for fast decoding while maintaining high model quality. Experiments show that GTA matches Grouped-Query Attention (GQA) quality while using roughly half the KV cache and that GLA matches Multi-head Latent Attention (MLA) and is easier to shard. Our optimized GLA kernel is up to 2$\times$ faster than FlashMLA, for example, in a speculative decoding setting when the query length exceeds one. Furthermore, by fetching a smaller KV cache per device, GLA reduces end-to-end latency and increases throughput in online serving benchmarks by up to 2$\times$.
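To make the "roughly half the KV cache" claim concrete, here is a back-of-the-envelope sketch of KV-cache size. The formula for separate key and value states is the standard one for grouped-query attention; the `tied=True` branch, which stores a single shared state per head, is a deliberate simplification for illustration and not the exact GTA layout. The model shape is hypothetical.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim,
                   bytes_per_elem=2, tied=False):
    """Approximate KV-cache size in bytes for one sequence.

    Standard grouped-query attention stores separate key and value
    states (factor 2). `tied=True` is an illustrative simplification in
    which one shared state serves as both key and value (factor 1).
    """
    states_per_token = 1 if tied else 2
    return (seq_len * n_layers * n_kv_heads * head_dim
            * bytes_per_elem * states_per_token)


# Hypothetical model shape, chosen only for illustration (16-bit elements).
gqa = kv_cache_bytes(seq_len=8192, n_layers=32, n_kv_heads=8, head_dim=128)
tied = kv_cache_bytes(seq_len=8192, n_layers=32, n_kv_heads=8, head_dim=128,
                      tied=True)
print(gqa / 2**30, tied / 2**30)  # sizes in GiB: 1.0 0.5
```

Since every decoding step must read the whole cache from memory, halving the bytes stored also halves the bytes loaded per generated token, which is the quantity the abstract identifies as the bottleneck.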

arXiv information

Authors	Ted Zadouri, Hubert Strauss, Tri Dao
Published	2025-05-27 17:54:07+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.CL, cs.LG

Reinforcing General Reasoning without Verifiers

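Posted: May 28, 2025 | Author: jarxiv

Summary

The recent paradigm shift toward training large language models (LLMs) with DeepSeek-R1-Zero-style reinforcement learning (RL) on verifiable rewards has led to impressive advances in code and mathematical reasoning.
However, this methodology is limited to tasks where rule-based answer verification is possible, and it does not extend naturally to real-world domains such as chemistry, healthcare, engineering, law, biology, business, and economics.
Current practical workarounds use an additional LLM as a model-based verifier.
This introduces problems of its own: reliance on a strong verifier LLM, susceptibility to reward hacking, and the practical burden of keeping the verifier model in memory during training.
To address this and extend DeepSeek-R1-Zero-style training to general reasoning domains, we propose a verifier-free method (VeriFree) that bypasses answer verification and instead uses RL to directly maximize the probability of generating the reference answer.
We compare VeriFree with verifier-based methods and show that, in addition to its significant practical benefits and reduced compute requirements, it matches and even surpasses them in extensive evaluations across MMLU-Pro, GPQA, SuperGPQA, and math-related benchmarks.
Moreover, we offer insights into the method from multiple perspectives: as an elegant integration of training both the policy and an implicit verifier in a unified model, and as a variational optimization approach.
Code is available at https://github.com/sail-sg/VeriFree.

Summary (original)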

The recent paradigm shift towards training large language models (LLMs) using DeepSeek-R1-Zero-style reinforcement learning (RL) on verifiable rewards has led to impressive advancements in code and mathematical reasoning. However, this methodology is limited to tasks where rule-based answer verification is possible and does not naturally extend to real-world domains such as chemistry, healthcare, engineering, law, biology, business, and economics. Current practical workarounds use an additional LLM as a model-based verifier; however, this introduces issues such as reliance on a strong verifier LLM, susceptibility to reward hacking, and the practical burden of maintaining the verifier model in memory during training. To address this and extend DeepSeek-R1-Zero-style training to general reasoning domains, we propose a verifier-free method (VeriFree) that bypasses answer verification and instead uses RL to directly maximize the probability of generating the reference answer. We compare VeriFree with verifier-based methods and demonstrate that, in addition to its significant practical benefits and reduced compute requirements, VeriFree matches and even surpasses verifier-based methods on extensive evaluations across MMLU-Pro, GPQA, SuperGPQA, and math-related benchmarks. Moreover, we provide insights into this method from multiple perspectives: as an elegant integration of training both the policy and implicit verifier in a unified model, and as a variational optimization approach. Code is available at https://github.com/sail-sg/VeriFree.
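The core quantity in the abstract, the probability of generating the reference answer, can be sketched in a few lines. This is a minimal illustration under stated assumptions: `logits[t]` stands for the model's vocabulary scores at answer position `t` after a sampled reasoning trace, and using the resulting probability as a reward signal is our reading of the abstract, not the paper's exact estimator. The function names are hypothetical.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over one list of vocabulary scores."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def reference_answer_logprob(logits, answer_ids):
    """Sum of log-probabilities of the reference answer tokens.

    logits[t] holds the scores over the vocabulary at answer position t;
    answer_ids[t] is the reference token at that position.
    """
    return sum(log_softmax(row)[tok] for row, tok in zip(logits, answer_ids))


# Toy example: 3-token vocabulary, 2-token reference answer.
logits = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
reward = math.exp(reference_answer_logprob(logits, [0, 1]))
```

No separate verifier model is needed to compute this value: it comes from the same model that produced the reasoning, which is why the abstract describes the policy and an implicit verifier as living in one unified model.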

arXiv information

Authors	Xiangxin Zhou, Zichen Liu, Anya Sims, Haonan Wang, Tianyu Pang, Chongxuan Li, Liang Wang, Min Lin, Chao Du
Published	2025-05-27 17:56:27+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.CL, cs.LG

Select2Reason: Efficient Instruction-Tuning Data Selection for Long-CoT Reasoning

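Posted: May 28, 2025 | Author: jarxiv

Summary

A practical way to activate long chain-of-thought (long-CoT) reasoning in pre-trained large language models is supervised fine-tuning on instruction datasets synthesized by strong large reasoning models such as DeepSeek-R1, which offers a cost-effective alternative to reinforcement learning.
However, large-scale instruction sets with more than 100k samples incur significant training overhead, and effective strategies for automatic long-CoT instruction selection remain unexplored.
In this work, we propose Select2Reason, a novel and efficient instruction-tuning data selection framework for long-CoT reasoning.
From the perspective of the emergence of rethinking behaviors such as self-correction and backtracking, we investigate common metrics that may determine the quality of long-CoT reasoning instructions.
Select2Reason uses a quantifier to estimate question difficulty and combines it, through a weighted ranking scheme, with a heuristic based on reasoning trace length to prioritize high-utility examples.
Empirical results on OpenR1-Math-220k show that fine-tuning an LLM on only 10% of the data, selected by Select2Reason, achieves performance competitive with or superior to full-data tuning and the open-source baseline OpenR1-Qwen-7B across three competition-level and six comprehensive mathematical benchmarks.
Further experiments highlight its scalability across data sizes, its efficiency during inference, and its adaptability to other instruction pools at minimal cost.

Summary (original)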

A practical approach to activate long chain-of-thoughts reasoning ability in pre-trained large language models is to perform supervised fine-tuning on instruction datasets synthesized by strong Large Reasoning Models such as DeepSeek-R1, offering a cost-effective alternative to reinforcement learning. However, large-scale instruction sets with more than 100k samples incur significant training overhead, while effective strategies for automatic long-CoT instruction selection still remain unexplored. In this work, we propose Select2Reason, a novel and efficient instruction-tuning data selection framework for long-CoT reasoning. From the perspective of emergence of rethinking behaviors like self-correction and backtracking, we investigate common metrics that may determine the quality of long-CoT reasoning instructions. Select2Reason leverages a quantifier to estimate difficulty of question and jointly incorporates a reasoning trace length-based heuristic through a weighted scheme for ranking to prioritize high-utility examples. Empirical results on OpenR1-Math-220k demonstrate that fine-tuning LLM on only 10% of the data selected by Select2Reason achieves performance competitive with or superior to full-data tuning and open-source baseline OpenR1-Qwen-7B across three competition-level and six comprehensive mathematical benchmarks. Further experiments highlight the scalability in varying data size, efficiency during inference, and its adaptability to other instruction pools with minimal cost.
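The selection rule described above, a weighted ranking over estimated difficulty and reasoning trace length, might look roughly like the sketch below. The rank normalization, the linear mix controlled by `weight`, and the field names `difficulty` and `trace_len` are assumptions made for illustration; the abstract does not give the exact scheme.

```python
def rank_scores(values):
    """Map each value to its rank scaled to [0, 1] (largest value -> 1.0)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    denom = max(len(values) - 1, 1)
    for pos, i in enumerate(order):
        ranks[i] = pos / denom
    return ranks


def select_top(examples, weight=0.5, fraction=0.1):
    """Keep the top `fraction` of examples by a weighted mix of two ranks:
    estimated question difficulty and reasoning trace length."""
    diff = rank_scores([e["difficulty"] for e in examples])
    length = rank_scores([e["trace_len"] for e in examples])
    combined = [weight * d + (1 - weight) * l for d, l in zip(diff, length)]
    k = max(1, int(len(examples) * fraction))
    best = sorted(range(len(examples)),
                  key=lambda i: combined[i], reverse=True)[:k]
    return [examples[i] for i in best]


# Hypothetical pool: difficulty and trace length both grow with the id.
pool = [{"id": i, "difficulty": i / 10, "trace_len": 100 * i} for i in range(10)]
selected = select_top(pool)  # keeps 10% of the pool, i.e. one example
```

Ranking before mixing keeps the two signals on a comparable scale, so a few extremely long traces cannot dominate the difficulty estimate.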

arXiv information

Authors	Cehao Yang, Xueyuan Lin, Chengjin Xu, Xuhui Jiang, Xiaojun Wu, Honghao Liu, Hui Xiong, Jian Guo
Published	2025-05-27 15:50:50+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CL

Evaluating LLM Adaptation to Sociodemographic Factors: User Profile vs. Dialogue History

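Posted: May 28, 2025 | Author: jarxiv

Summary

Effective engagement by large language models (LLMs) requires adapting responses to users' sociodemographic characteristics, such as age, occupation, and education level.
Many real-world applications use dialogue history for contextualization, yet existing evaluations of LLMs' behavioral adaptation often focus on single-turn prompts.
In this paper, we propose a framework for evaluating LLM adaptation when attributes are introduced either (1) explicitly via a user profile in the prompt or (2) implicitly through multi-turn dialogue history.
We assess the consistency of model behavior across these two modalities.
Using a multi-agent pipeline, we construct a synthetic dataset that pairs dialogue histories with distinct user profiles, and we use questions from the Value Survey Module (VSM 2013) (Hofstede and Hofstede, 2016) to probe value expression.
Our findings indicate that most models adjust their expressed values in response to demographic changes, particularly in age and education level, but that consistency varies.
Models with stronger reasoning capabilities show greater alignment, which indicates the importance of reasoning for robust sociodemographic adaptation.

Summary (original)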

Effective engagement by large language models (LLMs) requires adapting responses to users’ sociodemographic characteristics, such as age, occupation, and education level. While many real-world applications leverage dialogue history for contextualization, existing evaluations of LLMs’ behavioral adaptation often focus on single-turn prompts. In this paper, we propose a framework to evaluate LLM adaptation when attributes are introduced either (1) explicitly via user profiles in the prompt or (2) implicitly through multi-turn dialogue history. We assess the consistency of model behavior across these modalities. Using a multi-agent pipeline, we construct a synthetic dataset pairing dialogue histories with distinct user profiles and employ questions from the Value Survey Module (VSM 2013) (Hofstede and Hofstede, 2016) to probe value expression. Our findings indicate that most models adjust their expressed values in response to demographic changes, particularly in age and education level, but consistency varies. Models with stronger reasoning capabilities demonstrate greater alignment, indicating the importance of reasoning in robust sociodemographic adaptation.

arXiv information

Authors	Qishuai Zhong, Zongmin Li, Siqi Fan, Aixin Sun
Published	2025-05-27 15:52:39+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CL, cs.HC

Subgroups Matter for Robust Bias Mitigation

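Posted: May 28, 2025 | Author: jarxiv

Summary

Despite the constant development of new bias mitigation methods for machine learning, no method succeeds consistently, and a fundamental question remains unanswered: when and why do bias mitigation techniques fail?
In this paper, we hypothesise that a key factor may be an often-overlooked but crucial step shared by many bias mitigation methods: the definition of subgroups.
To investigate this, we conduct a comprehensive evaluation of state-of-the-art bias mitigation methods across multiple vision and language classification tasks, systematically varying the subgroup definitions to include coarse, fine-grained, intersectional, and noisy subgroups.
Our results show that the choice of subgroups significantly affects performance, and that certain groupings paradoxically lead to worse outcomes than no mitigation at all.
Our findings suggest that observing a disparity between a set of subgroups is not a sufficient reason to use those subgroups for mitigation.
Through theoretical analysis, we explain these phenomena and uncover a counter-intuitive insight: in some cases, fairness with respect to a particular set of subgroups is best improved by using a different set of subgroups for mitigation.
Our work highlights the importance of careful subgroup definition in bias mitigation and suggests it as an alternative lever for improving the robustness and fairness of machine learning models.

Summary (original)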

Despite the constant development of new bias mitigation methods for machine learning, no method consistently succeeds, and a fundamental question remains unanswered: when and why do bias mitigation techniques fail? In this paper, we hypothesise that a key factor may be the often-overlooked but crucial step shared by many bias mitigation methods: the definition of subgroups. To investigate this, we conduct a comprehensive evaluation of state-of-the-art bias mitigation methods across multiple vision and language classification tasks, systematically varying subgroup definitions, including coarse, fine-grained, intersectional, and noisy subgroups. Our results reveal that subgroup choice significantly impacts performance, with certain groupings paradoxically leading to worse outcomes than no mitigation at all. Our findings suggest that observing a disparity between a set of subgroups is not a sufficient reason to use those subgroups for mitigation. Through theoretical analysis, we explain these phenomena and uncover a counter-intuitive insight that, in some cases, improving fairness with respect to a particular set of subgroups is best achieved by using a different set of subgroups for mitigation. Our work highlights the importance of careful subgroup definition in bias mitigation and suggests it as an alternative lever for improving the robustness and fairness of machine learning models.

arXiv information

Authors	Anissa Alloula, Charles Jones, Ben Glocker, Bartłomiej W. Papież
Published	2025-05-27 15:52:58+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.LG

Towards Interpretability Without Sacrifice: Faithful Dense Layer Decomposition with Mixture of Decoders

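Posted: May 28, 2025 | Author: jarxiv

Summary

Multilayer perceptrons (MLPs) are an integral part of large language models, yet their dense representations make them difficult to understand, edit, and steer.
Recent methods learn interpretable approximations via neuron-level sparsity, but they fail to faithfully reconstruct the original mapping, which significantly increases the model's next-token cross-entropy loss.
In this paper, we advocate moving to layer-level sparsity to overcome the accuracy trade-off in sparse layer approximation.
Under this paradigm, we introduce Mixture of Decoders (MxDs).
MxDs generalize MLPs and Gated Linear Units, expanding pre-trained dense layers into tens of thousands of specialized sublayers.
Through a flexible form of tensor factorization, each sparsely activating MxD sublayer implements a linear transformation with full-rank weights, preserving the original decoders' expressive capacity even under heavy sparsity.
Experimentally, we show that MxDs significantly outperform state-of-the-art methods (e.g., Transcoders) on the sparsity-accuracy frontier in language models with up to 3B parameters.
Further evaluations on sparse probing and feature steering show that MxDs learn similarly specialized features of natural language, opening up a promising new avenue for designing interpretable yet faithful decompositions.
Our code is available at https://github.com/james-oldfield/MxD/.

Summary (original)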

Multilayer perceptrons (MLPs) are an integral part of large language models, yet their dense representations render them difficult to understand, edit, and steer. Recent methods learn interpretable approximations via neuron-level sparsity, yet fail to faithfully reconstruct the original mapping–significantly increasing model’s next-token cross-entropy loss. In this paper, we advocate for moving to layer-level sparsity to overcome the accuracy trade-off in sparse layer approximation. Under this paradigm, we introduce Mixture of Decoders (MxDs). MxDs generalize MLPs and Gated Linear Units, expanding pre-trained dense layers into tens of thousands of specialized sublayers. Through a flexible form of tensor factorization, each sparsely activating MxD sublayer implements a linear transformation with full-rank weights–preserving the original decoders’ expressive capacity even under heavy sparsity. Experimentally, we show that MxDs significantly outperform state-of-the-art methods (e.g., Transcoders) on the sparsity-accuracy frontier in language models with up to 3B parameters. Further evaluations on sparse probing and feature steering demonstrate that MxDs learn similarly specialized features of natural language–opening up a promising new avenue for designing interpretable yet faithful decompositions. Our code is included at: https://github.com/james-oldfield/MxD/.

arXiv information

Authors	James Oldfield, Shawn Im, Yixuan Li, Mihalis A. Nicolaou, Ioannis Patras, Grigorios G Chrysos
Published	2025-05-27 15:55:55+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.LG

Improving LLM-based Global Optimization with Search Space Partitioning

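Posted: May 28, 2025 | Author: jarxiv

Summary

Large language models (LLMs) have recently emerged as effective surrogate models and candidate generators within global optimization frameworks for expensive black-box functions.
Despite promising results, LLM-based methods often struggle in high-dimensional search spaces or when they lack domain-specific priors, which leads to sparse or uninformative suggestions.
To overcome these limitations, we propose HOLLM, a novel global optimization algorithm that enhances LLM-driven sampling by partitioning the search space into promising subregions.
Each subregion acts as a "meta-arm" selected via a bandit-inspired scoring mechanism that balances exploration and exploitation.
Within each selected subregion, an LLM then proposes high-quality candidate points without any explicit domain knowledge.
Empirical evaluation on standard optimization benchmarks shows that HOLLM consistently matches or surpasses leading Bayesian optimization and trust-region methods while substantially outperforming global LLM-based sampling strategies.

Summary (original)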

Large Language Models (LLMs) have recently emerged as effective surrogate models and candidate generators within global optimization frameworks for expensive blackbox functions. Despite promising results, LLM-based methods often struggle in high-dimensional search spaces or when lacking domain-specific priors, leading to sparse or uninformative suggestions. To overcome these limitations, we propose HOLLM, a novel global optimization algorithm that enhances LLM-driven sampling by partitioning the search space into promising subregions. Each subregion acts as a “meta-arm” selected via a bandit-inspired scoring mechanism that effectively balances exploration and exploitation. Within each selected subregion, an LLM then proposes high-quality candidate points, without any explicit domain knowledge. Empirical evaluation on standard optimization benchmarks shows that HOLLM consistently matches or surpasses leading Bayesian optimization and trust-region methods, while substantially outperforming global LLM-based sampling strategies.
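The abstract calls the subregion selection "bandit-inspired" without giving the scoring formula. As a stand-in, the sketch below uses the classic UCB1 rule (mean observed value plus an exploration bonus) to pick a subregion; this is a swapped-in textbook technique, not HOLLM's actual score. It assumes a maximization problem, and `ucb_score`, `pick_subregion`, and the region names are hypothetical.

```python
import math


def ucb_score(values, total_evals, c=1.0):
    """UCB1-style score: mean observed value plus an exploration bonus.
    Unexplored subregions score infinity so they are tried first."""
    if not values:
        return math.inf
    mean = sum(values) / len(values)
    return mean + c * math.sqrt(math.log(total_evals) / len(values))


def pick_subregion(history, c=1.0):
    """Return the name of the subregion ('meta-arm') with the highest score.

    history maps a subregion name to the objective values observed in it.
    """
    total = max(sum(len(v) for v in history.values()), 1)
    return max(history, key=lambda name: ucb_score(history[name], total, c))


history = {"left": [0.2, 0.3], "right": [0.8], "top": []}
first = pick_subregion(history)   # "top": never evaluated, so explored first
history["top"].append(0.1)
second = pick_subregion(history)  # "right": best mean with few evaluations
```

In the full algorithm described by the abstract, an LLM would then be prompted to propose candidate points inside the chosen subregion, and the new evaluations would be appended to `history`.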

arXiv information

Authors	Andrej Schwanke, Lyubomir Ivanov, David Salinas, Fabio Ferreira, Aaron Klein, Frank Hutter, Arber Zela
Published	2025-05-27 16:01:49+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.LG

Path Pooling: Training-Free Structure Enhancement for Efficient Knowledge Graph Retrieval-Augmented Generation

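Posted: May 28, 2025 | Author: jarxiv

Summary

Although large language models achieve strong results on many tasks, they still suffer from hallucinations and knowledge deficiencies in real-world applications.
Many knowledge graph-based retrieval-augmented generation (KG-RAG) methods improve the quality and credibility of LLMs by using the structural and semantic information in KGs as external knowledge bases.
However, these methods struggle to incorporate structural information effectively: they either incur high computational costs or underuse the available knowledge.
Inspired by smoothing operations in graph representation learning, we propose path pooling, a simple, training-free strategy that introduces structural information through a novel path-centric pooling operation.
It integrates seamlessly into existing KG-RAG methods in a plug-and-play manner, enabling richer use of structural information.
Extensive experiments show that incorporating path pooling into a state-of-the-art KG-RAG method consistently improves performance across various settings while adding negligible cost.

Summary (original)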

Although Large Language Models achieve strong success in many tasks, they still suffer from hallucinations and knowledge deficiencies in real-world applications. Many knowledge graph-based retrieval-augmented generation (KG-RAG) methods enhance the quality and credibility of LLMs by leveraging structure and semantic information in KGs as external knowledge bases. However, these methods struggle to effectively incorporate structure information, either incurring high computational costs or underutilizing available knowledge. Inspired by smoothing operations in graph representation learning, we propose path pooling, a simple, training-free strategy that introduces structure information through a novel path-centric pooling operation. It seamlessly integrates into existing KG-RAG methods in a plug-and-play manner, enabling richer structure information utilization. Extensive experiments demonstrate that incorporating the path pooling into the state-of-the-art KG-RAG method consistently improves performance across various settings while introducing negligible additional cost.
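The abstract does not define the pooling operation, so the following is only one plausible reading of "path-centric pooling" as a smoothing step: each retrieved triple's relevance score is blended with the mean score of the paths it lies on. The function `path_pool`, the `mix` coefficient, and the toy scores are assumptions for illustration, not the authors' formulation.

```python
def path_pool(scores, paths, mix=0.5):
    """Smooth per-triple relevance scores using the paths they belong to.

    scores: dict mapping a triple id to its retriever score.
    paths:  list of paths, each a list of triple ids.
    mix:    weight given to the path context (0 keeps the original score).
    """
    pooled = {}
    for triple, score in scores.items():
        # Mean score of every path that contains this triple.
        path_means = [sum(scores[t] for t in p) / len(p)
                      for p in paths if triple in p]
        # Triples on no path keep their own score as context.
        context = sum(path_means) / len(path_means) if path_means else score
        pooled[triple] = (1 - mix) * score + mix * context
    return pooled


scores = {"a": 1.0, "b": 0.0, "c": 0.5}
pooled = path_pool(scores, paths=[["a", "b"]])
# "b" is pulled up by its high-scoring path neighbour "a"; "c" is unchanged.
```

The step needs no training and only rescoring of already retrieved triples, which is consistent with the plug-and-play, low-overhead character the abstract claims.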

arXiv information

Authors	Hairu Wang, Yuan Feng, Xike Xie, S Kevin Zhou
Published	2025-05-27 16:06:13+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.AI, cs.CL
