jarxiv | Japanese arxiv | Page 797

SNN-Based Online Learning of Concepts and Action Laws in an Open World

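Posted: April 24, 2025 | Author: jarxiv

Summary

We present the architecture of a fully autonomous, bio-inspired cognitive agent built around a spiking neural network (SNN) that implements the agent's semantic memory.
The agent explores its universe and learns concepts of objects/situations and of its own actions in a one-shot manner.
Object/situation concepts are unary, whereas action concepts are triples consisting of an initial situation, a motor activity, and an outcome.
Action concepts embody the agent's knowledge of the action laws of its universe.
Both kinds of concepts come in different degrees of generality.
To make a decision, the agent queries its semantic memory for the expected outcomes of envisaged actions and chooses an action on the basis of these predictions.
Our experiments show that the agent handles new situations by appealing to previously learned general concepts and rapidly modifies its concepts to adapt to changes in the environment.

Summary (original)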

We present the architecture of a fully autonomous, bio-inspired cognitive agent built around a spiking neural network (SNN) implementing the agent’s semantic memory. This agent explores its universe and learns concepts of objects/situations and of its own actions in a one-shot manner. While object/situation concepts are unary, action concepts are triples made up of an initial situation, a motor activity, and an outcome. They embody the agent’s knowledge of its universe’s action laws. Both kinds of concepts have different degrees of generality. To make decisions the agent queries its semantic memory for the expected outcomes of envisaged actions and chooses the action to take on the basis of these predictions. Our experiments show that the agent handles new situations by appealing to previously learned general concepts and rapidly modifies its concepts to adapt to environment changes.
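The triple structure of action concepts and the query-then-choose decision step described above can be illustrated with ordinary data structures. The following is only a minimal sketch: it uses a dictionary rather than a spiking network, it does not model degrees of generality, and the names `ActionConcept`, `SemanticMemory`, `choose_action`, and the `utility` function are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class ActionConcept:
    """An action concept as a triple: initial situation, motor activity, outcome."""
    situation: str
    motor: str
    outcome: str


class SemanticMemory:
    """Hypothetical stand-in for the agent's semantic memory (a dict, not an SNN)."""

    def __init__(self) -> None:
        self._laws = {}  # maps (situation, motor) -> outcome

    def learn(self, concept: ActionConcept) -> None:
        # One-shot learning: a single observation stores or overwrites the law.
        self._laws[(concept.situation, concept.motor)] = concept.outcome

    def expected_outcome(self, situation: str, motor: str) -> Optional[str]:
        return self._laws.get((situation, motor))


def choose_action(
    memory: SemanticMemory,
    situation: str,
    motors: Sequence[str],
    utility: Callable[[str], float],
) -> Optional[str]:
    """Return the motor activity whose predicted outcome has the highest utility."""
    best_motor, best_value = None, float("-inf")
    for motor in motors:
        outcome = memory.expected_outcome(situation, motor)
        # Assumption: an action with no known outcome is treated as neutral (0.0).
        value = utility(outcome) if outcome is not None else 0.0
        if value > best_value:
            best_motor, best_value = motor, value
    return best_motor
```

Overwriting a stored triple when a new outcome is observed is one simple way to picture the rapid concept revision the abstract reports; the paper's actual mechanism may differ.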

arXiv information

Authors	Christel Grimaud, Dominique Longin, Andreas Herzig
Published	2025-04-23 13:28:28+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI, cs.LG, cs.NE, cs.RO

Exploring the Role of Knowledge Graph-Based RAG in Japanese Medical Question Answering with Small-Scale LLMs

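Posted: April 24, 2025 | Author: jarxiv

Summary

Large language models (LLMs) perform well in medical QA, but their effectiveness in Japanese contexts is limited because privacy constraints prevent the use of commercial models such as GPT-4 in clinical settings.
As a result, recent efforts focus on instruction-tuning open-source LLMs, while the potential of combining them with retrieval-augmented generation (RAG) remains underexplored.
To bridge this gap, we are the first to explore a knowledge graph-based (KG) RAG framework for Japanese medical QA with small-scale open-source LLMs.
Experimental results show that KG-based RAG has only a limited impact on Japanese medical QA with small-scale open-source LLMs.
Further case studies reveal that the effectiveness of RAG is sensitive to the quality and relevance of the externally retrieved content.
These findings offer valuable insights into the challenges and potential of applying RAG to Japanese medical QA, and also serve as a reference for other low-resource languages.

Summary (original)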

Large language models (LLMs) perform well in medical QA, but their effectiveness in Japanese contexts is limited due to privacy constraints that prevent the use of commercial models like GPT-4 in clinical settings. As a result, recent efforts focus on instruction-tuning open-source LLMs, though the potential of combining them with retrieval-augmented generation (RAG) remains underexplored. To bridge this gap, we are the first to explore a knowledge graph-based (KG) RAG framework for Japanese medical QA small-scale open-source LLMs. Experimental results show that KG-based RAG has only a limited impact on Japanese medical QA using small-scale open-source LLMs. Further case studies reveal that the effectiveness of the RAG is sensitive to the quality and relevance of the external retrieved content. These findings offer valuable insights into the challenges and potential of applying RAG in Japanese medical QA, while also serving as a reference for other low-resource languages.

arXiv information

Authors	Yingjian Chen, Feiyang Li, Xingyu Song, Tianxiao Li, Zixin Xu, Xiujie Chen, Issey Sukeda, Irene Li
Published	2025-04-23 13:54:01+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL

Pushing the Frontier on Approximate EFX Allocations

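Posted: April 24, 2025 | Author: jarxiv

Summary

We study the problem of allocating a set of indivisible goods to a set of agents with additive valuation functions, aiming to achieve approximate envy-freeness up to any good ($\alpha$-EFX).
State-of-the-art results on the problem include that (exact) EFX allocations exist when (a) there are at most three agents, (b) the agents' valuation functions can take at most two values, or (c) the agents' valuation functions can be represented via a graph.
For $\alpha$-EFX, it is known that a $0.618$-EFX allocation exists for any number of agents with additive valuation functions.
In this paper, we show that $2/3$-EFX allocations exist when (a) there are at most \emph{seven agents}, (b) the agents' valuation functions can take at most \emph{three values}, or (c) the agents' valuation functions can be represented via a \emph{multigraph}.
Our results can be interpreted in two ways.
First, by relaxing the notion of EFX to $2/3$-EFX, we obtain existence results for strict generalizations of the settings in which exact EFX allocations are known to exist.
Second, by imposing restrictions on the setting, we beat the $0.618$ barrier and achieve an approximation guarantee of $2/3$.
Our results therefore push the \emph{frontier} of existence and computation of approximate EFX allocations, and provide insights into the challenges of settling the existence of exact EFX allocations.

Summary (original)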

We study the problem of allocating a set of indivisible goods to a set of agents with additive valuation functions, aiming to achieve approximate envy-freeness up to any good ($\alpha$-EFX). The state-of-the-art results on the problem include that (exact) EFX allocations exist when (a) there are at most three agents, or (b) the agents’ valuation functions can take at most two values, or (c) the agents’ valuation functions can be represented via a graph. For $\alpha$-EFX, it is known that a $0.618$-EFX allocation exists for any number of agents with additive valuation functions. In this paper, we show that $2/3$-EFX allocations exist when (a) there are at most \emph{seven agents}, (b) the agents’ valuation functions can take at most \emph{three values}, or (c) the agents’ valuation functions can be represented via a \emph{multigraph}. Our results can be interpreted in two ways. First, by relaxing the notion of EFX to $2/3$-EFX, we obtain existence results for strict generalizations of the settings for which exact EFX allocations are known to exist. Secondly, by imposing restrictions on the setting, we manage to beat the barrier of $0.618$ and achieve an approximation guarantee of $2/3$. Therefore, our results push the \emph{frontier} of existence and computation of approximate EFX allocations, and provide insights into the challenges of settling the existence of exact EFX allocations.
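To make the $\alpha$-EFX notion concrete, the sketch below checks a given allocation against the standard definition, which is assumed here rather than quoted from the paper: an allocation is $\alpha$-EFX if, for every pair of agents $i, j$ and every good $g$ in $j$'s bundle, $v_i(A_i) \ge \alpha \cdot v_i(A_j \setminus \{g\})$. The function names and the toy instance are illustrative only; the code verifies an allocation and does not compute one.

```python
from fractions import Fraction


def bundle_value(valuation, bundle):
    """Additive valuation: the value of a bundle is the sum of its goods' values."""
    return sum(valuation[g] for g in bundle)


def is_alpha_efx(valuations, allocation, alpha):
    """Check whether `allocation` is alpha-EFX under additive `valuations`.

    valuations[i] maps each good to agent i's value for it;
    allocation[i] is the collection of goods held by agent i.
    """
    for i, v_i in enumerate(valuations):
        own = bundle_value(v_i, allocation[i])
        for j, other in enumerate(allocation):
            if i == j:
                continue
            total = bundle_value(v_i, other)
            for g in other:
                # Agent i compares its own bundle with j's bundle minus any one good g.
                if own < alpha * (total - v_i[g]):
                    return False
    return True


# Toy instance (hypothetical numbers): agent 0 holds {a}, agent 1 holds {b, c}.
valuations = [
    {"a": 2, "b": 3, "c": 1},
    {"a": 1, "b": 1, "c": 1},
]
allocation = [{"a"}, {"b", "c"}]

print(is_alpha_efx(valuations, allocation, Fraction(2, 3)))
print(is_alpha_efx(valuations, allocation, 1))
```

In this instance agent 0 values its own bundle at 2 and agent 1's bundle minus good c at 3, so the allocation satisfies the condition for $\alpha = 2/3$ but not for exact EFX ($\alpha = 1$). `Fraction` is used so the boundary case is compared exactly rather than in floating point.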

arXiv information

Authors	Georgios Amanatidis, Aris Filos-Ratsikas, Alkmini Sgouritsa
Published	2025-04-23 13:55:32+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI, cs.DM, cs.GT

IRIS: Interactive Research Ideation System for Accelerating Scientific Discovery

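Posted: April 24, 2025 | Author: jarxiv

Summary

The rapid advancement in the capabilities of large language models (LLMs) raises a pivotal question: how can LLMs accelerate scientific discovery?
This work tackles the crucial first stage of research: generating novel hypotheses.
Recent work on automated hypothesis generation focuses on multi-agent frameworks and on extending test-time compute, but none of these approaches effectively incorporates transparency and steerability through a synergistic human-in-the-loop (HITL) approach.
To address this gap, we introduce IRIS: Interactive Research Ideation System, an open-source platform designed for researchers to leverage LLM-assisted scientific ideation.
IRIS incorporates innovative features to enhance ideation, including adaptive test-time compute expansion via Monte Carlo Tree Search (MCTS), a fine-grained feedback mechanism, and query-based literature synthesis.
It is designed to give researchers greater control and insight throughout the ideation process.
We additionally conduct a user study with researchers across diverse disciplines, validating the effectiveness of the system in enhancing ideation.
We open-source our code at https://github.com/Anikethh/IRIS-Interactive-Research-Ideation-System

Summary (original)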

The rapid advancement in capabilities of large language models (LLMs) raises a pivotal question: How can LLMs accelerate scientific discovery? This work tackles the crucial first stage of research, generating novel hypotheses. While recent work on automated hypothesis generation focuses on multi-agent frameworks and extending test-time compute, none of the approaches effectively incorporate transparency and steerability through a synergistic Human-in-the-loop (HITL) approach. To address this gap, we introduce IRIS: Interactive Research Ideation System, an open-source platform designed for researchers to leverage LLM-assisted scientific ideation. IRIS incorporates innovative features to enhance ideation, including adaptive test-time compute expansion via Monte Carlo Tree Search (MCTS), fine-grained feedback mechanism, and query-based literature synthesis. Designed to empower researchers with greater control and insight throughout the ideation process. We additionally conduct a user study with researchers across diverse disciplines, validating the effectiveness of our system in enhancing ideation. We open-source our code at https://github.com/Anikethh/IRIS-Interactive-Research-Ideation-System
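The abstract names Monte Carlo Tree Search as the mechanism for expanding test-time compute but gives no details. As general background only, the sketch below shows the textbook UCT selection and backpropagation steps of MCTS over a tree of candidate ideas. It is not the IRIS implementation: the `Node` fields, the exploration constant `c=1.4`, and the idea of scoring a refined idea with a scalar reward are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in the search tree; `idea` is a hypothetical label for a research idea."""
    idea: str
    visits: int = 0
    total_reward: float = 0.0
    children: list = field(default_factory=list)


def uct_score(child: Node, parent_visits: int, c: float = 1.4) -> float:
    """Standard UCT: mean reward plus an exploration bonus for rarely visited nodes."""
    if child.visits == 0:
        return math.inf  # unvisited children are always tried first
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(parent: Node) -> Node:
    """Pick the child with the highest UCT score."""
    return max(parent.children, key=lambda ch: uct_score(ch, parent.visits))


def backpropagate(path, reward: float) -> None:
    """Update visit counts and rewards along the path from the root to the evaluated node."""
    for node in path:
        node.visits += 1
        node.total_reward += reward
```

Because the exploration bonus shrinks as a branch accumulates visits, compute is spent adaptively: promising branches are revisited, while untried ones are still sampled.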

arXiv information

Authors	Aniketh Garikaparthi, Manasi Patwardhan, Lovekesh Vig, Arman Cohan
Published	2025-04-23 14:01:36+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL

Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics

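Posted: April 24, 2025 | Author: jarxiv

Summary

Learning robust and generalizable world models is crucial for enabling efficient and scalable robotic control in real-world environments.
In this work, we introduce a novel framework for learning world models that accurately capture complex, partially observable, and stochastic dynamics.
The proposed method employs a dual-autoregressive mechanism and self-supervised training to achieve reliable long-horizon predictions without relying on domain-specific inductive biases, which ensures adaptability across diverse robotic tasks.
We further propose a policy optimization framework that leverages world models for efficient training in imagined environments and seamless deployment on real-world systems.
This work advances model-based reinforcement learning by addressing the challenges of long-horizon prediction, error accumulation, and sim-to-real transfer.
By providing a scalable and robust framework, the introduced methods pave the way for adaptive and efficient robotic systems in real-world applications.

Summary (original)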

Learning robust and generalizable world models is crucial for enabling efficient and scalable robotic control in real-world environments. In this work, we introduce a novel framework for learning world models that accurately capture complex, partially observable, and stochastic dynamics. The proposed method employs a dual-autoregressive mechanism and self-supervised training to achieve reliable long-horizon predictions without relying on domain-specific inductive biases, ensuring adaptability across diverse robotic tasks. We further propose a policy optimization framework that leverages world models for efficient training in imagined environments and seamless deployment in real-world systems. This work advances model-based reinforcement learning by addressing the challenges of long-horizon prediction, error accumulation, and sim-to-real transfer. By providing a scalable and robust framework, the introduced methods pave the way for adaptive and efficient robotic systems in real-world applications.

arXiv information

Authors	Chenhao Li, Andreas Krause, Marco Hutter
Published	2025-04-23 14:03:59+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI, cs.LG, cs.RO

A Survey of AI Agent Protocols

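Posted: April 24, 2025 | Author: jarxiv

Summary

The rapid development of large language models (LLMs) has led to the widespread deployment of LLM agents across diverse industries, including customer service, content generation, data analysis, and even healthcare.
However, as more LLM agents are deployed, a major issue has emerged: there is no standard way for these agents to communicate with external tools or data sources.
This lack of standardized protocols makes it difficult for agents to work together or scale effectively, and it limits their ability to tackle complex, real-world tasks.
A unified communication protocol for LLM agents could change this.
It would allow agents and tools to interact more smoothly, encourage collaboration, and trigger the formation of collective intelligence.
In this paper, we provide a systematic overview of existing communication protocols for LLM agents.
We classify them into four main categories and analyze them to help users and developers select the most suitable protocols for specific applications.
Additionally, we conduct a comparative performance analysis of these protocols across key dimensions such as security, scalability, and latency.
Finally, we explore future challenges, such as how protocols can adapt and survive in fast-evolving environments, and what qualities future protocols might need in order to support the next generation of LLM agent ecosystems.
We expect this work to serve as a practical reference for both researchers and engineers seeking to design, evaluate, or integrate robust communication infrastructures for intelligent agents.

Summary (original)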

The rapid development of large language models (LLMs) has led to the widespread deployment of LLM agents across diverse industries, including customer service, content generation, data analysis, and even healthcare. However, as more LLM agents are deployed, a major issue has emerged: there is no standard way for these agents to communicate with external tools or data sources. This lack of standardized protocols makes it difficult for agents to work together or scale effectively, and it limits their ability to tackle complex, real-world tasks. A unified communication protocol for LLM agents could change this. It would allow agents and tools to interact more smoothly, encourage collaboration, and triggering the formation of collective intelligence. In this paper, we provide a systematic overview of existing communication protocols for LLM agents. We classify them into four main categories and make an analysis to help users and developers select the most suitable protocols for specific applications. Additionally, we conduct a comparative performance analysis of these protocols across key dimensions such as security, scalability, and latency. Finally, we explore future challenges, such as how protocols can adapt and survive in fast-evolving environments, and what qualities future protocols might need to support the next generation of LLM agent ecosystems. We expect this work to serve as a practical reference for both researchers and engineers seeking to design, evaluate, or integrate robust communication infrastructures for intelligent agents.

arXiv information

Authors	Yingxuan Yang, Huacan Chai, Yuanyi Song, Siyuan Qi, Muning Wen, Ning Li, Junwei Liao, Haoyi Hu, Jianghao Lin, Gaowei Chang, Weiwen Liu, Ying Wen, Yong Yu, Weinan Zhang
Published	2025-04-23 14:07:26+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI

MOSAIC: A Skill-Centric Algorithmic Framework for Long-Horizon Manipulation Planning

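Posted: April 24, 2025 | Author: jarxiv

Summary

Planning long-horizon motions using a set of predefined skills is a key challenge in robotics and AI.
Addressing this challenge requires methods that systematically explore skill combinations to uncover task-solving sequences, harness generic, easy-to-learn skills (e.g., pushing, grasping) to generalize across unseen tasks, and bypass reliance on symbolic world representations that demand extensive domain and task-specific knowledge.
Despite significant progress, these elements remain largely disjoint in existing approaches, leaving a critical gap in achieving robust, scalable solutions for complex, long-horizon problems.
In this work, we present MOSAIC, a skill-centric framework that unifies these elements by using the skills themselves to guide the planning process.
MOSAIC uses two families of skills: Generators compute executable trajectories and world configurations, and Connectors link these independently generated skill trajectories by solving boundary value problems, enabling progress toward completing the overall task.
By breaking away from the conventional paradigm of incrementally discovering skills from predefined start or goal states (a limitation that significantly restricts exploration), MOSAIC focuses planning effort on regions where skills are inherently effective.
We demonstrate the efficacy of MOSAIC in both simulated and real-world robotic manipulation tasks, showcasing its ability to solve complex long-horizon planning problems using a diverse set of skills that incorporate generative diffusion models, motion planning algorithms, and manipulation-specific models.
Visit https://skill-mosaic.github.io for demonstrations and examples.

Summary (original)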

Planning long-horizon motions using a set of predefined skills is a key challenge in robotics and AI. Addressing this challenge requires methods that systematically explore skill combinations to uncover task-solving sequences, harness generic, easy-to-learn skills (e.g., pushing, grasping) to generalize across unseen tasks, and bypass reliance on symbolic world representations that demand extensive domain and task-specific knowledge. Despite significant progress, these elements remain largely disjoint in existing approaches, leaving a critical gap in achieving robust, scalable solutions for complex, long-horizon problems. In this work, we present MOSAIC, a skill-centric framework that unifies these elements by using the skills themselves to guide the planning process. MOSAIC uses two families of skills: Generators compute executable trajectories and world configurations, and Connectors link these independently generated skill trajectories by solving boundary value problems, enabling progress toward completing the overall task. By breaking away from the conventional paradigm of incrementally discovering skills from predefined start or goal states–a limitation that significantly restricts exploration–MOSAIC focuses planning efforts on regions where skills are inherently effective. We demonstrate the efficacy of MOSAIC in both simulated and real-world robotic manipulation tasks, showcasing its ability to solve complex long-horizon planning problems using a diverse set of skills incorporating generative diffusion models, motion planning algorithms, and manipulation-specific models. Visit https://skill-mosaic.github.io for demonstrations and examples.

arXiv information

Authors	Itamar Mishani, Yorai Shaoul, Maxim Likhachev
Published	2025-04-23 14:09:42+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI, cs.RO

SemioLLM: Evaluating Large Language Models for Diagnostic Reasoning from Unstructured Clinical Narratives in Epilepsy

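Posted: April 24, 2025 | Author: jarxiv

Summary

Large language models (LLMs) have been shown to encode clinical knowledge.
Many evaluations, however, rely on structured question-answer benchmarks, overlooking the critical challenges of interpreting and reasoning about unstructured clinical narratives in real-world settings.
Using free-text clinical descriptions, we present SemioLLM, an evaluation framework that benchmarks six state-of-the-art models (GPT-3.5, GPT-4, Mixtral-8x7B, Qwen-72B, LlaMa2, LlaMa3) on a core diagnostic task in epilepsy.
Leveraging a database of 1,269 seizure descriptions, we show that most LLMs can accurately and confidently generate probabilistic predictions of seizure onset zones in the brain.
Most models approach clinician-level performance after prompt engineering, with expert-guided chain-of-thought reasoning leading to the most consistent improvements.
Performance was further strongly modulated by clinical in-context impersonation, narrative length, and language context (13.7%, 32.7%, and 14.2% performance variation, respectively).
However, expert analysis of the reasoning outputs revealed that correct predictions can be based on hallucinated knowledge and deficient source citation accuracy, underscoring the need to improve the interpretability of LLMs in clinical use.
Overall, SemioLLM provides a scalable, domain-adaptable framework for evaluating LLMs in clinical disciplines where unstructured verbal descriptions encode diagnostic information.
By identifying both the strengths and the limitations of state-of-the-art models, our work supports the development of clinically robust and globally applicable AI systems for healthcare.

Summary (original)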

Large Language Models (LLMs) have been shown to encode clinical knowledge. Many evaluations, however, rely on structured question-answer benchmarks, overlooking critical challenges of interpreting and reasoning about unstructured clinical narratives in real-world settings. Using free-text clinical descriptions, we present SemioLLM, an evaluation framework that benchmarks 6 state-of-the-art models (GPT-3.5, GPT-4, Mixtral-8x7B, Qwen-72B, LlaMa2, LlaMa3) on a core diagnostic task in epilepsy. Leveraging a database of 1,269 seizure descriptions, we show that most LLMs are able to accurately and confidently generate probabilistic predictions of seizure onset zones in the brain. Most models approach clinician-level performance after prompt engineering, with expert-guided chain-of-thought reasoning leading to the most consistent improvements. Performance was further strongly modulated by clinical in-context impersonation, narrative length and language context (13.7%, 32.7% and 14.2% performance variation, respectively). However, expert analysis of reasoning outputs revealed that correct prediction can be based on hallucinated knowledge and deficient source citation accuracy, underscoring the need to improve interpretability of LLMs in clinical use. Overall, SemioLLM provides a scalable, domain-adaptable framework for evaluating LLMs in clinical disciplines where unstructured verbal descriptions encode diagnostic information. By identifying both the strengths and limitations of state-of-the-art models, our work supports the development of clinically robust and globally applicable AI systems for healthcare.

arXiv information

Authors	Meghal Dani, Muthu Jeyanthi Prakash, Zeynep Akata, Stefanie Liebe
Published	2025-04-23 14:25:46+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL

HEMA : A Hippocampus-Inspired Extended Memory Architecture for Long-Context AI Conversations

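Posted: April 24, 2025 | Author: jarxiv

Summary

Large language models (LLMs) struggle to maintain coherence in extended conversations spanning hundreds of turns, despite performing well within their context windows.
This paper introduces HEMA (Hippocampus-Inspired Extended Memory Architecture), a dual-memory system inspired by human cognitive processes.
HEMA combines Compact Memory, a continuously updated one-sentence summary that preserves global narrative coherence, with Vector Memory, an episodic store of chunk embeddings queried via cosine similarity.
When integrated with a 6B-parameter transformer, HEMA maintains coherent dialogues beyond 300 turns while keeping prompt length under 3,500 tokens.
Experimental results show substantial improvements: factual recall accuracy increases from 41% to 87%, and human-rated coherence improves from 2.7 to 4.3 on a 5-point scale.
With 10K indexed chunks, Vector Memory achieves P@5 >= 0.80 and R@50 >= 0.74, doubling the area under the precision-recall curve compared to summarization-only approaches.
Ablation studies reveal two key insights: semantic forgetting through age-weighted pruning reduces retrieval latency by 34% with minimal recall loss, and a two-level summary hierarchy prevents cascade errors in ultra-long conversations exceeding 1,000 turns.
HEMA demonstrates that combining verbatim recall with semantic continuity provides a practical solution for privacy-aware conversational AI capable of month-long dialogues without model retraining.

Summary (original)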

Large language models (LLMs) struggle with maintaining coherence in extended conversations spanning hundreds of turns, despite performing well within their context windows. This paper introduces HEMA (Hippocampus-Inspired Extended Memory Architecture), a dual-memory system inspired by human cognitive processes. HEMA combines Compact Memory – a continuously updated one-sentence summary preserving global narrative coherence, and Vector Memory – an episodic store of chunk embeddings queried via cosine similarity. When integrated with a 6B-parameter transformer, HEMA maintains coherent dialogues beyond 300 turns while keeping prompt length under 3,500 tokens. Experimental results show substantial improvements: factual recall accuracy increases from 41% to 87%, and human-rated coherence improves from 2.7 to 4.3 on a 5-point scale. With 10K indexed chunks, Vector Memory achieves P@5 >= 0.80 and R@50 >= 0.74, doubling the area under the precision-recall curve compared to summarization-only approaches. Ablation studies reveal two key insights: semantic forgetting through age-weighted pruning reduces retrieval latency by 34% with minimal recall loss, and a two-level summary hierarchy prevents cascade errors in ultra-long conversations exceeding 1,000 turns. HEMA demonstrates that combining verbatim recall with semantic continuity provides a practical solution for privacy-aware conversational AI capable of month-long dialogues without model retraining.
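The dual-memory idea can be pictured with a small sketch: a one-sentence running summary plus a store of chunk embeddings ranked by cosine similarity, combined into a bounded prompt. This is an illustration under stated assumptions, not HEMA's code: the class and function names are hypothetical, the embeddings are hand-written toy vectors rather than the output of an embedding model, and summary updating and age-weighted pruning are left out.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorMemory:
    """Episodic store of (chunk text, embedding) pairs queried by cosine similarity."""

    def __init__(self):
        self.chunks = []

    def add(self, text, embedding):
        self.chunks.append((text, embedding))

    def query(self, embedding, k=5):
        ranked = sorted(self.chunks, key=lambda c: cosine(c[1], embedding), reverse=True)
        return [text for text, _ in ranked[:k]]


def precision_at_k(retrieved, relevant, k):
    """P@k: fraction of the top-k retrieved chunks that are relevant."""
    return sum(1 for text in retrieved[:k] if text in relevant) / k


def build_prompt(compact_summary, retrieved_chunks, user_turn):
    """Combine the running summary, retrieved chunks, and the new turn into one prompt."""
    memory_block = "\n".join(f"- {chunk}" for chunk in retrieved_chunks)
    return f"Summary: {compact_summary}\nRelevant history:\n{memory_block}\nUser: {user_turn}"


memory = VectorMemory()
memory.add("User's dog is named Miso.", [1.0, 0.0])
memory.add("User prefers metric units.", [0.0, 1.0])
memory.add("Miso is a three-year-old shiba.", [0.9, 0.1])
top = memory.query([1.0, 0.0], k=2)
print(build_prompt("A chat about the user's pet.", top, "How old is my dog?"))
```

Only the summary and the top-k chunks enter the prompt, which is how a design like this can keep prompt length bounded while the conversation itself keeps growing.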

arXiv information

Authors	Kwangseob Ahn
Published	2025-04-23 14:27:12+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI, cs.CL

Lightweight Latent Verifiers for Efficient Meta-Generation Strategies

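Posted: April 24, 2025 | Author: jarxiv

Summary

Verifiers are auxiliary models that assess the correctness of outputs generated by base large language models (LLMs).
They play a crucial role in many strategies for solving reasoning-intensive problems with LLMs.
Typically, verifiers are LLMs themselves, often as large as (or larger than) the base model they support, which makes them computationally expensive.
In this work, we introduce LiLaVe, a novel lightweight verification approach that reliably extracts correctness signals from the hidden states of the base LLM.
A key advantage of LiLaVe is that it operates with only a small fraction of the computational budget required by traditional LLM-based verifiers.
To demonstrate its practicality, we couple LiLaVe with popular meta-generation strategies such as best-of-n and self-consistency.
Moreover, we design novel LiLaVe-based approaches, such as conditional self-correction and conditional majority voting, that significantly improve both accuracy and efficiency in generation tasks with smaller LLMs.
Our work demonstrates the fruitfulness of extracting latent information from the hidden states of LLMs, and opens the door to scalable and resource-efficient solutions for reasoning-intensive applications.

Summary (original)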

Verifiers are auxiliary models that assess the correctness of outputs generated by base large language models (LLMs). They play a crucial role in many strategies for solving reasoning-intensive problems with LLMs. Typically, verifiers are LLMs themselves, often as large (or larger) than the base model they support, making them computationally expensive. In this work, we introduce a novel lightweight verification approach, LiLaVe, which reliably extracts correctness signals from the hidden states of the base LLM. A key advantage of LiLaVe is its ability to operate with only a small fraction of the computational budget required by traditional LLM-based verifiers. To demonstrate its practicality, we couple LiLaVe with popular meta-generation strategies, like best-of-n or self-consistency. Moreover, we design novel LiLaVe-based approaches, like conditional self-correction or conditional majority voting, that significantly improve both accuracy and efficiency in generation tasks with smaller LLMs. Our work demonstrates the fruitfulness of extracting latent information from the hidden states of LLMs, and opens the door to scalable and resource-efficient solutions for reasoning-intensive applications.
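The two meta-generation strategies named above, best-of-n and self-consistency, are simple to state in code. The sketch below shows their generic forms with a stub scoring function standing in for a verifier. It does not reproduce LiLaVe, which derives its score from the base model's hidden states, and it does not attempt the conditional variants, which the abstract does not define. The candidate answers and scores are made up for illustration.

```python
from collections import Counter


def best_of_n(candidates, score):
    """Best-of-n: return the candidate to which the verifier assigns the highest score."""
    return max(candidates, key=score)


def self_consistency(candidates):
    """Self-consistency: return the most frequent final answer among the samples."""
    return Counter(candidates).most_common(1)[0][0]


# Hypothetical final answers sampled from a base model for one problem.
samples = ["12", "15", "12", "9"]

# Stub verifier scores (assumed values); a real verifier would compute these per sample.
stub_scores = {"12": 0.4, "15": 0.9, "9": 0.1}

print(best_of_n(samples, stub_scores.get))
print(self_consistency(samples))
```

The two strategies can disagree, as they do here: voting favors the most common answer, while best-of-n trusts the verifier's ranking. That is why the cost and reliability of the verifier matter for the overall pipeline.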

arXiv information

Authors	Bartosz Piotrowski, Witold Drzewakowski, Konrad Staniszewski, Piotr Miłoś
Published	2025-04-23 14:33:20+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.AI
