Pheromone-based Learning of Optimal Reasoning Paths

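Summary

Large language models (LLMs) have demonstrated remarkable reasoning ability through chain-of-thought prompting, but finding effective reasoning strategies for complex problems remains difficult because the space of possible intermediate steps is vast.
We introduce Ant Colony Optimization-guided Tree of Thought (ACO-ToT), a new algorithm that combines ACO with LLMs to efficiently discover optimal reasoning paths for complex problems.
Inspired by Hebbian learning in neurological systems, the method uses a collection of distinctly fine-tuned LLM "ants" that traverse a centralized tree of thought and lay pheromone trails through it; each ant's movement is governed by a weighted combination of the existing pheromone trails and its own specialized expertise.
The algorithm evaluates complete reasoning paths with a mixture-of-experts-based scoring function, and pheromones reinforce productive reasoning paths across iterations.
Experiments on three challenging reasoning tasks (GSM8K, ARC-Challenge, and MATH) show that ACO-ToT performs significantly better than existing chain-of-thought optimization approaches.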

Summary (original)

Large Language Models (LLMs) have demonstrated remarkable reasoning capabilities through chain-of-thought prompting, yet discovering effective reasoning methods for complex problems remains challenging due to the vast space of possible intermediate steps. We introduce Ant Colony Optimization-guided Tree of Thought (ACO-ToT), a novel algorithm that combines ACO with LLMs to discover optimal reasoning paths for complex problems efficiently. Drawing inspiration from Hebbian learning in neurological systems, our method employs a collection of distinctly fine-tuned LLM ‘ants’ to traverse and lay pheromone trails through a centralized tree of thought, with each ant’s movement governed by a weighted combination of existing pheromone trails and its own specialized expertise. The algorithm evaluates complete reasoning paths using a mixture-of-experts-based scoring function, with pheromones reinforcing productive reasoning paths across iterations. Experiments on three challenging reasoning tasks (GSM8K, ARC-Challenge, and MATH) demonstrate that ACO-ToT performs significantly better than existing chain-of-thought optimization approaches, suggesting that incorporating biologically inspired collective search mechanisms into LLM inference can substantially enhance reasoning capabilities.
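The abstract does not give the update equations, so the following is only a minimal sketch of the general idea, assuming the classical ant colony optimization rules: a child node is chosen with probability proportional to pheromone^alpha * expertise^beta, and trails evaporate at rate rho before each scored path deposits its score. The toy tree, the `PATH_SCORE` table (a stand-in for the mixture-of-experts scorer), the uniform `expertise` values (a stand-in for each fine-tuned LLM's preference), and all function and parameter names are hypothetical and are not taken from the paper.

```python
import random

# Hypothetical toy tree of thought: each key is a node, each value its children.
TREE = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1", "b2"],
}

# Hypothetical path scores standing in for a mixture-of-experts scorer.
PATH_SCORE = {
    ("a", "a1"): 0.2,
    ("a", "a2"): 0.1,
    ("b", "b1"): 1.0,
    ("b", "b2"): 0.3,
}


def choose_child(children, pheromone, expertise, alpha=1.0, beta=1.0, rng=random):
    """Sample a child with weight pheromone^alpha * expertise^beta (classical ACO rule)."""
    weights = [(pheromone[c] ** alpha) * (expertise[c] ** beta) for c in children]
    return rng.choices(children, weights=weights, k=1)[0]


def walk(tree, pheromone, expertise, rng):
    """Let one ant descend from the root to a leaf and return the visited nodes."""
    node, path = "root", []
    while node in tree:
        node = choose_child(tree[node], pheromone, expertise, rng=rng)
        path.append(node)
    return tuple(path)


def update_pheromone(pheromone, scored_paths, rho=0.1):
    """Evaporate every trail by a factor (1 - rho), then deposit each path's score."""
    for node in pheromone:
        pheromone[node] *= 1.0 - rho
    for path, score in scored_paths:
        for node in path:
            pheromone[node] += score


def run(n_iters=50, n_ants=5, seed=0):
    """Run the toy colony and return the final pheromone level of every node."""
    rng = random.Random(seed)
    nodes = [child for kids in TREE.values() for child in kids]
    pheromone = {n: 1.0 for n in nodes}
    expertise = {n: 1.0 for n in nodes}  # uniform placeholder for LLM preference
    for _ in range(n_iters):
        paths = [walk(TREE, pheromone, expertise, rng) for _ in range(n_ants)]
        update_pheromone(pheromone, [(p, PATH_SCORE[p]) for p in paths])
    return pheromone


if __name__ == "__main__":
    for node, level in sorted(run().items()):
        print(f"{node}: {level:.3f}")
```

In this sketch, higher-scoring paths receive larger deposits, so later ants are more likely to revisit them, while evaporation keeps early choices from dominating indefinitely. A real system would replace `expertise` with per-model preferences over candidate thoughts and `PATH_SCORE` with a learned scorer.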

arXiv information

Authors	Anirudh Chari,Aditya Tiwari,Richard Lian,Suraj Reddy,Brian Zhou
Published	2025-01-31 16:42:31+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
