Take a Step and Reconsider: Sequence Decoding for Self-Improved Neural Combinatorial Optimization

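Summary

The constructive approach within Neural Combinatorial Optimization (NCO) treats a combinatorial optimization problem as a finite Markov decision process, in which solutions are built incrementally through a sequence of decisions guided by a neural policy network.
To train the policy, recent research is shifting toward a "self-improved" learning methodology that addresses the limitations of reinforcement learning and supervised approaches.
Here, the policy is trained iteratively in a supervised manner, using solutions derived from the current policy as pseudo-labels.
How these solutions are obtained from the policy determines the quality of the pseudo-labels.
In this paper, we propose a simple, problem-independent sequence decoding method for self-improved learning based on sampling sequences without replacement.
We incrementally follow the best solution found and repeat the sampling process from intermediate partial solutions.
By modifying the policy to ignore previously sampled sequences, we force it to consider only unseen alternatives, which increases solution diversity.
Experimental results on the Traveling Salesman Problem and the Capacitated Vehicle Routing Problem demonstrate its strong performance.
Furthermore, our method outperforms previous NCO approaches on the Job Shop Scheduling Problem.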
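The self-improved training loop described above can be illustrated on a toy, single-instance TSP. This is a minimal sketch under stated assumptions, not the authors' implementation: the "policy" is a hypothetical table of logits `W[i][j]` for moving from city `i` to city `j` (standing in for a neural network), tours are assumed to start at city 0, the pseudo-label is simply the best of `k` tours sampled with replacement (a stand-in for a stronger decoder), and training is one cross-entropy gradient step toward that pseudo-label per round. All function names are hypothetical.

```python
import math
import random


def tour_length(coords, tour):
    """Length of the closed tour (returns to the start city)."""
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


def probs(W, cur, unvisited):
    """Softmax of the hypothetical logit row W[cur], masked to unvisited cities."""
    m = max(W[cur][j] for j in unvisited)
    e = {j: math.exp(W[cur][j] - m) for j in unvisited}
    z = sum(e.values())
    return {j: v / z for j, v in e.items()}


def unvisited(n, prefix):
    return [j for j in range(n) if j not in prefix]


def sample_tour(W, n, rng):
    """Roll out the policy from city 0 (assumption) by ancestral sampling."""
    tour = [0]
    while len(tour) < n:
        p = probs(W, tour[-1], unvisited(n, tour))
        tour.append(rng.choices(list(p), weights=list(p.values()))[0])
    return tour


def log_likelihood(W, tour):
    """Log-probability of `tour` under the policy."""
    n = len(tour)
    return sum(math.log(probs(W, tour[t], unvisited(n, tour[:t + 1]))[tour[t + 1]])
               for t in range(n - 1))


def supervised_step(W, tour, lr=0.5):
    """One gradient-ascent step on log p(tour), i.e. descent on the cross-entropy loss."""
    n = len(tour)
    for t in range(n - 1):
        cur, target = tour[t], tour[t + 1]
        # Each city is "current" once per tour, so every row W[cur] is updated once.
        for j, pj in probs(W, cur, unvisited(n, tour[:t + 1])).items():
            W[cur][j] += lr * ((1.0 if j == target else 0.0) - pj)


def self_improve(coords, rng, rounds=20, k=8):
    """Alternate between decoding a pseudo-label and imitating it."""
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]  # uniform initial policy
    best = None
    for _ in range(rounds):
        # Pseudo-label: best of k sampled tours (stand-in for a stronger decoder).
        cand = min((sample_tour(W, n, rng) for _ in range(k)),
                   key=lambda t: tour_length(coords, t))
        if best is None or tour_length(coords, cand) < tour_length(coords, best):
            best = cand
        supervised_step(W, best)  # train the policy on its own best solution so far
    return W, best
```

Because the pseudo-label is only as good as what the decoder finds, replacing the best-of-`k` sampler with a better decoding scheme is the one lever this loop exposes.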

Summary (original)

The constructive approach within Neural Combinatorial Optimization (NCO) treats a combinatorial optimization problem as a finite Markov decision process, where solutions are built incrementally through a sequence of decisions guided by a neural policy network. To train the policy, recent research is shifting toward a ‘self-improved’ learning methodology that addresses the limitations of reinforcement learning and supervised approaches. Here, the policy is iteratively trained in a supervised manner, with solutions derived from the current policy serving as pseudo-labels. The way these solutions are obtained from the policy determines the quality of the pseudo-labels. In this paper, we present a simple and problem-independent sequence decoding method for self-improved learning based on sampling sequences without replacement. We incrementally follow the best solution found and repeat the sampling process from intermediate partial solutions. By modifying the policy to ignore previously sampled sequences, we force it to consider only unseen alternatives, thereby increasing solution diversity. Experimental results for the Traveling Salesman and Capacitated Vehicle Routing Problem demonstrate its strong performance. Furthermore, our method outperforms previous NCO approaches on the Job Shop Scheduling Problem.
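The decoding idea (sample without replacement, commit to one step of the best solution found so far, then resample from that partial solution) can be sketched in Python for a toy TSP. This is a hedged illustration, not the authors' code: the policy is a hypothetical nearest-neighbour softmax standing in for a trained network, tours are assumed to start at city 0, and sampling without replacement uses a trie that zeroes out the probability mass of every sampled tour, so later draws only see unseen alternatives. The trie stores absolute probabilities, which is fine for tiny instances but would underflow on long sequences; a practical version would work in log space.

```python
import math
import random


def tour_length(coords, tour):
    """Length of the closed tour (returns to the start city)."""
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


def policy(coords, partial, temperature=0.1):
    """Hypothetical stand-in policy: softmax over negative distance to unvisited cities."""
    last = coords[partial[-1]]
    cand = [c for c in range(len(coords)) if c not in partial]
    logits = [-math.dist(last, coords[c]) / temperature for c in cand]
    m = max(logits)
    w = [math.exp(x - m) for x in logits]
    z = sum(w)
    return {c: wi / z for c, wi in zip(cand, w)}


class Node:
    """Trie node for one prefix; `mass` is the probability not yet sampled below it."""
    def __init__(self, prob):
        self.prob = prob      # original policy probability of this prefix
        self.mass = prob      # remaining (unsampled) probability mass
        self.children = None  # next city -> Node, created lazily


def expand(coords, node, seq):
    if node.children is None:
        node.children = {c: Node(node.prob * p) for c, p in policy(coords, seq).items()}
    return node.children


def sample_unique(coords, root, prefix, k, rng):
    """Draw up to k new, distinct completions of `prefix` (sampling without replacement)."""
    samples = []
    for _ in range(k):
        node, seq, path = root, [prefix[0]], [root]
        for city in prefix[1:]:  # walk down to the current partial solution
            node = expand(coords, node, seq)[city]
            seq.append(city)
            path.append(node)
        if node.mass == 0.0:  # every completion of `prefix` was already sampled
            break
        while len(seq) < len(coords):  # sample proportionally to the remaining mass
            live = [(c, ch) for c, ch in expand(coords, node, seq).items() if ch.mass > 0.0]
            city, node = rng.choices(live, weights=[ch.mass for _, ch in live])[0]
            seq.append(city)
            path.append(node)
        path[-1].mass = 0.0  # remove this tour so later draws ignore it
        for anc in reversed(path[:-1]):
            anc.mass = sum(ch.mass for ch in anc.children.values())
        samples.append(seq)
    return samples


def step_and_reconsider(coords, k, rng):
    """Sample up to k unseen completions, keep the best tour, commit one step, repeat."""
    root = Node(1.0)  # trie over tours that start at city 0 (assumption)
    prefix, best = [0], None
    while len(prefix) < len(coords):
        for seq in sample_unique(coords, root, prefix, k, rng):
            if best is None or tour_length(coords, seq) < tour_length(coords, best):
                best = seq
        prefix = best[:len(prefix) + 1]  # take one step along the best tour so far
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    coords = [(rng.random(), rng.random()) for _ in range(8)]
    tour = step_and_reconsider(coords, k=4, rng=rng)
    print(tour, round(tour_length(coords, tour), 3))
```

Keeping one trie across steps means a partial solution is never re-explored along tours that were already drawn. If `k` is at least the number of completions, the first round is exhaustive and the returned tour is optimal; smaller `k` trades solution quality for compute.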

arXiv information

Authors	Jonathan Pirnay, Dominik G. Grimm
Published	2024-07-24 12:06:09+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
