An End-to-End Reinforcement Learning Approach for Job-Shop Scheduling Problems Based on Constraint Programming

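Summary

Constraint Programming (CP) is a declarative programming paradigm for modeling and solving combinatorial optimization problems such as the Job-Shop Scheduling Problem (JSSP).
CP solvers can find optimal or near-optimal solutions for small instances, but they scale poorly to large ones: they either need long computation times or return low-quality solutions.
Real-world scheduling applications therefore often rely on fast, handcrafted, priority-based dispatching heuristics to find a good initial solution, which is then refined with optimization methods.
This paper proposes a novel end-to-end approach to solving scheduling problems with CP and Reinforcement Learning (RL).
Previous RL methods were tailored to a specific problem through procedural simulation algorithms, complex feature engineering, or handcrafted reward functions; in contrast, the proposed neural-network architecture and training algorithm require only a generic CP encoding of a scheduling problem together with a set of small instances.
The approach uses existing CP solvers to train an agent that learns a Priority Dispatching Rule (PDR) which generalizes well to large instances, even ones drawn from separate datasets.
The method is evaluated on seven JSSP datasets from the literature, where it finds higher-quality solutions for very large instances than static PDRs or a CP solver obtain within the same time limit.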

Abstract (original)

Constraint Programming (CP) is a declarative programming paradigm that allows for modeling and solving combinatorial optimization problems, such as the Job-Shop Scheduling Problem (JSSP). While CP solvers manage to find optimal or near-optimal solutions for small instances, they do not scale well to large ones, i.e., they require long computation times or yield low-quality solutions. Therefore, real-world scheduling applications often resort to fast, handcrafted, priority-based dispatching heuristics to find a good initial solution and then refine it using optimization methods. This paper proposes a novel end-to-end approach to solving scheduling problems by means of CP and Reinforcement Learning (RL). In contrast to previous RL methods, tailored for a given problem by including procedural simulation algorithms, complex feature engineering, or handcrafted reward functions, our neural-network architecture and training algorithm merely require a generic CP encoding of some scheduling problem along with a set of small instances. Our approach leverages existing CP solvers to train an agent learning a Priority Dispatching Rule (PDR) that generalizes well to large instances, even from separate datasets. We evaluate our method on seven JSSP datasets from the literature, showing its ability to find higher-quality solutions for very large instances than obtained by static PDRs and by a CP solver within the same time limit.
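
To make the notion of a static, handcrafted PDR concrete, the following is a minimal sketch of greedy dispatching on a toy JSSP instance. It is not the learned rule from the paper and does not use a CP solver. The instance `JOBS`, the function `dispatch`, and the two rules `spt` (shortest processing time) and `mwkr` (most work remaining) are illustrative assumptions introduced here, standing in for the kind of static baselines the abstract refers to.

```python
from typing import Callable, List, Tuple

Operation = Tuple[int, int]   # (machine index, processing time)
Job = List[Operation]         # operations must run in list order

# Hypothetical toy instance (not taken from the paper's datasets).
JOBS: List[Job] = [
    [(0, 3), (1, 2), (2, 2)],
    [(0, 2), (2, 1), (1, 4)],
    [(1, 4), (2, 3)],
]

def spt(jobs: List[Job], j: int, k: int):
    """Shortest processing time first; ties broken by job index."""
    return (jobs[j][k][1], j)

def mwkr(jobs: List[Job], j: int, k: int):
    """Most work remaining first; ties broken by job index."""
    return (-sum(d for _, d in jobs[j][k:]), j)

def dispatch(jobs: List[Job], priority: Callable) -> Tuple[int, list]:
    """Greedily schedule one operation at a time using a static priority rule.

    Returns the makespan and a list of (job, op, machine, start, end) tuples.
    """
    n_machines = 1 + max(m for job in jobs for m, _ in job)
    next_op = [0] * len(jobs)          # index of each job's next operation
    job_ready = [0] * len(jobs)        # time at which each job is free
    machine_ready = [0] * n_machines   # time at which each machine is free
    schedule = []
    while True:
        candidates = [j for j in range(len(jobs)) if next_op[j] < len(jobs[j])]
        if not candidates:
            break
        # The rule with the smallest key wins.
        j = min(candidates, key=lambda c: priority(jobs, c, next_op[c]))
        machine, duration = jobs[j][next_op[j]]
        start = max(job_ready[j], machine_ready[machine])
        end = start + duration
        schedule.append((j, next_op[j], machine, start, end))
        job_ready[j] = machine_ready[machine] = end
        next_op[j] += 1
    return max(job_ready), schedule

if __name__ == "__main__":
    for rule in (spt, mwkr):
        makespan, _ = dispatch(JOBS, rule)
        print(rule.__name__, makespan)
```

Even on this tiny instance the two rules produce schedules of quite different length, which illustrates why the choice of dispatching rule matters and why a rule learned from solved instances could be attractive.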

arXiv information

Authors	Pierre Tassel, Martin Gebser, Konstantin Schekotihin
Published	2023-06-09 08:24:56+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
