Recommending the optimal policy by learning to act from temporal data

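Abstract

Prescriptive process monitoring is a key problem in process mining: it consists of identifying a set of actions to recommend in order to optimise a target measure of interest, or Key Performance Indicator (KPI).
One challenge that makes the problem difficult is that well-crafted, human-validated explicit models are lacking, so prescriptive process monitoring techniques must rely solely on temporally annotated (process) execution data stored in so-called execution logs.
This paper proposes an AI-based approach that uses Reinforcement Learning (RL) to learn an optimal policy (almost) only from observations of past executions, and that recommends the best activities to carry out in order to optimise a KPI of interest.
This is achieved by first learning a Markov Decision Process for the specific KPI from the data, and then using RL training to learn the optimal policy.
The approach is validated on real and synthetic datasets and compared with off-policy Deep RL approaches.
Its ability to match, and often outperform, Deep RL approaches contributes to the exploitation of white-box RL techniques in scenarios where only temporal execution data are available.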
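
The two-step idea described above (learn a Markov Decision Process from logged executions, then derive a policy from it) can be illustrated with a minimal sketch. This is not the authors' implementation. The toy log, the activity names, the alternating state/action trace layout, and the helper names `learn_mdp`, `value_iteration` and `greedy_policy` are all hypothetical. Value iteration is swapped in for RL training so that the example stays deterministic.

```python
from collections import defaultdict

# Hypothetical toy execution log: each trace alternates
# state, action, state, ... and ends with a KPI value (1.0 = success).
LOG = [
    (["start", "send_offer", "accepted"], 1.0),
    (["start", "send_offer", "rejected"], 0.0),
    (["start", "send_offer", "rejected"], 0.0),
    (["start", "call_customer", "accepted"], 1.0),
    (["start", "call_customer", "accepted"], 1.0),
    (["start", "call_customer", "rejected"], 0.0),
]


def learn_mdp(log):
    """Estimate transition probabilities and mean rewards by counting."""
    counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> s' -> count
    reward_sum = defaultdict(float)                 # (s, a, s') -> summed KPI
    for events, kpi in log:
        last = len(events) - 3  # index of the final (s, a, s') triple
        for i in range(0, len(events) - 2, 2):
            s, a, s2 = events[i], events[i + 1], events[i + 2]
            counts[(s, a)][s2] += 1
            if i == last:  # assumption: the KPI is credited to the last step
                reward_sum[(s, a, s2)] += kpi
    transitions = {}
    for (s, a), nxt in counts.items():
        total = sum(nxt.values())
        transitions[(s, a)] = {
            s2: (n / total, reward_sum[(s, a, s2)] / n) for s2, n in nxt.items()
        }
    return transitions  # (s, a) -> {s': (probability, mean reward)}


def value_iteration(transitions, gamma=0.9, sweeps=100):
    """Compute Q-values on the learned MDP (stand-in for RL training)."""
    q = {sa: 0.0 for sa in transitions}

    def state_value(s):
        # States with no recorded action are treated as terminal (value 0).
        return max((q[(s0, a)] for (s0, a) in q if s0 == s), default=0.0)

    for _ in range(sweeps):
        for sa, outcomes in transitions.items():
            q[sa] = sum(
                p * (r + gamma * state_value(s2)) for s2, (p, r) in outcomes.items()
            )
    return q


def greedy_policy(q):
    """Pick, for every state, the action with the highest Q-value."""
    policy = {}
    for (s, a), value in q.items():
        if s not in policy or value > q[(s, policy[s])]:
            policy[s] = a
    return policy


q = value_iteration(learn_mdp(LOG))
policy = greedy_policy(q)
print(policy)  # {'start': 'call_customer'}
```

In this toy log, `call_customer` leads to acceptance in two of three traces and `send_offer` in one of three, so the greedy policy recommends `call_customer` from `start`. Because the model is an explicit table of counts, every recommendation can be traced back to the logged executions that support it.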

Abstract (original)

Prescriptive Process Monitoring is a prominent problem in Process Mining, which consists in identifying a set of actions to be recommended with the goal of optimising a target measure of interest or Key Performance Indicator (KPI). One challenge that makes this problem difficult is the need to provide Prescriptive Process Monitoring techniques only based on temporally annotated (process) execution data, stored in, so-called execution logs, due to the lack of well crafted and human validated explicit models. In this paper we aim at proposing an AI based approach that learns, by means of Reinforcement Learning (RL), an optimal policy (almost) only from the observation of past executions and recommends the best activities to carry on for optimizing a KPI of interest. This is achieved first by learning a Markov Decision Process for the specific KPIs from data, and then by using RL training to learn the optimal policy. The approach is validated on real and synthetic datasets and compared with off-policy Deep RL approaches. The ability of our approach to compare with, and often overcome, Deep RL approaches provides a contribution towards the exploitation of white box RL techniques in scenarios where only temporal execution data are available.

arXiv information

Authors	Stefano Branchi, Andrei Buliga, Chiara Di Francescomarino, Chiara Ghidini, Francesca Meneghello, Massimiliano Ronzani
Published	2023-03-16 10:30:36+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
