Multi-Agent Proximal Policy Optimization For Data Freshness in UAV-assisted Networks

Summary

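Unmanned aerial vehicles (UAVs) are considered a promising technology for performing a wide range of tasks in wireless communication networks.
In this work, we consider deploying a group of UAVs to collect the data generated by IoT devices.
Specifically, we focus on the case where the collected data is time-sensitive and maintaining its timeliness is critical.
Our objective is to optimally design the UAVs' trajectories and the subsets of IoT devices they visit so that the global Age-of-Updates (AoU) is minimized.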
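To make the AoU objective concrete, here is a minimal sketch of how a global AoU could be tallied for a given visiting schedule. It assumes a hypothetical discrete-time model that the summary does not spell out: every device's age starts at 0, grows by one per time slot, resets to 0 in any slot where some UAV collects that device's data, and global AoU is the sum of all device ages over all slots. The function name `global_aou` and the two schedules are illustrative only; the paper's exact definition (normalization, QoS terms) may differ.

```python
def global_aou(num_devices: int, visits: list[set[int]]) -> int:
    """Hypothetical global AoU: per-device ages summed over every time slot.

    visits[t] holds the indices of the devices whose data some UAV
    collects in slot t (the union over all UAVs in the group).
    """
    ages = [0] * num_devices  # assumed initial age of every device
    total = 0
    for collected in visits:
        for d in range(num_devices):
            # A visit refreshes the device's data; otherwise it ages by one slot.
            ages[d] = 0 if d in collected else ages[d] + 1
        total += sum(ages)
    return total


# Two hypothetical 4-slot schedules for 3 devices.
round_robin = [{0}, {1}, {2}, {0}]
stay_at_device_0 = [{0}, {0}, {0}, {0}]
print(global_aou(3, round_robin))       # 11
print(global_aou(3, stay_at_device_0))  # 20
```

Under this toy model, spreading visits across devices roughly halves the global AoU compared with parking at one device, which is why the choice of visited subsets and trajectories matters.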
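To this end, we formulate the problem as a mixed-integer nonlinear program (MINLP) under time and quality-of-service (QoS) constraints.
To solve the resulting optimization problem efficiently, we investigate the cooperative multi-agent reinforcement learning (MARL) framework and propose an approach based on Proximal Policy Optimization (PPO), a popular on-policy reinforcement learning (RL) algorithm.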
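PPO's defining ingredient is its clipped surrogate objective, which limits how far the updated policy can move from the policy that collected the data. The sketch below computes the negative of that objective (a loss to minimize) from log-probabilities and advantages, using only the standard library. The clipping range `clip_eps=0.2` is a common default and an assumption here; the summary does not state the paper's hyperparameters, and `ppo_clip_loss` is a hypothetical name.

```python
import math


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate objective, averaged over samples.

    new_logp / old_logp: log pi(a|s) under the current and data-collecting policies.
    advantages: advantage estimates for the same (s, a) samples.
    """
    terms = []
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        terms.append(min(ratio * adv, clipped_ratio * adv))
    return -sum(terms) / len(terms)


# One sample the new policy favours (ratio 1.5, positive advantage) and one it
# disfavours (ratio 0.5, negative advantage); both get clipped to [0.8, 1.2].
loss = ppo_clip_loss([math.log(1.5), math.log(0.5)], [0.0, 0.0], [1.0, -1.0])
print(round(loss, 6))  # -0.2
```

Taking the minimum means the objective never rewards pushing the probability ratio beyond the clip range, which is what keeps PPO's updates conservative on each batch of on-policy data.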
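Our approach leverages the centralized training with decentralized execution (CTDE) framework, in which the UAVs learn their optimal policies while training a centralized value function.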
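A minimal structural sketch of CTDE, under stated assumptions: each UAV has its own actor that reads only its local observation, while a single critic reads the concatenation of all UAVs' observations and is used only during training to estimate values; advantages are then computed from those values with Generalized Advantage Estimation (GAE). The linear-softmax actor, the linear critic, the observation and action sizes, and the GAE parameters (`gamma=0.99`, `lam=0.95`) are hypothetical placeholders; the summary does not specify the paper's networks or advantage estimator.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class LocalActor:
    """Decentralized policy: conditions only on one UAV's local observation."""

    def __init__(self, obs_dim, num_actions, rng):
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(obs_dim)]
                        for _ in range(num_actions)]

    def action_probs(self, local_obs):
        scores = [sum(w * o for w, o in zip(row, local_obs)) for row in self.weights]
        return softmax(scores)


class CentralizedCritic:
    """Centralized value function: conditions on the joint observation (training only)."""

    def __init__(self, joint_dim, rng):
        self.weights = [rng.uniform(-0.1, 0.1) for _ in range(joint_dim)]

    def value(self, joint_obs):
        return sum(w * o for w, o in zip(self.weights, joint_obs))


def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """GAE over one rollout segment, assuming no terminal states inside it."""
    advantages = [0.0] * len(rewards)
    running, next_value = 0.0, last_value
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


if __name__ == "__main__":
    rng = random.Random(0)
    local_obs = [[0.2, 0.5], [0.9, 0.1]]  # hypothetical per-UAV observations
    actors = [LocalActor(obs_dim=2, num_actions=4, rng=rng) for _ in local_obs]
    critic = CentralizedCritic(joint_dim=4, rng=rng)

    # Execution: each UAV picks actions from its own observation only.
    probs = [actor.action_probs(obs) for actor, obs in zip(actors, local_obs)]
    # Training: the critic sees the concatenation of all local observations.
    joint_obs = [x for obs in local_obs for x in obs]
    print(probs, critic.value(joint_obs))
```

In a cooperative setting with a shared team reward (for example, the negative global AoU, which is also an assumption here), the same advantage sequence would feed every UAV's clipped PPO loss, and only the actors need to be deployed on the UAVs at execution time.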
Simulation results show that the proposed MAPPO approach reduces the global AoU by at least half compared with conventional off-policy RL approaches.

Summary (original)

Unmanned aerial vehicles (UAVs) are seen as a promising technology for performing a wide range of tasks in wireless communication networks. In this work, we consider the deployment of a group of UAVs to collect the data generated by IoT devices. Specifically, we focus on the case where the collected data is time-sensitive and maintaining its timeliness is critical. Our objective is to optimally design the UAVs' trajectories and the subsets of visited IoT devices such that the global Age-of-Updates (AoU) is minimized. To this end, we formulate the studied problem as a mixed-integer nonlinear program (MINLP) under time and quality-of-service constraints. To efficiently solve the resulting optimization problem, we investigate the cooperative Multi-Agent Reinforcement Learning (MARL) framework and propose an approach based on the popular on-policy reinforcement learning (RL) algorithm Proximal Policy Optimization (PPO). Our approach leverages the centralized training with decentralized execution (CTDE) framework, where the UAVs learn their optimal policies while training a centralized value function. Our simulation results show that the proposed MAPPO approach reduces the global AoU by at least half compared to conventional off-policy RL approaches.

arXiv information

Authors	Mouhamed Naby Ndiaye, El Houcine Bergou, Hajar El Hammouti
Published	2023-03-15 15:03:09+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
