Efficient Online Learning with Memory via Frank-Wolfe Optimization: Algorithms with Bounded Dynamic Regret and Applications to Control

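Summary

Title: Efficient Online Learning with Memory via Frank-Wolfe Optimization: Algorithms with Bounded Dynamic Regret and Applications to Control

Summary:

– Projection operations are a typical computational bottleneck in online learning.
– This paper enables projection-free online learning within the framework of Online Convex Optimization with Memory (OCO-M).
– OCO-M captures how past decisions affect the current outcome by letting the loss functions depend on both current and past decisions.
– In particular, it introduces the first projection-free meta-base learning algorithm with memory that minimizes dynamic regret, that is, the suboptimality against any sequence of time-varying decisions.
– The work is motivated by artificial intelligence applications in which autonomous agents must adapt to time-varying environments in real time. Examples include online control of dynamical systems, statistical arbitrage, and time series prediction.
– The algorithm builds on the Online Frank-Wolfe (OFW) and Hedge algorithms.
– The paper shows how to apply the algorithm to the online control of linear time-varying systems in the presence of unpredictable process noise.
– To this end, it develops a controller with memory and bounded dynamic regret against any optimal time-varying linear feedback control policy.
– The algorithm is validated in simulated scenarios of online control of linear time-invariant systems.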
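
The two building blocks named in the summary, Online Frank-Wolfe and Hedge, can be sketched as follows. This is a simplified illustration and not the paper's algorithm: it ignores memory, assumes an l1-ball feasible set and a quadratic tracking loss, and the function names, step sizes, and Hedge learning rate are all hypothetical choices. It shows only how a linear minimization oracle replaces projection and how Hedge mixes base learners that use different step sizes.

```python
import numpy as np


def lmo_l1_ball(grad, radius=1.0):
    """Linear minimization oracle: argmin of <grad, v> over the l1 ball.

    The minimizer is a signed vertex of the ball, so no projection is needed.
    """
    v = np.zeros_like(grad)
    i = int(np.argmax(np.abs(grad)))
    v[i] = -radius * np.sign(grad[i])
    return v


def ofw_step(x, grad, step, radius=1.0):
    """One Frank-Wolfe update: a convex combination of x and an LMO vertex.

    With step in [0, 1] the iterate stays inside the feasible set.
    """
    v = lmo_l1_ball(grad, radius)
    return (1.0 - step) * x + step * v


def hedge_update(weights, losses, lr):
    """Hedge (multiplicative weights) update over the base learners."""
    w = weights * np.exp(-lr * losses)
    return w / w.sum()


def run_meta_base(targets, steps=(0.05, 0.2, 0.8), hedge_lr=1.0, radius=1.0):
    """Track a drifting target with several OFW base learners mixed by Hedge.

    Assumed loss (illustrative only): f_t(x) = 0.5 * ||x - target_t||^2.
    """
    dim = targets.shape[1]
    base = [np.zeros(dim) for _ in steps]
    weights = np.full(len(steps), 1.0 / len(steps))
    decisions, total_loss = [], 0.0
    for target in targets:
        # Meta decision: weighted average of base iterates (still feasible).
        x = sum(w * b for w, b in zip(weights, base))
        decisions.append(x)
        total_loss += 0.5 * float(np.sum((x - target) ** 2))
        # Hedge sees each base learner's loss on the revealed function.
        base_losses = np.array([0.5 * np.sum((b - target) ** 2) for b in base])
        weights = hedge_update(weights, base_losses, hedge_lr)
        # Each base learner takes one projection-free step with its own step size.
        base = [ofw_step(b, b - target, s, radius) for b, s in zip(base, steps)]
    return np.array(decisions), total_loss


if __name__ == "__main__":
    ts = np.arange(200)
    targets = 0.5 * np.stack([np.cos(0.05 * ts), np.sin(0.05 * ts)], axis=1)
    decisions, loss = run_meta_base(targets)
    print("cumulative loss:", loss)
```

Running several base learners with different step sizes is a common way to hedge against not knowing how fast the environment drifts; the meta learner shifts weight toward whichever step size is currently tracking best.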

Summary (original)

Projection operations are a typical computation bottleneck in online learning. In this paper, we enable projection-free online learning within the framework of Online Convex Optimization with Memory (OCO-M) — OCO-M captures how the history of decisions affects the current outcome by allowing the online learning loss functions to depend on both current and past decisions. Particularly, we introduce the first projection-free meta-base learning algorithm with memory that minimizes dynamic regret, i.e., that minimizes the suboptimality against any sequence of time-varying decisions. We are motivated by artificial intelligence applications where autonomous agents need to adapt to time-varying environments in real-time, accounting for how past decisions affect the present. Examples of such applications are: online control of dynamical systems; statistical arbitrage; and time series prediction. The algorithm builds on the Online Frank-Wolfe (OFW) and Hedge algorithms. We demonstrate how our algorithm can be applied to the online control of linear time-varying systems in the presence of unpredictable process noise. To this end, we develop a controller with memory and bounded dynamic regret against any optimal time-varying linear feedback control policy. We validate our algorithm in simulated scenarios of online control of linear time-invariant systems.
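
For reference, dynamic regret in the memory setting is commonly written as below. This is a sketch in assumed notation (horizon T, memory length m, decisions x_t, comparator decisions v_t, loss functions f_t); the paper's own symbols may differ.

```latex
\[
\operatorname{Regret}_T^{D}
  = \sum_{t=1}^{T} f_t(x_{t-m}, \dots, x_t)
  - \sum_{t=1}^{T} f_t(v_{t-m}, \dots, v_t)
\]
```

Here v_1, ..., v_T is any comparator sequence in the feasible set. Static regret is the special case in which all v_t are equal.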

arXiv information

Authors	Hongyu Zhou, Zirui Xu, Vasileios Tzoumas
Published	2023-03-31 16:29:36+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, OpenAI
