The Indoor-Training Effect: unexpected gains from distribution shifts in the transition function

Abstract (original)

Is it better to perform tennis training in a pristine indoor environment or a noisy outdoor one? To model this problem, here we investigate whether shifts in the transition probabilities between the training and testing environments in reinforcement learning problems can lead to better performance under certain conditions. We generate new Markov Decision Processes (MDPs) starting from a given MDP, by adding quantifiable, parametric noise into the transition function. We refer to this process as Noise Injection and the resulting environments as δ-environments. This process allows us to create variations of the same environment with quantitative control over noise serving as a metric of distance between environments. Conventional wisdom suggests that training and testing on the same MDP should yield the best results. In stark contrast, we observe that agents can perform better when trained on the noise-free environment and tested on the noisy δ-environments, compared to training and testing on the same δ-environments. We confirm that this finding extends beyond noise variations: it is possible to showcase the same phenomenon in ATARI game variations including varying ghost behaviour in PacMan, and paddle behaviour in Pong. We demonstrate this intriguing behaviour across 60 different variations of ATARI games, including PacMan, Pong, and Breakout. We refer to this phenomenon as the Indoor-Training Effect. Code to reproduce our experiments and to implement Noise Injection can be found at https://bit.ly/3X6CTYk.
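The abstract does not specify the exact form of the parametric noise, so the following is only a minimal sketch of the idea on a tabular MDP, not the authors' Noise Injection code (which lives at the URL above). It assumes a uniform-mixture noise model, P_δ(s'|s,a) = (1 − δ)·P(s'|s,a) + δ/|S|, so δ = 0 recovers the original MDP and the largest total-variation distance to it grows linearly with δ. The helpers `inject_noise`, `max_tv_distance`, `train_greedy_policy` and `evaluate`, the reward table `R`, and the use of value iteration as a stand-in for RL training are all hypothetical choices made for illustration. The last lines show the comparison the abstract describes: train on the clean MDP and test on a δ-environment, versus train and test on the same δ-environment.

```python
from typing import List

# Hypothetical tabular encoding (an assumption, not the paper's format):
# P[s][a][t] = probability of moving from state s to state t under action a
# R[s][a]    = expected immediate reward for action a in state s
Transition = List[List[List[float]]]
Reward = List[List[float]]


def inject_noise(P: Transition, delta: float) -> Transition:
    """Build a delta-environment by mixing every next-state distribution with
    a uniform one (assumed noise model): (1 - delta) * P + delta / |S|."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    n = len(P)
    return [[[(1.0 - delta) * p + delta / n for p in dist] for dist in row] for row in P]


def max_tv_distance(P: Transition, Q: Transition) -> float:
    """Largest total-variation distance between matching next-state distributions."""
    return max(
        0.5 * sum(abs(p - q) for p, q in zip(dp, dq))
        for row_p, row_q in zip(P, Q)
        for dp, dq in zip(row_p, row_q)
    )


def q_values(P: Transition, R: Reward, V: List[float], gamma: float) -> List[List[float]]:
    """One-step lookahead values Q(s, a) under transitions P and state values V."""
    return [
        [R[s][a] + gamma * sum(p * V[t] for t, p in enumerate(P[s][a])) for a in range(len(P[s]))]
        for s in range(len(P))
    ]


def train_greedy_policy(P: Transition, R: Reward, gamma: float = 0.9, iters: int = 500) -> List[int]:
    """'Training' stand-in: run value iteration on P, then act greedily."""
    V = [0.0] * len(P)
    for _ in range(iters):
        V = [max(qs) for qs in q_values(P, R, V, gamma)]
    return [max(range(len(qs)), key=qs.__getitem__) for qs in q_values(P, R, V, gamma)]


def evaluate(policy: List[int], P: Transition, R: Reward, gamma: float = 0.9, iters: int = 500) -> List[float]:
    """'Testing': iterative policy evaluation of a fixed policy on transitions P."""
    V = [0.0] * len(P)
    for _ in range(iters):
        V = [
            R[s][policy[s]] + gamma * sum(p * V[t] for t, p in enumerate(P[s][policy[s]]))
            for s in range(len(P))
        ]
    return V


if __name__ == "__main__":
    # Toy 2-state MDP: action 0 stays put, action 1 switches state; state 1 pays 1.
    P0 = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
    R = [[0.0, 0.0], [1.0, 1.0]]
    Pd = inject_noise(P0, 0.2)
    print("max TV distance:", max_tv_distance(P0, Pd))
    # Train on the clean MDP, test on the delta-environment ("indoor" training).
    print("indoor :", evaluate(train_greedy_policy(P0, R), Pd, R))
    # Train and test on the same delta-environment.
    print("matched:", evaluate(train_greedy_policy(Pd, R), Pd, R))
```

In this toy MDP both training regimes happen to yield the same policy, so the two test values coincide; the sketch only illustrates the train/test protocol and how δ controls distance, and says nothing about when the effect reported in the abstract appears.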

arXiv information

Authors	Serena Bono, Spandan Madan, Ishaan Grover, Mao Yasueda, Cynthia Breazeal, Hanspeter Pfister, Gabriel Kreiman
Published	2025-01-08 16:31:06+00:00

Source and services used

arxiv.jp, Google
