[Understanding Reinforcement Learning the Hard Way] Tutorial Part 4-5

Overview

it is just as important to understand how, and even more critically, why that agent is behaving in a certain way.

We can think of this visualization as providing a portal into the “thought process” of our agent. Does it know that it is in a good position when it is in a good position? Does it know that going down was a good thing to do when it went down? This can give us the insights needed to understand why our agent might not be performing ideally as we train it under different circumstances in different environments.
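The two questions above (is this a good position, and was that a good move) can be made concrete with a minimal sketch. It assumes the agent exposes a scalar state value and one advantage per action, as a dueling-style network would; the source does not say which quantities the interface actually reads. The names `ACTIONS` and `describe_thought`, the "value above zero means good" rule, and all numbers are hypothetical and only for illustration.

```python
import numpy as np

ACTIONS = ["up", "down", "left", "right"]  # hypothetical action order


def describe_thought(value, advantages):
    """Summarize what the agent 'thinks' about a single state."""
    advantages = np.asarray(advantages, dtype=float)
    # Center the advantages so each reads as better or worse than the average action.
    centered = advantages - advantages.mean()
    best = int(np.argmax(centered))
    return {
        # Assumption: a positive state value is treated as "a good position".
        "state_is_good": bool(value > 0),
        "preferred_action": ACTIONS[best],
        "centered_advantages": dict(zip(ACTIONS, centered.round(3).tolist())),
    }


# Made-up numbers for a state where moving down looks best.
thought = describe_thought(value=0.8, advantages=[0.1, 0.9, 0.2, 0.0])
print(thought["state_is_good"], thought["preferred_action"])
```

Reading the two outputs side by side mirrors the questions in the text: the value answers "does it know it is in a good position?", and the centered advantages answer "does it know which move was good?".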

Not only can we use the interface to explore how the agent does during training, we can also use it for testing and debugging our fully trained agents.

I haven't figured out how to debug with it yet, but it would be great if it turns out to be usable for debugging!

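After training, I ran a test: when only green squares appeared, the value of actions that move toward green went up, and when I removed both the green and the red squares, the agent moved around randomly.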
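A rough way to check this kind of behavior numerically, as a sketch only: compare the gap between the best and second-best action values in each test scenario. This assumes the random movement comes from the agent having no clearly preferred action, which the text above reports only as observed behavior. The function `action_preference`, the tolerance, and the action values are all hypothetical, not taken from the trained agent.

```python
import numpy as np


def action_preference(q_values, tol=1e-3):
    """Return ('decisive' | 'indifferent', gap between the top two action values)."""
    q = np.asarray(q_values, dtype=float)
    top_two = np.sort(q)[-2:]  # second-best and best value, in ascending order
    gap = float(top_two[1] - top_two[0])
    return ("decisive" if gap > tol else "indifferent"), gap


# Made-up action values for the two test scenarios described above.
only_green = [0.10, 0.95, 0.12, 0.08]  # one action clearly leads toward green
no_goals = [0.20, 0.20, 0.20, 0.20]    # nothing to approach or avoid

print(action_preference(only_green)[0])
print(action_preference(no_goals)[0])
```

A large gap matches the "value of moving toward green went up" case, while a gap near zero means any tie-breaking or exploration noise decides the move, which would look like random movement.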

I think I've more or less figured out how to test it now, lol.

While we may explicitly only think of the green as being rewarding and the red as being punishing, we are subconsciously constraining our actions by a desire to finish quickly.

Using the Control Center

The agent’s performance you will see was pretrained on the gridworld task for 40,000 episodes.

The Control Center is a piece of software I plan to continue to develop as I work more with various Reinforcement Learning algorithms.