Should the exploration rate be reset after each trial in Q-learning?


As the title says, should I reset the exploration rate between trials?

I am currently working on the OpenAI Gym Pendulum task. After a number of trials my model started playing but did not take any significant actions (i.e., it didn't perform any meaningful swing). The Actor-Critic tutorial I followed did not reset the exploration rate (link), but it seems to contain a number of mistakes in general.

I assume it should be reset, since the model might start from a new, unknown situation in a different trial and not know what to do without exploring.


Posted 2018-05-12T22:23:09.803




The exploration rate, typically parameterized as epsilon (ε), can be changed on every trial; whether it should be depends on the complexity of the problem and your goals.

The simplest approach is to keep the exploration rate high and fixed. The model will then continue to explore new options, even at the cost of not "exploiting" the best-known option.
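As a minimal sketch of this fixed-epsilon strategy, here is a standard ε-greedy action selector. Note that this assumes a discrete action space with tabulated Q-values (Pendulum itself is continuous, so you would first need to discretize its actions); the function name and default value are illustrative, not from the tutorial:

```python
import random

def epsilon_greedy(q_values, epsilon=0.3):
    """Pick a random action with probability epsilon, else the greedy one.

    q_values: sequence of Q-values, one per action, for the current state.
    A fixed, fairly high epsilon keeps the agent exploring indefinitely,
    at the cost of sometimes ignoring the best-known action.
    """
    if random.random() < epsilon:
        return random.randrange(len(q_values))  # explore: uniform random action
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit
```

With `epsilon=0` this reduces to pure exploitation (always the argmax action), which is a handy way to evaluate a trained policy.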

Another option is to set the exploration rate high at the beginning of learning so the model searches the space for possible successful solutions. Then, as the model builds a set of policies that are successful for given states, the exploration rate can be lowered, or allowed to decay. The decay can be fixed (i.e., over time there is consistently less exploration and more exploitation), or it can be dynamic and learned. This last option is often the best, but also the most complex to implement.
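A common fixed-decay schedule is exponential annealing of ε toward a small floor. The sketch below is illustrative (the parameter names and values are not from any particular tutorial); the key point for the original question is that `step` counts from the start of training, not from the start of the current trial, so ε is not reset between trials:

```python
import math

def decayed_epsilon(step, eps_start=1.0, eps_end=0.05, decay_rate=0.001):
    """Exponentially anneal epsilon from eps_start down toward eps_end.

    step: environment steps (or episodes) since the START of training,
    deliberately not reset per trial, so exploration keeps shrinking
    as the agent accumulates experience across trials.
    """
    return eps_end + (eps_start - eps_end) * math.exp(-decay_rate * step)
```

At `step=0` this returns `eps_start`; as `step` grows it approaches `eps_end`, so the agent never stops exploring entirely.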

"Dare to Discover: The Effect of the Exploration Strategy on an Agent’s Performance" goes into greater detail on this topic.

Brian Spiering


Welcome to AI! Thanks for contributing!!! (Hope to see more of you on this Stack. There's an "ai-basics" tag if you want to scoop up some quick rep;) – DukeZhou – 2018-05-13T20:54:02.137