What does anneal mean in the context of machine learning?


An article released by OpenAI gives an overview of how OpenAI Five works. The article contains the following paragraph:

Our agent is trained to maximize the exponentially decayed sum of future rewards, weighted by an exponential decay factor called γ. During the latest training run of OpenAI Five, we annealed γ from 0.998 (valuing future rewards with a half-life of 46 seconds) to 0.9997 (valuing future rewards with a half-life of five minutes).
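The half-lives quoted in the article follow directly from γ: a reward t steps in the future is weighted by γ^t, so the half-life in steps is log(0.5)/log(γ). A small sketch (assuming the agent acts about 7.5 times per second, i.e., every 4th frame at 30 fps, as described in the OpenAI Five write-up) reproduces the 46-second and five-minute figures:

```python
import math

def half_life_steps(gamma):
    """Number of timesteps after which a future reward's weight gamma**t falls to 1/2."""
    return math.log(0.5) / math.log(gamma)

# Assumption: OpenAI Five acts ~7.5 times per second (every 4th frame at 30 fps).
ACTIONS_PER_SECOND = 7.5

for gamma in (0.998, 0.9997):
    seconds = half_life_steps(gamma) / ACTIONS_PER_SECOND
    print(f"gamma={gamma}: half-life of roughly {seconds:.0f} seconds")
```

For γ = 0.998 this works out to roughly 46 seconds, and for γ = 0.9997 to roughly five minutes, matching the quote.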

Does annealing in this context mean that training revealed 0.9997 to be a better value for γ? How would that be determined?

My limited understanding of the topic led me to the following guess at how γ was annealed: different versions of the network were trained for a given amount of time using different values of γ, and then those versions either played against each other or had their TrueSkill scores compared to determine the ideal value of γ.

Reuben Walker

Posted 2020-05-24T22:24:11.650

Reputation: 11



Annealing here borrows its name from simulated annealing (which in turn borrows it from metallurgy). Simulated annealing is the process of slowly decreasing the probability of accepting worse solutions as the solution space is explored. In this context, annealing simply means that γ was gradually changed on a schedule over the course of training — here raised from 0.998 to 0.9997 — rather than held fixed, to balance short-term and long-term reward. γ is a hyperparameter, so its start and end values could be chosen with any hyperparameter search method (e.g., manual selection or cross-validation); the annealing itself is not a search, just a schedule.
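A schedule like this can be sketched in a few lines. The start and end values below come from the OpenAI article; the linear interpolation is only an illustrative choice, since the article does not say what schedule was actually used:

```python
def annealed_gamma(step, total_steps, start=0.998, end=0.9997):
    """Linearly interpolate gamma from `start` to `end` over training.

    The 0.998 -> 0.9997 endpoints are from the OpenAI Five article;
    the linear schedule itself is an assumption for illustration.
    """
    frac = min(step / total_steps, 1.0)  # fraction of training completed
    return start + frac * (end - start)

print(annealed_gamma(0, 1_000_000))          # start of training: 0.998
print(annealed_gamma(1_000_000, 1_000_000))  # end of training: ~0.9997
```

At every training step the agent would then discount future rewards with the current value of `annealed_gamma(step, total_steps)` instead of a fixed constant.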

Brian Spiering

Posted 2020-05-24T22:24:11.650

Reputation: 10 864