Is it a good idea to store the policy in a database?



I'm a beginner in ML and have been researching RL quite a bit recently. I'm planning to create an RL application to play a zero-sum game. This will be web-based, so anyone can play it.

I'm wondering whether I need to create a database (or some other kind of storage) to hold the policy that the RL algorithm updates, so that the application can use it when the next human user comes along to play against it.


Posted 2019-08-03T21:44:58.360

Reputation: 489



You have lots of choices for how to store a policy, depending on how you have built it: which RL algorithm you used, and what kind of representation you chose for states and actions.

Tabular reinforcement learning algorithms lend themselves well to storage in a database table with an indexed state_id column plus an action column, a value column, or both. This might be a good choice if you have a moderate-sized state space, as you would avoid the need to load the whole table into memory just to compute the next move.
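As a rough illustration of the tabular case, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes one row per (state, action) pair holding an action value, as a Q-learning style agent would keep; storing only a best action per state would be a simpler variant. The table name `q_values`, the helper functions and the string encoding of board states are all hypothetical choices for this example, not part of any particular library.

```python
import sqlite3
from typing import Sequence


def open_q_table(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a table of action values keyed by (state_id, action)."""
    conn = sqlite3.connect(path)
    # The composite primary key gives SQLite an index whose leading column is
    # state_id, so looking up one state does not scan the whole table.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS q_values (
               state_id TEXT    NOT NULL,
               action   INTEGER NOT NULL,
               value    REAL    NOT NULL DEFAULT 0.0,
               PRIMARY KEY (state_id, action)
           )"""
    )
    return conn


def update_value(conn: sqlite3.Connection, state_id: str, action: int, value: float) -> None:
    """Insert a new estimate or overwrite the existing one."""
    conn.execute(
        "INSERT OR REPLACE INTO q_values (state_id, action, value) VALUES (?, ?, ?)",
        (state_id, action, value),
    )
    conn.commit()


def best_action(conn: sqlite3.Connection, state_id: str, legal_actions: Sequence[int]) -> int:
    """Greedy action for one state; only that state's rows are read."""
    placeholders = ",".join("?" for _ in legal_actions)
    rows = conn.execute(
        "SELECT action, value FROM q_values "
        f"WHERE state_id = ? AND action IN ({placeholders})",
        (state_id, *legal_actions),
    ).fetchall()
    known = dict(rows)
    # Actions never updated count as 0.0, matching the column default.
    return max(legal_actions, key=lambda a: known.get(a, 0.0))


# Example: a tic-tac-toe board encoded as a 9-character string (hypothetical).
conn = open_q_table()
update_value(conn, "X........", 4, 0.5)
update_value(conn, "X........", 2, -0.1)
print(best_action(conn, "X........", [1, 2, 3, 4, 5]))
```

In a real web app you would pass a file path instead of `":memory:"` so the table survives restarts and every request can query it without loading the rest of the table.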

Whether this is feasible will depend on the complexity of your game. Even relatively simple games like checkers turn out to have too large a state space to enumerate all the states in this way.
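One quick way to guesstimate whether a table is even plausible is a crude upper bound: count the board cells and the number of things each cell can hold, then raise one to the power of the other. The sketch below assumes that simple model (it ignores game rules, so the true reachable count is smaller), and the threshold of about a billion states is the rule of thumb given later in this thread, not a hard limit.

```python
def state_space_upper_bound(num_cells: int, states_per_cell: int) -> int:
    """Crude upper bound: every cell independently takes any of its states.

    Ignores legality (piece counts, whose turn it is, reachability), so the
    number of positions a game can actually reach is usually far smaller.
    """
    return states_per_cell ** num_cells


TABULAR_LIMIT = 10**9  # rough "about a billion states" rule of thumb

games = [
    ("tic-tac-toe", 9, 3),  # each cell: empty, X or O
    ("checkers", 32, 5),    # each dark square: empty, or a man/king of either colour
]
for name, cells, per_cell in games:
    bound = state_space_upper_bound(cells, per_cell)
    if bound <= TABULAR_LIMIT:
        verdict = "small enough for a table"
    else:
        verdict = "bound too high; estimate more carefully or use function approximation"
    print(f"{name}: at most {bound:.3e} states -> {verdict}")
```

If the bound is already under the limit, a table is safe. If it is far over (checkers comes out above $10^{22}$), a tighter count will not usually rescue the tabular approach.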

So you are more likely to need some kind of policy function or state value function implemented using a parametric function approximator. Very often in RL this will be a neural network, in which case you would use whatever storage mechanism your neural network library supports. Most will happily read and write their parameters to a file or string, allowing you a lot of flexibility in how and where to store them.
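To show the "file or string" idea without depending on any particular framework, here is a minimal sketch that treats a network as a dictionary of named parameter lists and round-trips it through JSON. The parameter names and values are hypothetical. Real libraries use their own (usually binary) formats, for example PyTorch's `torch.save(model.state_dict(), path)` with `model.load_state_dict(torch.load(path))`, or Keras's `model.save(path)` with `keras.models.load_model(path)`; the principle is the same.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical parameters of a tiny value network: weights and biases as
# plain lists. A real library would hold these in its own tensor types.
params = {
    "layer1.weight": [[0.1, -0.2], [0.3, 0.4]],
    "layer1.bias": [0.0, 0.1],
    "out.weight": [[0.5, -0.5]],
    "out.bias": [0.2],
}


def params_to_string(p: dict) -> str:
    """Serialise parameters to a string (could go in a DB column, a cache, etc.)."""
    return json.dumps(p)


def params_from_string(s: str) -> dict:
    return json.loads(s)


def save_params(p: dict, path: Path) -> None:
    path.write_text(params_to_string(p))


def load_params(path: Path) -> dict:
    return params_from_string(path.read_text())


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "policy.json"
    save_params(params, path)
    restored = load_params(path)

print(restored == params)
```

Because the result is just a string, the same parameters could equally be stored in a file, a database row or an object store.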

So your policy is likely to be stored in one or two files on disk as a serialised neural network. How to use that efficiently in a web service is a complex subject in its own right. You could simply read the files back and instantiate the neural network each time it is needed, and this will probably be OK for a simple game and a low-traffic service. However, this is very inefficient, because every request repeats the disk read and the construction of the network.

Some neural network libraries designed for production use will allow you to pre-load the neural network and keep it in memory between requests. How to do this depends entirely upon the frameworks you are using, so I cannot explain in more detail here. Initially I would not worry too much about this part for your project.
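The general "load once, keep in memory" pattern can be sketched in plain Python with `functools.lru_cache`, independent of any serving framework. Everything here is hypothetical: the file location, the toy JSON "model" (a preference score per tic-tac-toe cell) and the handler function stand in for your real network and web framework. The counter only exists to show that the expensive load runs once.

```python
import functools
import json
import tempfile
from pathlib import Path

# Hypothetical location of a saved model; a toy JSON file stands in for it.
MODEL_PATH = Path(tempfile.gettempdir()) / "policy_demo.json"
load_count = 0  # counts how often the expensive load actually runs


def load_policy_from_disk(path: Path) -> dict:
    """The slow step: read the file and rebuild the model object."""
    global load_count
    load_count += 1
    return json.loads(path.read_text())


@functools.lru_cache(maxsize=1)
def get_policy() -> dict:
    """First call loads from disk; later calls return the same in-memory object."""
    return load_policy_from_disk(MODEL_PATH)


def handle_move_request(board: str) -> int:
    """Hypothetical request handler: play the empty cell the policy prefers most."""
    preference = get_policy()["cell_preference"]
    empty_cells = [i for i, cell in enumerate(board) if cell == "."]
    return max(empty_cells, key=lambda i: preference[i])


# Write a toy "trained" policy once, then serve several requests.
MODEL_PATH.write_text(json.dumps({"cell_preference": [3, 1, 3, 1, 9, 1, 3, 1, 3]}))
for board in [".........", "....X....", "X...O...."]:
    print(handle_move_request(board))  # 4, then 0, then 2
print("loads from disk:", load_count)  # 1
```

After retraining, calling `get_policy.cache_clear()` forces the next request to reload the file. Note that if your web server runs several worker processes, each one keeps its own cached copy.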

Neil Slater

Posted 2019-08-03T21:44:58.360

Reputation: 14 632

Thank you very much for your reply; it's very helpful. In your opinion, what counts as a small, moderate or large state space (and how would I guesstimate it)?

I was thinking of creating the app with Python; can you suggest a NN library which supports a good storage mechanism? – mason7663 – 2019-08-04T20:39:42.950

@mason7663: Follow the link in the answer for guides on state spaces. Anything over a billion states ($10^9$) is going to be too much to try and handle directly in tabular form IMO. All the NN libraries in Python will allow you to store the parameters after training, so just pick one based on whatever other criteria you prefer. – Neil Slater – 2019-08-04T21:44:28.233

Thank you for your help. I have been pondering this over the past few days and wondered: do I actually need to store the policy, or are there RL techniques that don't require storing one? Any help would be fantastic. – mason7663 – 2019-08-10T13:26:40.890

@mason7663: That sounds like a different question. Feel free to ask it on the site. – Neil Slater – 2019-08-10T15:25:54.190