You have many choices for storing a policy, depending on how you built it: which RL algorithm you used, and how states and actions are represented.
Tabular reinforcement learning algorithms lend themselves well to storage in a database table with an indexed state_id column plus columns for actions and their learned values. This can be a good choice for a moderately sized state space, since it avoids loading the whole table into memory just to compute the next move.
Whether this is feasible will depend on the complexity of your game. Even relatively simple games like checkers turn out to have too large a state space to enumerate all the states in this way.
So you are more likely to need a policy function or state value function implemented using a parametric function approximator; in RL this is very often a neural network. In that case you would use whatever storage mechanism your neural network library supports - most will happily read and write their parameters to a file or string, giving you a lot of flexibility in how and where to store them.
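To illustrate the save-and-restore pattern without tying it to one library, here is a toy linear value function standing in for a neural network, with its parameters serialised to JSON. Real libraries offer the same workflow on their own formats (for instance PyTorch's torch.save and load_state_dict); the file name and weights here are made up:

```python
import json
import os
import tempfile

# Stand-in for a trained approximator: V(s) = w . features(s).
weights = [0.5, -0.25, 1.0]

def value(features, w):
    # Dot product of a feature vector with the parameters.
    return sum(f * wi for f, wi in zip(features, w))

# Serialise the parameters to a file on disk ...
path = os.path.join(tempfile.gettempdir(), "policy_weights.json")
with open(path, "w") as f:
    json.dump(weights, f)

# ... and restore them later, e.g. in a separate process.
with open(path) as f:
    restored = json.load(f)
```

The restored parameters define exactly the same function, which is all the web service needs.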
So your policy is likely to end up as one or two files on disk containing a serialised neural network. Using that efficiently in a web service is a complex subject in its own right. You could simply read the files back and instantiate the network each time it is needed; for a simple game and a low-traffic service that will probably be acceptable, but it is very inefficient, because you pay the full deserialisation cost on every request.
Some neural network libraries designed for production use will allow you to pre-load the network and keep it in memory between requests. How to do this depends entirely on the frameworks you are using, so I cannot explain it in more detail here. Initially I would not worry too much about this part for your project.
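The general "load once, reuse across requests" pattern can be sketched in plain Python, independent of any web framework. This assumes the policy parameters live in a JSON file (a real service would deserialise a neural network instead), and the handler function is hypothetical:

```python
import functools
import json
import os
import tempfile

# Fake policy file, standing in for the serialised network on disk.
PATH = os.path.join(tempfile.gettempdir(), "policy_demo.json")
with open(PATH, "w") as f:
    json.dump({"w": [0.1, 0.9]}, f)

@functools.lru_cache(maxsize=1)
def get_policy():
    # Runs only on the first call; later requests reuse the cached object.
    with open(PATH) as f:
        return json.load(f)

def handle_request(features):
    policy = get_policy()  # no disk read after the first request
    return sum(f * w for f, w in zip(features, policy["w"]))
```

Most web frameworks give you an equivalent hook, such as loading the model at module import time or in an application-startup callback, so that every request handler shares one in-memory copy.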