Action selection in an actor-critic algorithm

I have an action space that is just a list of values given by acts = [i for i in range(10, 100, 10)]. According to the PyTorch documentation, the loss is calculated as below. Could someone explain how I can modify this procedure to sample actions from my action space?

    m = Categorical(probs)
    action = m.sample()
    next_state, reward = env.step(action)
    loss = -m.log_prob(action) * reward
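For context, one common pattern is to let the `Categorical` distribution sample an *index* into the action list and look up the corresponding value for the environment, while the log-probability used in the loss is taken at that index. A minimal sketch, using a uniform placeholder for `probs` (in practice it would come from the policy network):

    import torch
    from torch.distributions import Categorical

    # The discrete action values 10, 20, ..., 90
    acts = [i for i in range(10, 100, 10)]

    # Placeholder: uniform probabilities over len(acts) = 9 actions.
    probs = torch.full((len(acts),), 1.0 / len(acts))

    m = Categorical(probs)
    idx = m.sample()            # tensor index in [0, len(acts) - 1]
    action = acts[idx.item()]   # the actual value passed to the environment

    # log_prob is evaluated at the sampled index, not the action value
    log_prob = m.log_prob(idx)

Note that `m.log_prob` expects the sampled index, so the index (not the looked-up value) is what should appear in the loss term.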


Posted 2020-03-30T12:57:39.447
