What is the credit assignment problem?

9

3

In reinforcement learning (RL), the credit assignment problem (CAP) seems to be an important problem. What is the CAP? Why is it relevant to RL?

14

In reinforcement learning (RL), an agent interacts with an environment in time steps. On each time step, the agent takes an action in a certain state and the environment emits a percept or perception, which is composed of a reward and an observation, which, in the case of fully-observable MDPs, is the next state (of the environment and the agent). The goal of the agent is to maximise the reward in the long run.

The (temporal) credit assignment problem (CAP) (discussed in Steps Toward Artificial Intelligence by Marvin Minsky in 1961) is the problem of determining the actions that lead to a certain outcome.

For example, in football, at each second, each football player takes an action. In this context, an action can e.g. be "pass the ball", "dribbe", "run" or "shoot the ball". At the end of the football match, the outcome can either be a victory, a loss or a tie. After the match, the coach talks to the players and analyses the match and the performance of each player. He discusses the contribution of each player to the result of the match. The problem of determinig the contribution of each player to the result of the match is the (temporal) credit assignment problem.

How is this related to RL? In order to maximise the reward in the long run, the agent needs to determine which actions will lead to such outcome, which is essentially the temporal CAP.

Why is it called credit assignment problem? In this context, the word credit is a synonym for value. In RL, an action that leads to a higher final cumulative reward should have more value (so more "credit" should be assigned to it) than an action that leads to a lower final reward.

Why is the CAP relevant to RL? Most RL agents attempt to solve the CAP. For example, a $$Q$$-learning agent attempts to learn an (optimal) value function. To do so, it needs to determine the actions that will lead to the highest value in each state.

There are a few variations of the (temporal) CAP problem. For example, the structural CAP, that is, the problem of assigning credit to each structural component (which might contribute to the final outcome) of the system.