## Q-Learning the generic maze solution


After doing some exercises on Q-learning for maze solving, I wondered: my Q-learning algorithms only solve ONE maze. The AI doesn't learn how to solve mazes in general, so how can I achieve that?

For instance, learning "follow one wall until the end of the maze" instead of "turn left, left, right..."?

I guess the same approach applies to self-driving cars?

Technically your Q-learning algorithm can solve any maze through trial and error, but it has to start learning from scratch if given a new maze. – Neil Slater – 2018-10-17T12:40:29.217


I'm going to assume here that you're using the standard, basic, simple variant of $Q$-learning that can be described as tabular $Q$-learning, where all of the state-action pairs for which you're learning $Q(s, a)$ values are represented in a tabular fashion. For example, if you have 4 actions, your $Q(s, a)$ values are likely represented by 4 matrices (corresponding to the 4 actions), where every matrix has the same dimensionality as your maze (I'm assuming that your maze is a grid of discrete cells here).
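To make that representation concrete, here is a minimal sketch of a tabular setup with one value matrix per action. The maze dimensions, action names, and learning-rate/discount values are illustrative assumptions, not anything from your code:

```python
import numpy as np

# Hypothetical maze size and action set (assumptions for illustration).
HEIGHT, WIDTH = 5, 5
ACTIONS = ["up", "down", "left", "right"]

# One value matrix per action, each with the same shape as the maze grid.
Q = np.zeros((len(ACTIONS), HEIGHT, WIDTH))

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update for one (s, a, r, s') transition."""
    r, c = state
    nr, nc = next_state
    # TD target: immediate reward plus discounted best value of the next state.
    target = reward + gamma * Q[:, nr, nc].max()
    Q[action, r, c] += alpha * (target - Q[action, r, c])
    return Q

# One step: move right from (0, 0) to (0, 1) with a step cost of -1.
Q = q_update(Q, state=(0, 0), action=3, reward=-1.0, next_state=(0, 1))
```

Note that the table is indexed by absolute grid coordinates, which is exactly why nothing learned here carries over to a maze with a different layout.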

With such an approach, you are learning $Q$ values separately for every single individual state (+ action). Such learned values will always only be valid for one particular maze (the one you have been training in), as you seem to have already noticed. This is a direct consequence of the fact that you're learning individual values for specific state-action pairs. The things you are learning ($Q$ values) therefore cannot be transferred directly to a different maze; those particular states from the first maze do not even exist in the second maze!

Better results may be achievable with different state representations. For example, instead of representing states by their coordinates in a grid (as you would likely do with a tabular approach), you'd want to describe states in a more general way. For example, a state could be described by features such as:

• Is there a wall right in front of me?
• Is there a wall immediately to my right?
• Is there a wall immediately to my left?
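A sketch of such a feature extractor might look like the following. The maze encoding (1 for wall, 0 for free cell) and the heading convention are assumptions made for this example:

```python
# Hypothetical feature extractor: describes a state by the local wall layout
# relative to the agent's heading, rather than by absolute coordinates.
def wall_features(maze, position, heading):
    """Return (wall_ahead, wall_right, wall_left) as booleans.

    maze: 2D list where 1 marks a wall and 0 a free cell (assumed encoding).
    position: (row, col) of the agent.
    heading: one of "N", "E", "S", "W".
    """
    deltas = {"N": (-1, 0), "E": (0, 1), "S": (1, 0), "W": (0, -1)}
    order = ["N", "E", "S", "W"]  # clockwise, so +1 is a right turn
    i = order.index(heading)
    ahead = deltas[order[i]]
    right = deltas[order[(i + 1) % 4]]
    left = deltas[order[(i - 1) % 4]]

    def is_wall(delta):
        r, c = position[0] + delta[0], position[1] + delta[1]
        if not (0 <= r < len(maze) and 0 <= c < len(maze[0])):
            return True  # treat out-of-bounds as a wall
        return maze[r][c] == 1

    return (is_wall(ahead), is_wall(right), is_wall(left))
```

The key point is that two cells in two completely different mazes can now map to the same feature vector, which is what makes transfer possible at all.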

An alternative that may also generalize to some extent is pixel-based input, if you have images (a top-down view or even a first-person view).

When states are represented by such features, you can no longer use the tabular RL algorithms that you are likely familiar with; you'd have to use function approximation instead. With good state representations, those techniques may have a chance of generalizing. You'd also want to make sure to use a variety of different mazes during the training process, though; otherwise you'd likely still overfit to the single maze used in training.
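The simplest form of function approximation is linear: one weight vector per action, with $Q(s, a)$ computed as a dot product between the weights and the feature vector. The feature layout and action set below are illustrative assumptions, and the update shown is the semi-gradient Q-learning step:

```python
import numpy as np

N_FEATURES = 3  # e.g. wall ahead / right / left (assumed feature layout)
N_ACTIONS = 3   # e.g. forward / turn right / turn left (assumed action set)

# One weight vector per action; Q(s, a) = w[a] . phi(s).
w = np.zeros((N_ACTIONS, N_FEATURES))

def q_values(w, phi):
    """Q values for all actions given feature vector phi."""
    return w @ phi

def update(w, phi, action, reward, next_phi, alpha=0.1, gamma=0.9):
    """Semi-gradient Q-learning step: w[a] += alpha * TD-error * phi."""
    td_target = reward + gamma * q_values(w, next_phi).max()
    td_error = td_target - w[action] @ phi
    w[action] += alpha * td_error * phi
    return w

# One step: features (wall ahead, no wall right, wall left), step cost -1.
phi = np.array([1.0, 0.0, 1.0])
w = update(w, phi, action=0, reward=-1.0, next_phi=np.zeros(N_FEATURES))
```

Because the weights are tied to features rather than to grid cells, the same learned weights can be applied in any maze that produces those features. A neural network fits into the same scheme by replacing the linear `w @ phi` with a learned nonlinear function.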