What if there are multiple goals? For example, consider the bit-flipping environment described in the HER paper, with one small change: the goal is no longer one specific configuration. Instead, for the last m bits (e.g. m = 2), I do not care whether each bit is 1 or 0.

Section 3.2 of the paper (Multi-goal RL) mentions an example with two-dimensional coordinates (x and y) where only the x coordinate is of interest, so only the x coordinate is used as the goal. Applying this strategy to my example would mean cutting the last m bits from the goal and using only the remaining bits. Is this logic correct?
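To make the idea concrete, here is a minimal sketch of what I have in mind. The function names (`achieved_goal`, `reward`, `her_relabel`) are my own and do not come from the paper, and I am assuming the sparse 0 / -1 reward of the bit-flipping experiment:

```python
import numpy as np

def achieved_goal(state, m):
    """Project a full bit vector onto goal space by dropping the last m bits."""
    state = np.asarray(state)
    return state[: len(state) - m]

def reward(state, goal, m):
    """Assumed sparse reward: 0 if the bits I care about match the goal, else -1."""
    return 0.0 if np.array_equal(achieved_goal(state, m), np.asarray(goal)) else -1.0

def her_relabel(final_state, m):
    """Hindsight goal for relabeling: the projection of the final state of an episode."""
    return achieved_goal(final_state, m).copy()

# n = 5 bits, the last m = 2 are "don't care", so the goal has only 3 bits.
goal = [1, 0, 1]
print(reward([1, 0, 1, 1, 0], goal, m=2))  # the first 3 bits match
print(reward([0, 0, 1, 1, 0], goal, m=2))  # the first bit differs
```

With this projection the policy would still observe the full n-bit state, but goals (both the original ones and the ones substituted by HER) would live in the shorter (n - m)-bit space.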

Another approach I could think of would be to train on all possible goal configurations, since there are not many in my case. But this seems impractical in general, because the number of goal configurations grows as 2^m with the number of ignored bits.
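As a rough illustration of why enumeration seems impractical to me, the sketch below lists every full goal that is equivalent to one partial goal. The helper `equivalent_full_goals` is hypothetical and only serves to count configurations:

```python
from itertools import product

def equivalent_full_goals(partial_goal, m):
    """All full bit vectors that agree with partial_goal and vary in the last m bits."""
    return [tuple(partial_goal) + tail for tail in product((0, 1), repeat=m)]

for m in (2, 5, 10):
    # Each partial goal expands to 2**m full goals.
    print(m, len(equivalent_full_goals((1, 0, 1), m)))
```

For m = 2 this is only four goals per target, but the count doubles with every additional bit I do not care about.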