Let's say I have a top-down picture of an arrow, and I want to predict the angle this arrow makes. The angle lies between $0$ and $360$ degrees, or equivalently between $0$ and $2\pi$ radians. The difficulty is that this target is circular: $0$ and $360$ degrees are exactly the same direction. I would like to build this invariance into the target, which I assume should help generalization significantly. I don't see a clean way of doing this. Are there any papers that tackle this problem (or similar ones)? I do have some ideas, each with potential downsides:
Use a sigmoid or tanh activation, scale it to the $(0, 2\pi)$ range, and incorporate the circular property in the loss function. I think this will fail fairly hard, because if the prediction sits on the border (the worst prediction), only a tiny bit of noise will push the weights one way or the other. Also, values close to the border at $0$ and $2\pi$ will be more difficult to reach, because the absolute pre-activation value would need to approach infinity.
Regress to two values, an $x$ and a $y$ value, and calculate the loss based on the angle these two values make. I think this one has more potential, but the norm of this vector is unbounded, which could lead to numerical instability: the norm could blow up or collapse to 0 during training. This could potentially be solved by adding a regularizer that keeps the norm from drifting too far from 1.
Other options would involve sine and cosine functions, but I feel that having multiple pre-activations map to the same output will also make optimization and generalization very difficult.
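
To make the first two ideas concrete, here is a rough NumPy sketch of what I have in mind. The function names (`circular_loss`, `xy_to_angle`, `xy_loss`), the $1 - \cos$ form of the loss, and the `reg_weight` value are my own placeholder choices for illustration, not something taken from a paper or a library.

```python
import numpy as np


def circular_loss(pred, target):
    """Loss for a directly predicted angle (radians).

    1 - cos(pred - target) is 0 when the two angles agree modulo 2*pi
    and reaches its maximum of 2 when they point in opposite directions.
    """
    return 1.0 - np.cos(pred - target)


def xy_to_angle(x, y):
    """Angle of the vector (x, y), mapped into [0, 2*pi)."""
    return np.mod(np.arctan2(y, x), 2.0 * np.pi)


def xy_loss(x, y, target, reg_weight=0.1, eps=1e-8):
    """Loss for the two-output (x, y) idea.

    The first term compares the direction of (x, y) with the target angle
    via cosine similarity. The second term is an assumed regularizer that
    penalizes the norm for moving away from 1 (reg_weight is arbitrary).
    """
    norm = np.sqrt(x ** 2 + y ** 2)
    cos_sim = (x * np.cos(target) + y * np.sin(target)) / (norm + eps)
    angle_term = 1.0 - cos_sim
    norm_term = reg_weight * (norm - 1.0) ** 2
    return angle_term + norm_term
```

With this, a prediction of $0.1$ and a target of $2\pi - 0.1$ are treated as being $0.2$ radians apart rather than almost a full turn apart, which is the invariance I am after. Whether the norm penalty is enough to keep training stable is exactly the part I am unsure about.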