I disagree with the assertion of, "Theoretically the accuracy on training set should increase with degree". The goal of polynomial regression is not to randomly try new polynomials. The goal is to use a polynomial that better fits your data because the correlation is not linear.
Let's think about the end result of linear regression - it usually something like
y = mx + b
If you show that to a data scientist, they're going to tell you it's linear regression. You show that to a math student and they will tell you its the formula for a straight line. Either way, it's just a formula for a graph. But, note that this is for a straight line and not all data is linear. So, knowing that you're just coming up with a formula, you should think about polynomial regression in the same way - what graph am I trying to draw?
If you use a scatter plot and you are seeing a correlation but that relationship is exponential, then you should use the corresponding polynomial; same goes for all of the other variations. There is no logical explanation to use a polynomial that will not draw a graph that will closely align with your data correlation.