Why is my training accuracy decreasing with higher degrees of polynomial features?

2

I am new to Machine Learning and started solving the Titanic Survivor problem on Kaggle.

While solving the problem using Logistic Regression, I tried various models with polynomial features of degree $$2,3,4,5,6$$. Theoretically, the accuracy on the training set should increase with degree; however, it started decreasing after degree $$2$$. The graph is shown below.

Welcome to the site! "Theoretically the accuracy on training set should increase with degree" - I disagree with this premise. Can you provide a citation or your rationale? I don't think this is a reasonable statement. – I_Play_With_Data – 2019-02-22T17:55:53.917

I read this in the Andrew Ng course, and logically speaking, wouldn't the boundary fit more effectively as the degree of the polynomial features increases? – Apoorv Jain – 2019-02-22T18:07:02.220

No, not necessarily. The most common use of polynomials is when you have data that shows a correlation but isn't linear (so like an exponential curve, a parabola, etc). You can't just randomly try new polynomials, you should be trying a particular polynomial because it's better suited to the general layout of your data. – I_Play_With_Data – 2019-02-22T18:10:27.987

Could you please suggest some reading on this type of feature engineering? – Apoorv Jain – 2019-02-22T18:13:03.390

2

I disagree with the assertion that "Theoretically the accuracy on training set should increase with degree". The goal of polynomial regression is not to randomly try new polynomials. The goal is to use a polynomial that fits your data better because the relationship is not linear.

Let's think about the end result of linear regression: it's usually something like y = mx + b

If you show that to a data scientist, they're going to tell you it's linear regression. If you show it to a math student, they will tell you it's the formula for a straight line. Either way, it's just a formula for a graph. But note that this is for a straight line, and not all data is linear. So, knowing that you're just coming up with a formula, you should think about polynomial regression in the same way: what graph am I trying to draw?

If you use a scatter plot and you see a correlation, but that relationship is exponential, then you should use the corresponding polynomial; the same goes for all of the other variations. There is no good reason to use a polynomial whose graph does not closely follow the relationship in your data.

Let's say my initial features were x, y, so I have degree 1. Now let's say we come up with polynomial features of degree 2, i.e. x^2, y^2, xy – Apoorv Jain – 2019-02-22T18:28:39.007

@ApoorvJain Don't start with the formula; start with your data, start with a scatterplot. What does that plot look like? What polynomial would you use to draw a similar graph? When you start thinking in those terms, then you start to think like a data scientist :-) – I_Play_With_Data – 2019-02-22T18:34:35.227

Let's say my initial features were x, y, so I have degree 1. Now let's say we come up with polynomial features of degree 2, i.e. x^2, y^2, xy. Then we have a boundary built from x, y, xy, x^2, y^2, so it would be of the form ax+by+cxy+dx^2+ey^2. With c = d = e = 0 we can still construct any boundary we could have built with degree-1 features alone. Since the loss function is minimized over every possible boundary, shouldn't our error with degree 2 be <= the error with degree 1? – Apoorv Jain – 2019-02-22T18:34:55.050
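The nesting argument in the comment above can be checked with a small sketch. It uses hypothetical weights and a handful of made-up points (neither comes from the thread) and shows that a degree-2 model with c = d = e = 0 reproduces the log loss of the degree-1 model. Note that this only bounds the objective being minimized, and only if the optimizer actually reaches the minimum; training accuracy is not the quantity logistic regression optimizes, so it is not guaranteed to be monotone in the degree.

```python
import math

def features(x, y, degree):
    """Polynomial terms of two inputs up to the given degree (1 or 2), no bias."""
    terms = [x, y]
    if degree >= 2:
        terms += [x * y, x * x, y * y]
    return terms

def log_loss(weights, bias, points, labels, degree):
    """Mean logistic (cross-entropy) loss of a linear model on polynomial terms."""
    total = 0.0
    for (x, y), t in zip(points, labels):
        z = bias + sum(w * f for w, f in zip(weights, features(x, y, degree)))
        p = 1.0 / (1.0 + math.exp(-z))
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(points)

# Made-up data and weights, for illustration only.
points = [(0.5, 1.0), (1.5, 0.2), (2.0, 2.0), (0.1, 0.3)]
labels = [0, 1, 1, 0]

w1, b = [0.8, -0.4], -0.5      # hypothetical degree-1 weights (a, b) and bias
w2 = w1 + [0.0, 0.0, 0.0]      # same model embedded in degree 2: c = d = e = 0

loss_deg1 = log_loss(w1, b, points, labels, degree=1)
loss_deg2 = log_loss(w2, b, points, labels, degree=2)
print(loss_deg1, loss_deg2)    # the two losses coincide
```

So the best degree-2 loss cannot be worse than the best degree-1 loss in principle. If the observed training metric still gets worse, the likely suspects are the optimizer not converging or the metric being accuracy rather than the loss.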

0

Have you tried normalization, or does your algorithm not need it?

• if $$|x|,|y| < 1.0$$ then $$x^2,y^2,xy,...$$ become very small
• if $$|x|,|y| > 1.0$$ then $$x^2,y^2,xy,...$$ become very large

Many machine learning algorithms need the features normalized so that they are all on the same scale. $$x = (x - x_{mean})/x_{std} \\ x^2 = (x^2 - (x^2)_{mean})/(x^2)_{std} \\ \text{and so on.}$$
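To make the scale problem concrete, here is a minimal pure-Python sketch. The raw values are made up (roughly the magnitude of an age or fare column) and the helper names `poly_powers` and `standardize` are hypothetical. It builds the powers of one feature up to degree 6 and compares the range of each column before and after standardization.

```python
from statistics import mean, pstdev

def poly_powers(values, degree):
    """Return one column per power 1..degree of a single raw feature."""
    return [[v ** d for v in values] for d in range(1, degree + 1)]

def standardize(column):
    """Rescale a column to zero mean and unit variance (population std)."""
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma for v in column]

# Made-up raw values, for illustration only.
raw = [2.0, 10.0, 25.0, 40.0, 80.0]

columns = poly_powers(raw, degree=6)
ranges = [max(c) - min(c) for c in columns]
scaled = [standardize(c) for c in columns]
scaled_ranges = [max(c) - min(c) for c in scaled]

for d, (r, s) in enumerate(zip(ranges, scaled_ranges), start=1):
    print(f"degree {d}: raw range {r:.3g}, standardized range {s:.3g}")
```

The raw ranges grow by many orders of magnitude with the degree, while the standardized columns all stay within a few units of each other.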

If you don't normalize them, training may be very slow or may not converge.
You can use sklearn.preprocessing.StandardScaler for this.
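As a rough sketch of how this could be wired together, the snippet below compares training accuracy for degrees 1 to 6 with and without StandardScaler applied after PolynomialFeatures. The data is synthetic stand-in data on two very different scales, not the Titanic set, and the `train_accuracy` helper and the `max_iter=1000` setting are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (NOT the Titanic set): two features on different scales.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0, 80, 300), rng.uniform(0, 500, 300)])
y = (X[:, 0] / 80 + X[:, 1] / 500 + rng.normal(0, 0.3, 300) > 1.0).astype(int)

def train_accuracy(degree, scale):
    """Fit logistic regression on polynomial features; return training accuracy."""
    steps = [PolynomialFeatures(degree=degree, include_bias=False)]
    if scale:
        steps.append(StandardScaler())  # standardize each polynomial column
    steps.append(LogisticRegression(max_iter=1000))
    model = make_pipeline(*steps)
    model.fit(X, y)
    return model.score(X, y)

# Map degree -> (accuracy without scaling, accuracy with scaling).
results = {d: (train_accuracy(d, False), train_accuracy(d, True)) for d in range(1, 7)}

for d, (unscaled, scaled) in results.items():
    print(f"degree {d}: unscaled {unscaled:.3f}, scaled {scaled:.3f}")
```

If the unscaled runs emit a ConvergenceWarning at higher degrees, that by itself suggests the drop in training accuracy comes from the optimizer rather than from the model class.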