Using historical label as a feature in my ML model?


I am building a model to predict the change in the price of an asset (up, down, or no change). The label is based on the derivative of the price, exponentially smoothed with an alpha of 0.1: when the smoothed rate of change is above a certain threshold the label is 1, when it is below a lower threshold the label is -1, and otherwise it is 0.
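To make the labeling scheme concrete, here is a minimal sketch of how it could be computed. It is not necessarily the exact procedure used above: the function name `make_labels`, the use of a first difference as the "derivative", the recursive form of the smoothing, and a symmetric threshold are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def make_labels(price: pd.Series, threshold: float, alpha: float = 0.1) -> pd.Series:
    """Label each time step 1 / -1 / 0 from the smoothed rate of change.

    Assumptions (not from the original post): the derivative is a first
    difference, and the lower threshold is simply -threshold.
    """
    # First difference as a stand-in for the derivative; the first step has
    # no previous price, so its rate of change is set to 0.
    rate = price.diff().fillna(0.0)
    # adjust=False gives the recursive form
    # s_t = alpha * x_t + (1 - alpha) * s_{t-1}, which only looks backward.
    smoothed = rate.ewm(alpha=alpha, adjust=False).mean()
    values = np.select(
        [smoothed > threshold, smoothed < -threshold],
        [1, -1],
        default=0,
    )
    return pd.Series(values, index=price.index)
```

Because every step only uses the current and earlier prices, the label at time t does not change when later prices are appended to the series.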

Because the label comes from an exponentially smoothed moving average, am I still allowed to use historical labels as features in my dataset? For example, if my target variable is the direction at t+5, could I use the label at point t as a feature? I ask because there is an overlap in the data used to compute the two.

Dick Thompson

Posted 2018-08-13T17:40:12.617

Reputation: 131



Correct me if I'm wrong, but it sounds like you are using an EMA to generate the series that you then label as 1, 0, or -1. If so, using historical values is not a problem, because an EMA is only affected by data up to the record it is computed for. Just make sure that the historical data for every record is calculated as if that record were the present moment.
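One way to picture "as if it was the present moment" is to line up each row so that the feature is the label known at time t and the target is the label at t+5. The sketch below assumes the labels are already in a pandas Series ordered by time; `build_dataset`, `label_t`, and `target` are hypothetical names chosen for this example.

```python
import pandas as pd


def build_dataset(labels: pd.Series, horizon: int = 5) -> pd.DataFrame:
    """Pair the label known at time t with the label to predict at t + horizon."""
    data = pd.DataFrame({
        "label_t": labels,                  # feature: already known at time t
        "target": labels.shift(-horizon),   # target: label at t + horizon
    })
    # The last `horizon` rows have no future label yet, so they are dropped.
    return data.dropna().astype(int)


# Toy labels, invented purely for illustration.
example = build_dataset(pd.Series([0, 1, 1, -1, 0, 1, 1, 0]), horizon=5)
```

Each row then contains only information that would have been available at time t, plus the value the model is asked to predict.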


Posted 2018-08-13T17:40:12.617

Reputation: 179

Yeah, but the thing is, if I'm predicting the EMA for t+5 and I'm using the EMA at point t, isn't there an overlap in the historical data used? Is that OK? – Dick Thompson – 2018-08-14T18:19:53.973

I'm not sure I understand the problem. You can use historical data all the way up until point T+5 to predict what T+5 should be. What you cannot do is use any data that is present in T+6 or beyond. Always frame the question this way: if I were to use my model to predict something right now, what information would be available to me? Use only that information. – stefanLopez – 2018-08-15T02:00:29.663

Exactly, so I can only use information at point t (now) if I'm predicting t+5 (5 days from now). So to confirm, you don't believe it's an issue if there is an overlap within the moving average (the moving average 5 days from now will incorporate some of the same data that's in the current moving average)? – Dick Thompson – 2018-08-15T16:33:33.520

No, that's not an issue. By their nature, EMAs are always tied to some extent to the data that came before them. – stefanLopez – 2018-08-15T20:35:53.633
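The overlap discussed in these comments can be written out explicitly. This is a sketch that assumes the recursive form of exponential smoothing, with x_t the raw rate of change and s_t its smoothed value (both symbols are introduced here, not taken from the thread):

```latex
s_t = \alpha x_t + (1 - \alpha)\, s_{t-1}
\quad\Longrightarrow\quad
s_{t+5} = (1 - \alpha)^5\, s_t + \alpha \sum_{k=0}^{4} (1 - \alpha)^k\, x_{t+5-k}
```

With alpha = 0.1, the carried-over weight is (0.9)^5 ≈ 0.59, so under this assumption about 59% of the smoothed value at t+5 comes from the value at t and the remaining 41% or so from the five observations after t. The value at t itself depends only on data up to t, which is why using it as a feature does not require any future information.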