How to use hours of the day as a continuous feature?

4

3

I would like to use the hour of the day (0-23) as a continuous feature, so that the model knows that 12:00 comes before 13:00, and that 8:00 is further from 21:00 than from 10:00. How do I engineer this feature so that it also understands that 0:00 comes right after 23:00?

shakedzy

Posted 2018-01-14T19:51:30.243

Reputation: 639

Answers

5

Great question.

I'd say you should use either Fourier transforms or sine/cosine transforms, since they place 0 right next to 23. The post below is about transforming day features, but the same approach should be easy to apply here.

Transforming day features

Transforming hour features
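
For what it's worth, here is a minimal sketch of the sine/cosine transform in Python. The function names `encode_hour` and `circular_distance` are made up for illustration and are not taken from the linked posts. Each hour becomes a point on the unit circle, so the single feature turns into two features.

```python
import math

def encode_hour(hour, period=24):
    """Map an hour in [0, period) to a (sin, cos) point on the unit circle."""
    angle = 2 * math.pi * (hour % period) / period
    return math.sin(angle), math.cos(angle)

def circular_distance(h1, h2):
    """Straight-line (chord) distance between two encoded hours."""
    s1, c1 = encode_hour(h1)
    s2, c2 = encode_hour(h2)
    return math.hypot(s1 - s2, c1 - c2)

# 23:00 and 0:00 end up as close together as 0:00 and 1:00,
# and 8:00 is further from 21:00 than from 10:00.
print(circular_distance(23, 0), circular_distance(0, 1))
print(circular_distance(8, 21), circular_distance(8, 10))
```

Using sine and cosine together matters: either one alone gives the same value to two different hours, but the pair is unique for each hour of the day.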

plumbus_bouquet

Posted 2018-01-14T19:51:30.243

Reputation: 330

Thanks :) But won't the FT cause different hours to have the same value? If the transform is f(x) -> X, then there could be several x with the same X – shakedzy – 2018-01-14T22:57:19.833

Replying to myself: maybe I'll use the FT, plus the sign of its derivative as a second feature, to distinguish the ambiguous options. – shakedzy – 2018-01-14T22:58:50.003

@shakedzy - check my update. I'm not sure about the FT, but the sin / cos transform seems to be the way to go – plumbus_bouquet – 2018-01-14T23:16:04.730

@plumbus_bouquet - an FT is a combination of sin and cos as its real and imaginary parts :) – shakedzy – 2018-01-15T08:32:01.120

1

Notice that if you want to establish an order relation that is cyclic you end up with a contradiction. In fact, you get: 0:00 < 23:59 < 0:00.

If you just want to compute distances and are willing to give up on continuity, you might proceed as follows.

First, convert the hh:mm time to minutes and then map it linearly to an integer between 0 and $2^k-1$. The transformation function might be $$f(h,m)=\mathrm{round}\left(\frac{60h+m}{60\cdot23+59}\times (2^k-1)\right)$$ You can choose $k=10$. It may be better to use fewer bits than are required to represent the time exactly (exact time requires 11 bits if seconds are not used, but not all $2^{11}$ configurations are used).

Then, you encode $f(h,m)$ with a Gray code. A Gray code ensures that consecutive numbers are encoded so that only one bit changes; the code is also cyclic, in the sense that the encodings of $0$ and $2^k-1$ differ in only one bit.

Finally, you can evaluate the distance between two time instants with the bitwise Hamming distance, which counts the number of bits that differ between the two codes.

If you want more precision, you might use seconds accordingly.
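
The steps above can be sketched in Python roughly as follows. This assumes the standard binary-reflected Gray code (`n ^ (n >> 1)`) and $k=10$; the function names `time_to_level`, `gray`, `hamming` and `time_distance` are hypothetical, chosen only for this example.

```python
def time_to_level(h, m, k=10):
    """f(h, m): map hh:mm linearly to an integer in [0, 2**k - 1]."""
    total_minutes = 60 * h + m
    return round(total_minutes / (60 * 23 + 59) * (2 ** k - 1))

def gray(n):
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)

def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

def time_distance(t1, t2, k=10):
    """Hamming distance between the Gray codes of two (h, m) times."""
    return hamming(gray(time_to_level(*t1, k)), gray(time_to_level(*t2, k)))

# 23:59 and 0:00 sit at opposite ends of the range,
# yet their Gray codes differ in a single bit.
print(time_distance((23, 59), (0, 0)))
```

One thing to keep in mind: the Hamming distance is 1 for neighbouring levels, but it does not grow in proportion to the time gap. Levels 0 and 3, for example, also have Gray codes that differ in only one bit.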

Corrado

Posted 2018-01-14T19:51:30.243

Reputation: 21

Great idea! About the contradiction you mentioned, I don't think of it this way - think of a circle, where 0:00 is the starting point. That makes 0:00 be 0 degrees, but also 360 degrees, so it's ok – shakedzy – 2018-01-19T12:48:10.107

Actually you have 360 degrees, from 0 to 359. 360 is equivalent to 0, because the representation is modular. Unfortunately, the contradiction is not contradicted :) – Corrado – 2018-01-19T15:22:52.093