Ways to deal with longitude/latitude feature

26

17

I am working on a fictional dataset with 25 features. Two of them are the latitude and longitude of a place; the others are pH values, elevation, windSpeed, etc., with varying ranges. I can normalize the other features, but how should I handle the latitude/longitude features?

Edit: The task is to predict agricultural yield. I would expect lat/long to be very important, since location can be vital to the prediction, hence the dilemma.

AllThingsScience

Posted 2016-08-20T06:51:26.563

Reputation: 373

Question was closed 2016-08-20T17:20:30.537

Could you clarify why you don't think that you can normalise those features? Presumably they are numerical the same as other features, so you can take mean/sd? Is your concern about having natural measure of distance between locations? If so, does the data cover a small area (with similar values) or is it global? – Neil Slater – 2016-08-20T07:13:11.417

@NeilSlater It's just that intuitively it does not make sense to me to normalize these features. Will the information not be lost if normalized? I have the dataset covering counties of America. – AllThingsScience – 2016-08-20T08:15:44.647

What information do you think will be lost? It probably will not be actually lost, but if you explain in your question what your concern is, someone will be able to answer. Not knowing any more, I would just normalise regardless - for fully global values and some problems (where distance between points is important) I might create a 3d cartesian co-ordinates feature from the long/lat. – Neil Slater – 2016-08-20T08:20:41.110

What's your question here? What are you trying to find out from the data? Correlation? Clustering? Classification? Prediction? Interpolation? How is location important to your model? – Spacedman – 2016-08-20T12:46:19.147

@Spacedman Please see edit. – AllThingsScience – 2016-08-20T18:58:30.223

So maybe you want a regression model with a spatial surface defined by a Gaussian field, or a 2-dimensional parametric spline surface or something like that? Are you just hoping to feed the numbers into an ML algorithm or Random Forests or something? – Spacedman – 2016-08-21T14:33:13.153

@Spacedman I am still working on the feature engineering part but yes I am trying to build a regression model. – AllThingsScience – 2016-08-23T15:44:22.083

Answers

28

Lat/long coordinates have a problem: they are two features that describe a position on the surface of a three-dimensional sphere. The longitude wraps all the way around, so its two most extreme values (-180 and +180 degrees) actually refer to places that are very close together. I've dealt with this a few times, and what I do is map the coordinates to x, y and z. Points that are close in these three dimensions are then also close in reality. Depending on the use case you can disregard changes in height and map everything onto a perfect sphere. The resulting features can then be standardized properly.

To clarify (summarised from the comments):

x = cos(lat) * cos(lon)
y = cos(lat) * sin(lon)
z = sin(lat)
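
A minimal Python sketch of the formulas above, not the original code from this answer. It assumes the inputs are in degrees (so they are converted to radians first) and a perfect unit sphere; the function name `latlon_to_xyz` and the two sample points are made up for illustration.

```python
import math

def latlon_to_xyz(lat_deg, lon_deg):
    """Map latitude/longitude in degrees to a point on the unit sphere."""
    # Assumption: inputs are in degrees; math.cos/math.sin expect radians.
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = math.cos(lat) * math.cos(lon)
    y = math.cos(lat) * math.sin(lon)
    z = math.sin(lat)
    return (x, y, z)

# Two hypothetical points on the equator, just either side of the
# +/-180 degree line: far apart as raw longitudes, close on the sphere.
a = latlon_to_xyz(0.0, 179.9)
b = latlon_to_xyz(0.0, -179.9)

lon_gap = abs(179.9 - (-179.9))  # gap between the raw longitude values
chord = math.dist(a, b)          # straight-line distance in x, y, z
```

Here `lon_gap` is almost the full longitude range while `chord` is tiny, which is the wrap-around effect described above. The x, y and z columns can then be standardized like any other numeric feature.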

Jan van der Vegt

Posted 2016-08-20T06:51:26.563

Reputation: 8 538

That is very interesting. Thank you! Could you confirm if these are the formulas for conversion? x = R * cos(lat) * cos(lon), y = R * cos(lat) * sin(lon), z = R * sin(lat) – AllThingsScience – 2016-08-20T19:01:51.217

I don't have access to my code at the moment but it looks right. You don't need the R since you will be standardizing anyway ;) – Jan van der Vegt – 2016-08-20T19:07:29.107

Perfect! Thank you. – AllThingsScience – 2016-08-20T19:46:50.603