Which classification algorithm to choose for classifying driving patterns (GPS coordinates) and mapping them to drivers?


I have been given 6 data sets, 5 of which are training sets. Each training set corresponds to one of 5 drivers.

Each row of a data set consists of a time-stamp followed by a poly-line. The poly-line gives the driver's GPS coordinates (latitude, longitude), sampled every 15 seconds, so the number of points in a poly-line gives the duration of the trip. The last data set is the test set: I have to assign each of its rows (a time-stamp and its poly-line) to one of the 5 drivers.

I need assistance in choosing a classification algorithm. I have only implemented Naive Bayes (text classification) before but I don't think it will work here. I'm having trouble visualizing the approach to solve this problem.

Paritosh Tiwari

Posted 2016-03-10T10:48:12.450

Reputation: 11

Looks like you need to do some serious feature design first in order to be able to formulate this as a classification problem. – Diego – 2016-03-10T13:08:07.393



I have done some previous work classifying vehicles (heavy or light) based on their driving behavior. This required calculating speeds and accelerations, which you can easily do with numerical differentiation formulas such as the five-point stencil. You already know that consecutive points are separated by 15 seconds, and the distance between them can be calculated using the haversine formula. More features can be derived from these, such as the driving range, maximum and average speeds, number of left and right turns, hard braking events and sharp accelerations. Try plotting the polylines on Google Earth (or a similar mapping tool) to see if there is a distinctive geographical pattern (are they far apart from each other, or all in the same region?). If patterns are visible, a clustering algorithm may help.
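
As a rough sketch of the feature extraction described above: the code below assumes each trip is a list of (latitude, longitude) pairs in degrees, sampled every 15 seconds as stated in the question. The function names, the feature names and the braking/acceleration thresholds are my own illustrative choices, not anything taken from the data set. For brevity it uses plain first-order differences for acceleration rather than the five-point stencil.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres
SAMPLE_PERIOD_S = 15.0        # sampling interval stated in the question


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # min() guards against rounding pushing sqrt(a) slightly above 1
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def trip_features(polyline, brake_thresh=-1.0, accel_thresh=1.0):
    """Summarise one trip as a feature dictionary.

    The thresholds (in m/s^2) are assumed values for illustration only.
    """
    # Distance covered in each 15-second interval
    dists = [haversine_m(a, b) for a, b in zip(polyline, polyline[1:])]
    # Average speed over each interval, in m/s
    speeds = [d / SAMPLE_PERIOD_S for d in dists]
    # First-order difference of consecutive speeds, in m/s^2
    accels = [(v2 - v1) / SAMPLE_PERIOD_S for v1, v2 in zip(speeds, speeds[1:])]
    return {
        "duration_s": SAMPLE_PERIOD_S * max(len(polyline) - 1, 0),
        "distance_m": sum(dists),
        "mean_speed": sum(speeds) / len(speeds) if speeds else 0.0,
        "max_speed": max(speeds, default=0.0),
        "hard_brakes": sum(a < brake_thresh for a in accels),
        "sharp_accels": sum(a > accel_thresh for a in accels),
    }
```

Calling `trip_features` on every polyline gives one feature row per trip, labelled with the driver for the training sets. Note that with 15-second sampling the speeds and accelerations are averages over fairly long intervals, so short braking events may be smoothed out.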

João Paulo Figueira

Posted 2016-03-10T10:48:12.450

Reputation: 31

Thanks. If I use KNN, then first I will have to form clusters of the training data set based on some characteristics or features. You gave some examples on the type of features like average speed, left and right turns, etc. How many features do I need to get an above average performance? I don't need it to be highly accurate. – Paritosh Tiwari – 2016-03-11T06:53:46.417

There is no formula for the number of features you need; you will have to try them out and check the results against some quality metric. One option is to apply PCA to a large set of features and have the algorithm generate a new set of recombined features for you. If you want to explain the classification using the original set of features, then at least check for correlations between them so you can weed out correlated features that add little information. – João Paulo Figueira – 2016-03-11T09:14:08.303
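
The correlation check mentioned in the last comment could look roughly like this. It is a minimal sketch, assuming the features are stored as a dictionary mapping a feature name to its list of per-trip values; the helper names and the 0.9 cut-off are hypothetical choices, and the numbers in the usage example are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # A constant column has no defined correlation; treat it as 0 here
    return cov / (sx * sy) if sx and sy else 0.0


def drop_correlated(columns, threshold=0.9):
    """Greedily keep features whose |r| with every kept feature is below threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Made-up values: max_speed is exactly twice mean_speed, so it adds no information
columns = {
    "mean_speed": [10, 12, 9, 15],
    "max_speed": [20, 24, 18, 30],
    "hard_brakes": [3, 0, 4, 1],
}
selected = drop_correlated(columns)  # keeps mean_speed and hard_brakes
```

The greedy pass depends on the order of the features, so it is worth putting the ones you consider most interpretable first.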