Using GPS signal, determine whether this person is driving a cab



New York City provides tens of gigabytes of data on taxi routes all over the city. What I'd like to do is use this data (or some other method) to come up with an algorithm that can take a person's GPS data over a short span of time (say, an hour) and answer the question: is this person driving a cab?

The algorithm should work in any location, not just NYC. The idea is to detect patterns which signal that the route a person is driving is the type of route a cab driver would take.

Ideally, I'd like to write this in Ruby, but I am open to other suggestions, approaches, and implementations. Links to projects I should research, suggestions on languages to use, approaches to take, etc. are all appreciated.


Posted 2016-04-12T20:28:52.260

Reputation: 141



Take the taxi routes and combine them with civilian car routes to form a data set for classification. Using a map (say, from Google), break each route down into a sequence of road segments, from intersection to intersection. If you only have GPS traces, this will involve spatio-temporal segmentation: intersections and termini are the places where cars stop, and road segments are the stretches cars move through to get from one intersection to the next. Model the segments as categorical variables (road segment 1, 2, 3, etc.), abstracting out the physical location. Then train a classifier that accepts a sequence as input (e.g., a recurrent neural network).

Use the time of departure as another feature, modeled as two real variables, $\cos(2\pi t/24)$ and $\sin(2\pi t/24)$, where $0 \le t < 24$ is the hour of day. This keeps times just before and just after midnight close together.

If you have really accurate GPS information, I would also try to estimate the rate of lane crossing; taxi drivers are well known for their aggressive driving, and the rate of lane crossing would capture this well. Finally, if you can obtain the information, use a pair of boolean variables to record whether the points of departure and destination are parking areas. I imagine taxis will be more likely to start and stop at prohibited places.
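As a rough illustration of the feature construction described above, here is a minimal Python sketch. It assumes the route has already been segmented into road-segment labels; the function names, the dictionary layout, and the labels such as `"seg_a"` are all hypothetical, chosen only for this example. It covers the cyclical time encoding, the categorical segment ids, and the two parking flags, not the segmentation or the classifier itself.

```python
import math


def encode_departure_time(hour):
    """Map an hour of day in [0, 24) to (cos, sin) on the unit circle."""
    angle = 2 * math.pi * hour / 24
    return math.cos(angle), math.sin(angle)


def encode_segments(route, vocabulary):
    """Replace each road-segment label with an integer id.

    `vocabulary` is a dict shared across routes; unseen labels get the
    next free id, in order of first appearance.
    """
    return [vocabulary.setdefault(segment, len(vocabulary)) for segment in route]


def build_features(route, departure_hour, starts_in_parking, ends_in_parking, vocabulary):
    """Bundle the per-route features (hypothetical layout, for illustration)."""
    return {
        "segment_ids": encode_segments(route, vocabulary),
        "time": encode_departure_time(departure_hour),
        "parking": (bool(starts_in_parking), bool(ends_in_parking)),
    }


# Example: a route that revisits its first segment, departing at 23:30.
vocabulary = {}
features = build_features(["seg_a", "seg_b", "seg_a"], 23.5, False, True, vocabulary)
print(features["segment_ids"])  # [0, 1, 0]
```

The integer ids would typically feed an embedding layer in front of the sequence model, while the time and parking values can be concatenated to its output; that wiring is a design choice left open here.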

The rest is your usual hyper-parameter optimization black magic.

If you want to be able to do anything besides classify a given route as taxi-driven or not, please say so.


Posted 2016-04-12T20:28:52.260

Reputation: 9 953

Adding time of the day in this way looks insightful. Can you point to other examples where it was used like this? Thanks! – Diego – 2016-04-17T16:48:16.197

This representation is standard practice. – Emre – 2016-04-17T18:06:51.347