## Find frequent segments of GPS trajectories

2

I have a dataset containing GPS coordinates (latitude and longitude), a timestamp variable and a subject ID to identify different persons. What I want to explore is two things:

1. Find frequent routes (or route segments) that are shared among the subjects, for example people who start from different home locations but take the same main highway to work.
2. Find related subjects. Related to the previous point: which subjects are similar (taking into account the time they use to commute, for example)?

I looked at Dynamic Time Warping as shown here, but it doesn't take into account that, for example, subjects can start from different points and still share a segment.

I tried to implement K-Means, but I'm not sure how my centroids should be defined (whether I should take time into account or not), as discussed here, and I'm not sure whether the similarity distance makes the correct assumptions in this case.

I looked at some papers on the matter, but they often work with pre-defined points A and B and try to cluster. They do, however, hint at some good strategies to transform the data.

I know I'm missing the name of the concept of what I'm looking for, maybe you can guide me on the algorithm/papers to review.

Thanks in advance.

## Answers

0

This problem is solved.

Frequent routes are time series motifs in 2D space. There are exact algorithms to find motifs in massive data sets [a].
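To make the motif idea concrete, here is a minimal brute-force sketch, not one of the scalable exact algorithms referred to above. It finds the pair of fixed-length windows, one from each of two trajectories, that are closest in Euclidean distance. The function names, the window length `m`, the use of raw (not normalised) coordinates treated as planar, and the assumption of roughly uniform sampling are all mine for illustration.

```python
from math import sqrt


def window_distance(a, b, i, j, m):
    """Euclidean distance between a[i:i+m] and b[j:j+m], point by point.

    Coordinates are treated as planar (an assumption; fine for small areas).
    """
    return sqrt(sum(
        (a[i + k][0] - b[j + k][0]) ** 2 + (a[i + k][1] - b[j + k][1]) ** 2
        for k in range(m)
    ))


def closest_shared_window(traj_a, traj_b, m):
    """Brute-force search for the closest pair of length-m windows.

    traj_a, traj_b: lists of (lat, lon) ordered by timestamp.
    Returns (start index in a, start index in b, distance),
    or None if either trajectory is shorter than m.
    """
    best = None
    for i in range(len(traj_a) - m + 1):
        for j in range(len(traj_b) - m + 1):
            d = window_distance(traj_a, traj_b, i, j, m)
            if best is None or d < best[2]:
                best = (i, j, d)
    return best


# Two toy trajectories that start in different places but share a stretch.
a = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
b = [(2, 5), (2, 3), (2, 0), (3, 0), (4, 0), (4, 3)]
match = closest_shared_window(a, b, m=3)
```

The search is quadratic in the trajectory lengths, so it only illustrates the definition. Unlike whole-series DTW, it does not require the two subjects to start or end at the same place.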

0

Assuming that the trajectories follow existing routes, the number of possible sub-trajectories is limited and can be discretised. I would discretise the map and run a frequency analysis over all sub-paths:

1. Create a map of all possible paths by overlaying all trajectories, which gives you the full map.
2. Discretise the map into paths and path segments, where each path segment is the stretch between two neighbouring branching points on a path. You can also see this task as building a graph.
3. Represent each trajectory as a sequence of these path segments and compute a frequency for each segment in the map.
4. You can extend this with several discretisation levels, so that short bypasses are still counted as part of the main path...
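The steps above can be sketched roughly as follows. This is a simplified stand-in, not the full procedure: instead of detecting branching points, it snaps each GPS point to a square grid cell and treats every directed cell-to-cell transition as a "segment". The function names, the `cell_deg` resolution and the `min_subjects` threshold are hypothetical choices for illustration.

```python
from collections import defaultdict
from math import floor


def to_cell(lat, lon, cell_deg):
    """Snap a coordinate to a grid cell index (cell size in degrees)."""
    return (floor(lat / cell_deg), floor(lon / cell_deg))


def trajectory_to_segments(points, cell_deg):
    """Turn time-ordered (lat, lon) points into directed cell transitions."""
    cells = []
    for lat, lon in points:
        cell = to_cell(lat, lon, cell_deg)
        # Skip consecutive points that fall into the same cell.
        if not cells or cells[-1] != cell:
            cells.append(cell)
    return list(zip(cells, cells[1:]))


def frequent_segments(trajectories, min_subjects=2, cell_deg=0.001):
    """Map each segment used by at least min_subjects to the set of subjects.

    trajectories: dict of subject ID -> time-ordered list of (lat, lon).
    """
    users = defaultdict(set)
    for subject_id, points in trajectories.items():
        for segment in trajectory_to_segments(points, cell_deg):
            users[segment].add(subject_id)
    return {seg: subs for seg, subs in users.items() if len(subs) >= min_subjects}


# Toy data on a 1-degree grid: A and B start apart but share one stretch.
trajectories = {
    "A": [(0.5, 0.5), (1.5, 0.5), (2.5, 0.5), (3.5, 0.5)],
    "B": [(1.5, -0.5), (1.5, 0.5), (2.5, 0.5), (2.5, 1.5)],
}
shared = frequent_segments(trajectories, min_subjects=2, cell_deg=1.0)
```

A coarser `cell_deg` plays the role of a coarser discretisation level from step 4, since nearby parallel paths then collapse into the same cells. Sparse sampling can make a trajectory jump over cells, so real data would probably need resampling or map matching first.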