Using Python to identify common paths that users follow among different activities


I have click-stream data describing users' behaviour on a platform. The activities are provided in a temporal order, and users can go back and forth between them. Below is an example:

[Image: two users' paths across three activities]

Here, there are 3 activities in a temporal order. The blue arrows represent one user and the green arrows represent another. As you can see, these two users followed different paths.

In my scenario I have over 500 users, who are divided into 2 categories (let's say Group A and Group B), and more than 50 activities. I want to identify the common patterns for Group A and Group B users and compare them (I am hoping to see patterns distinct to each user group).

I wonder if there are some useful (free) tools out there that I can take advantage of. I prefer Python, but I could not find a Python library for this. I am also looking for a tool (or Python library) to visualise the paths that users follow. Any suggestions?


Posted 2017-09-19T20:37:59.963


Do you have a sample of the data that you can share? – Edmund – 2017-09-26T10:06:54.783

@Edmund thanks for your comment! The data basically has source, target, and weight columns. I used a Sankey diagram to visualise the paths for the user groups separately. This seems to be the best solution that I can see so far. What do you think? – renakre – 2017-09-26T10:45:30.150



You can rank the paths by frequency using the database (SQL), constraining the start and end points as necessary, once you fix the length of the window. If you let the path length vary, you will not be able to do it all in SQL. In that case you can learn the transition probabilities between states and then solve a weighted shortest-path problem in which the edge distances are negative log-likelihoods, so that the shortest path is the most probable one. Or you can use a heuristic search such as A*. I do not know of any library that does all of this, since it is very ad hoc, but you can do the visualization in networkx.
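To make the two steps concrete, here is a minimal sketch in plain Python rather than SQL. It assumes each user's clicks have already been turned into an ordered list of activity ids; the `sessions` data and the helper names `rank_windows`, `transition_probs` and `most_likely_path` are made up for illustration. The shortest-path part uses Dijkstra's algorithm with edge cost -log(p), which is one way to do it, not the only one.

```python
import heapq
import math
from collections import Counter, defaultdict

# Hypothetical toy data: one ordered list of activity ids per user, keyed by group.
sessions = {
    "A": [["a1", "a2", "a3"], ["a1", "a2", "a1", "a2", "a3"], ["a1", "a3"]],
    "B": [["a1", "a3", "a2"], ["a1", "a3", "a2", "a3"]],
}

def rank_windows(paths, length):
    """Count every contiguous sub-path of a fixed length, most frequent first."""
    counts = Counter()
    for path in paths:
        for i in range(len(path) - length + 1):
            counts[tuple(path[i:i + length])] += 1
    return counts.most_common()

def transition_probs(paths):
    """Estimate P(next activity | current activity) from consecutive pairs."""
    counts = defaultdict(Counter)
    for path in paths:
        for cur, nxt in zip(path, path[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
        for cur, nxts in counts.items()
    }

def most_likely_path(probs, start, end):
    """Dijkstra with edge cost -log(p): the cheapest path maximises the
    product of transition probabilities."""
    queue = [(0.0, start, [start])]
    done = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == end:
            return path, math.exp(-cost)
        if node in done:
            continue
        done.add(node)
        for nxt, p in probs.get(node, {}).items():
            if nxt not in done:
                heapq.heappush(queue, (cost - math.log(p), nxt, path + [nxt]))
    return None, 0.0  # end is not reachable from start

for group, paths in sessions.items():
    print(group, "top windows of length 2:", rank_windows(paths, 2)[:3])
    print(group, "most likely a1 -> a3:",
          most_likely_path(transition_probs(paths), "a1", "a3"))
```

Running the same functions on each group separately gives two ranked lists and two sets of transition probabilities that you can compare side by side. The transition probabilities also map directly onto the source, target, weight layout, so they can be loaded into a networkx directed graph for drawing.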


Posted 2017-09-19T20:37:59.963
