Clustering/classifying users based on a sequence of actions and time



I have some user data where each user has a certain pattern of being at different places for some time. I want to create a model that will cluster/classify these users based on these patterns and the time spent at each place. Suppose the user patterns look like this:

Place_1 (60 min) - Place_2 (30 min) - Place_5 (45 min): user 1, label 1

Place_1 (60 min) - Place_2 (60 min) - Place_5 (45 min): user 2, label 2

Place_1 (60 min) - Place_2 (60 min) - Place_5 (40 min): user 3, label 2

Place_2 (60 min) - Place_1 (60 min) - Place_5 (45 min): user 4, label 3

Place_2 (60 min) - Place_1 (60 min) - Place_5 (45 min): user 5, label 3
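Whatever model ends up being used, each pattern first has to become an ordered list of (place, minutes) visits. A minimal parsing sketch, assuming the patterns are stored as text in the notation above; `STEP` and `parse_pattern` are hypothetical names:

```python
import re

# Assumption: patterns are stored as text like "Place_1(60 min)- Place_2(30 min)".
# The pattern also accepts a space before the parenthesis, e.g. "Place_1 (60 min)".
STEP = re.compile(r"(\w+)\s*\(\s*(\d+(?:\.\d+)?)\s*min\s*\)")


def parse_pattern(text):
    """Return the visits of one pattern as an ordered list of (place, minutes)."""
    return [(place, float(minutes)) for place, minutes in STEP.findall(text)]


print(parse_pattern("Place_1(60 min)- Place_2(30 min)- Place_5(45 min)- user 1 -label(1)"))
# [('Place_1', 60.0), ('Place_2', 30.0), ('Place_5', 45.0)]
```

The trailing "user 1 -label(1)" is ignored because `label(1)` has no "min" unit, so only the visits are returned, in their original order.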

So they should be clustered/classified as follows:

1: User 1

2: User 2, User 3

3: User 4, User 5

The durations are continuous. I already have labels for these patterns, so I can do classification as well as clustering. I initially thought of running k-means clustering on these patterns, but introducing the duration of stay at each place messes up the clustering. I am currently using a random forest classifier, but the results are not promising. Any help will be highly appreciated.
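k-means needs fixed-length vectors in a Euclidean space, which a sequence of visits is not. One alternative that nobody in this thread suggests is to define a distance directly between sequences, so that both visit order and duration count, and then cluster on pairwise distances (for example with hierarchical clustering or k-medoids). The sketch below uses a duration-aware edit distance and a simple threshold-based single-linkage grouping; `seq_distance`, `group_users`, `scale=60` and `threshold=0.25` are hypothetical choices, not a standard library API:

```python
# Hypothetical data: the five example users from the question.
USERS = {
    1: [("Place_1", 60), ("Place_2", 30), ("Place_5", 45)],
    2: [("Place_1", 60), ("Place_2", 60), ("Place_5", 45)],
    3: [("Place_1", 60), ("Place_2", 60), ("Place_5", 40)],
    4: [("Place_2", 60), ("Place_1", 60), ("Place_5", 45)],
    5: [("Place_2", 60), ("Place_1", 60), ("Place_5", 45)],
}


def seq_distance(a, b, scale=60.0, gap=1.0):
    """Duration-aware edit distance between two (place, minutes) sequences.

    Inserting or deleting a visit costs `gap`; aligning two visits to the same
    place costs |difference in minutes| / scale; aligning different places costs 1.
    """
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * gap
    for j in range(1, m + 1):
        d[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            (pa, ta), (pb, tb) = a[i - 1], b[j - 1]
            sub = abs(ta - tb) / scale if pa == pb else 1.0
            d[i][j] = min(d[i - 1][j] + gap, d[i][j - 1] + gap, d[i - 1][j - 1] + sub)
    return d[n][m]


def group_users(users, threshold=0.25):
    """Single-linkage grouping: users within `threshold` of each other share a group."""
    ids = sorted(users)
    parent = {u: u for u in ids}

    def find(u):  # union-find root lookup with path halving
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for i, u in enumerate(ids):
        for v in ids[i + 1:]:
            if seq_distance(users[u], users[v]) <= threshold:
                parent[find(v)] = find(u)
    groups = {}
    for u in ids:
        groups.setdefault(find(u), []).append(u)
    return sorted(groups.values())


print(seq_distance(USERS[2], USERS[3]))  # about 0.083: only Place_5 differs, by 5 min
print(seq_distance(USERS[2], USERS[4]))  # 2.0: same places and times, different order
print(group_users(USERS))                # [[1], [2, 3], [4, 5]]
```

With `scale=60`, a 60-minute difference at the same place costs as much as one inserted or deleted visit; that weighting (and the threshold) would need tuning on the real data.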

Y0gesh Gupta

Posted 2018-05-18T15:06:04.353

You could use an RNN autoencoder, with an embedding layer to transform the categorical place input into a continuous dense vector. If you want to do classification, just use an RNN and put a linear classifier on top of the last hidden state. – Fadi Bakoura – 2018-05-18T17:32:29.460
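To make the classification suggestion concrete, here is a forward-pass-only sketch in plain Python: each place is looked up in an embedding table, the visit duration (scaled by an assumed 60 minutes) is appended as one extra input, a simple Elman RNN runs over the visits, and a softmax layer classifies the last hidden state. The weights are random and untrained; every name and size (`PLACES`, `EMB_DIM`, `HIDDEN`, `forward`, and so on) is hypothetical, and appending the duration to the embedding is an assumption rather than part of the comment. Real training would be done in a framework, for example with PyTorch's `nn.Embedding` and `nn.GRU` plus a cross-entropy loss.

```python
import math
import random

random.seed(0)  # fixed seed so the untrained output is reproducible

PLACES = ["Place_1", "Place_2", "Place_5"]  # hypothetical place vocabulary
EMB_DIM, HIDDEN, N_CLASSES = 3, 4, 3        # hypothetical layer sizes


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


# Untrained, randomly initialised parameters.
embedding = {p: [random.uniform(-0.5, 0.5) for _ in range(EMB_DIM)] for p in PLACES}
W_in = rand_matrix(HIDDEN, EMB_DIM + 1)  # +1 input column for the scaled duration
W_h = rand_matrix(HIDDEN, HIDDEN)
b_h = [0.0] * HIDDEN
W_out = rand_matrix(N_CLASSES, HIDDEN)
b_out = [0.0] * N_CLASSES


def forward(sequence, duration_scale=60.0):
    """Elman RNN over (place, minutes) visits; softmax classifier on the last hidden state."""
    h = [0.0] * HIDDEN
    for place, minutes in sequence:
        x = embedding[place] + [minutes / duration_scale]  # dense place vector + duration
        h = [math.tanh(a + b + c) for a, b, c in zip(matvec(W_in, x), matvec(W_h, h), b_h)]
    logits = [z + b for z, b in zip(matvec(W_out, h), b_out)]
    top = max(logits)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


user_1 = [("Place_1", 60), ("Place_2", 30), ("Place_5", 45)]
user_4 = [("Place_2", 60), ("Place_1", 60), ("Place_5", 45)]
print(forward(user_1))  # class probabilities (meaningless until trained)
print(forward(user_4))
```

Because the hidden state is updated visit by visit, the same places in a different order produce a different final state. For the autoencoder variant, the encoder's final hidden state would play the role of a fixed-length vector that k-means could then cluster.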

Thanks. I was refraining from using a neural network for this. My stack currently involves Spark MLlib with Scala for implementing the model. I am not limiting myself to it, though, and am certainly open to moving to a neural network if that's the only plausible option. – Y0gesh Gupta – 2018-05-18T18:41:23.277

If you are encoding your data properly (places as columns, time data in rows), I don't see any problem with either k-means or random forest. Can you elaborate more on your problem? – Mankind_008 – 2018-05-18T20:09:09.327
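One reading of "places as columns, time data in rows" is a user-by-place matrix in which each cell holds the minutes spent at that place; that interpretation is an assumption, and `place_matrix` is a hypothetical helper. Building it for the example users shows what such an encoding keeps and what it drops:

```python
# Hypothetical data: the five example users from the question.
USERS = {
    1: [("Place_1", 60), ("Place_2", 30), ("Place_5", 45)],
    2: [("Place_1", 60), ("Place_2", 60), ("Place_5", 45)],
    3: [("Place_1", 60), ("Place_2", 60), ("Place_5", 40)],
    4: [("Place_2", 60), ("Place_1", 60), ("Place_5", 45)],
    5: [("Place_2", 60), ("Place_1", 60), ("Place_5", 45)],
}
PLACES = sorted({place for seq in USERS.values() for place, _ in seq})


def place_matrix(users):
    """One row per user, one column per place; each cell = total minutes spent there."""
    rows = {}
    for user, seq in users.items():
        totals = dict.fromkeys(PLACES, 0.0)
        for place, minutes in seq:
            totals[place] += minutes
        rows[user] = [totals[p] for p in PLACES]
    return rows


matrix = place_matrix(USERS)
print(PLACES)                  # ['Place_1', 'Place_2', 'Place_5']
for user, row in matrix.items():
    print(user, row)
print(matrix[2] == matrix[4])  # True, although users 2 and 4 have different labels
```

This encoding gives k-means or a random forest fixed-length input, but it discards visit order, so users 2 and 4 become indistinguishable even though their labels differ.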

Can you please tell me what you mean by places in columns and time data in rows? Currently each user record is a row in which the sequence of places and the time spent at each place is maintained. I can't have fixed columns for places, as the places can come up in any order for each user. – Y0gesh Gupta – 2018-05-20T20:11:58.757
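Fixed columns are still possible when the columns describe order rather than only places. One hedged option, assumed here rather than taken from the thread, is to combine per-place minutes with counts of place-to-place transitions; the set of columns is then the union of all features seen across users, with 0 for missing ones. `sequence_features` and `to_matrix` are hypothetical helpers:

```python
from collections import Counter

# Hypothetical data: the five example users from the question.
USERS = {
    1: [("Place_1", 60), ("Place_2", 30), ("Place_5", 45)],
    2: [("Place_1", 60), ("Place_2", 60), ("Place_5", 45)],
    3: [("Place_1", 60), ("Place_2", 60), ("Place_5", 40)],
    4: [("Place_2", 60), ("Place_1", 60), ("Place_5", 45)],
    5: [("Place_2", 60), ("Place_1", 60), ("Place_5", 45)],
}


def sequence_features(seq):
    """Order-aware features for one (place, minutes) sequence, as a sparse dict."""
    feats = Counter()
    for place, minutes in seq:
        feats[f"min:{place}"] += minutes  # total time spent at each place
    for (a, _), (b, _) in zip(seq, seq[1:]):
        feats[f"trans:{a}->{b}"] += 1     # how often place a is directly followed by b
    return feats


def to_matrix(feature_dicts):
    """The union of all feature names becomes the fixed column set; missing values are 0."""
    columns = sorted({name for f in feature_dicts for name in f})
    return columns, [[f[c] for c in columns] for f in feature_dicts]


ids = sorted(USERS)
columns, rows = to_matrix([sequence_features(USERS[u]) for u in ids])
print(columns)
for u, row in zip(ids, rows):
    print(u, row)
```

Users 2 and 4 now differ in their transition columns, while users 4 and 5 stay identical. Before running k-means, the columns should be scaled, since minute values would otherwise dominate the 0/1 transition counts. In Spark, such columns could be combined into one feature vector with `VectorAssembler`. Positional columns (place and minutes of the first visit, second visit, and so on, padded to the longest sequence) are another fixed-width option.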

