## Group prediction

1

I have the following sort of data coming in every day:

(0)(3,4,5)(6,9,1)(5,35,12,232)

(1)(5,1,4)(6,2)(12,54,12,43)(8,23,65)

(2)(6,7,2)(34,3)

(3)(4323,23,12,4543)

(4)(987,32,324,23,224,12,213,21)(1,2)

(5)(3242,23,23434,34,324,322)(4,342,423,4)(3,1,30)

(6)(1,2,3,4,5)(6,7)(8,9,10)

(0)(1,2)

(1)(54,12)(45,21,5,19)(9,8,6,41)(432,1,431,2)

In each sequence, the first number indicates the day of the week (0-6). All other numbers are user IDs. For example, the first sequence means: on Sunday (0), the following users met as 3 different groups: (3,4,5), (6,9,1), (5,35,12,232).

What is the best method for predicting the user groups for the next day? Can I use an RNN? Is there any specific method I should look into? Are there any classical problems closely related to this?
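To make the record format concrete, here is a minimal parsing sketch. It assumes every record is a single line of parenthesised, comma-separated integers with the day of the week first; the name `parse_record` is only illustrative.

```python
import re

def parse_record(line):
    """Split a record such as '(0)(3,4,5)(6,9,1)' into (day, groups)."""
    # Each parenthesised chunk is a comma-separated list of integers.
    chunks = re.findall(r"\(([^()]*)\)", line)
    parsed = [[int(x) for x in chunk.split(",")] for chunk in chunks]
    day = parsed[0][0]                       # first chunk holds the day (0-6)
    groups = [tuple(g) for g in parsed[1:]]  # remaining chunks are user groups
    return day, groups

day, groups = parse_record("(0)(3,4,5)(6,9,1)(5,35,12,232)")
print(day, groups)
```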

Can you elaborate on what these groups mean? What domain do the users belong to? Is it a college where students meet in groups, or something else? – mausamsion – 2018-04-18T12:22:06.193

You can consider them as employees of a company. On Sunday (0), Employees 3, 4 and 5 have dined together. 6, 9, and 1 have dined together. 5, 35, 12 and 232 have dined together. Is that clear? – user50711 – 2018-04-18T12:26:59.907

1

An RNN can be used for such tasks, but there is also a non-deep-learning approach that might be helpful here: see association rule learning.
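As a rough illustration of the association-rule idea, the sketch below treats each group as a transaction and computes the confidence of pairwise rules "if user a is in a group, user b is too". It is a hand-rolled simplification, not a full Apriori implementation; `pair_rules` is a hypothetical helper, and the sample groups are taken from the first two records in the question.

```python
from collections import Counter
from itertools import combinations

# Groups from the first two records in the question, treated as transactions.
groups = [(3, 4, 5), (6, 9, 1), (5, 35, 12, 232),
          (5, 1, 4), (6, 2), (12, 54, 12, 43), (8, 23, 65)]

def pair_rules(groups):
    """Return {(a, b): confidence of the rule a -> b} for co-occurring pairs."""
    user_count = Counter()
    pair_count = Counter()
    for g in groups:
        members = sorted(set(g))            # drop duplicate IDs within a group
        user_count.update(members)
        pair_count.update(combinations(members, 2))
    rules = {}
    for (a, b), n in pair_count.items():
        rules[(a, b)] = n / user_count[a]   # share of a's groups that contain b
        rules[(b, a)] = n / user_count[b]   # share of b's groups that contain a
    return rules

rules = pair_rules(groups)
print(rules[(4, 5)], rules[(5, 4)])
```

Rules with high confidence (and enough supporting groups) point to users who tend to show up together.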

May I know how RNN can be used for this? Any example? – user50711 – 2018-04-18T06:32:10.140

It is not possible to explain here how it can be done. You can google RNN and read articles that describe its structure. There are plenty of articles explaining RNNs in detail. – Atinesh – 2018-04-18T06:42:24.457

1

I see your problem consisting of two parts:

1. Predicting which users will participate.
2. Predicting the number of groups and what individual group composition will be.

For the first part, you can fit a Hidden Markov Model (HMM) for each user ID. Given a user's participation history, for example 'PNNPPPN', the model estimates how likely that user is to participate next time. Here 'P' stands for participated, 'N' stands for not participated, and the sequence is one week of history for a particular user.
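A full HMM needs a fitting library or a hand-written forward algorithm, so the sketch below swaps in something simpler: a first-order Markov chain directly on the observed 'P'/'N' symbols, with add-one smoothing. It has no hidden states, so treat it only as a baseline for the same question ("how likely is this user to participate next?"). The function name and the `alpha` parameter are assumptions.

```python
from collections import Counter

def next_participation_prob(history, alpha=1.0):
    """Estimate P(next = 'P' | last symbol) from a string of 'P'/'N' symbols."""
    # Count observed transitions between consecutive days.
    transitions = Counter(zip(history, history[1:]))
    last = history[-1]
    # Add-alpha smoothing keeps the estimate away from 0 and 1 on short histories.
    to_p = transitions[(last, "P")] + alpha
    to_n = transitions[(last, "N")] + alpha
    return to_p / (to_p + to_n)

print(next_participation_prob("PNNPPPN"))
```

With only one week of history the estimate is very noisy; in practice you would feed in a longer history per user.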

For the second part, you can use Word2vec to learn a vector embedding for each user, for example by treating each group as a "sentence" of user IDs. When you visualise these vectors in a 2-D space, you should see clusters in which users who tend to form groups together lie close to each other.
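To show the intuition without a Word2vec dependency, the sketch below uses raw co-occurrence vectors and cosine similarity as a stand-in. This is not Word2vec itself, only the kind of signal such embeddings pick up on; the helper names and the small sample of groups are illustrative.

```python
from collections import defaultdict
from math import sqrt

# A few groups from the question's sample data.
groups = [(3, 4, 5), (5, 1, 4), (6, 9, 1), (6, 2), (6, 7, 2)]

def cooccurrence_vectors(groups):
    """Map each user to a sparse vector of co-occurrence counts with other users."""
    vecs = defaultdict(lambda: defaultdict(int))
    for g in groups:
        members = set(g)
        for a in members:
            for b in members:
                if a != b:
                    vecs[a][b] += 1
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

vecs = cooccurrence_vectors(groups)
# Users 4 and 5 share companions, so they score higher than users 4 and 6.
print(cosine(vecs[4], vecs[5]), cosine(vecs[4], vecs[6]))
```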

Putting the two together: once you know which users are likely to participate on a given day, and which of those users have similar vector embeddings, you can predict the groups for that day.

(This is one approach; of course there can be others.)
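One possible way to combine the two parts is sketched below: take the users predicted to participate and merge those who have met often enough in the past, using connected components over a thresholded co-occurrence graph. Both the grouping rule and the `min_count` threshold are assumptions for illustration; clustering on embeddings would be an alternative.

```python
from collections import Counter
from itertools import combinations

def predict_groups(history_groups, participants, min_count=2):
    """Group predicted participants who co-occurred at least min_count times."""
    pair_count = Counter()
    for g in history_groups:
        pair_count.update(combinations(sorted(set(g)), 2))

    # Union-find restricted to the users predicted to participate.
    parent = {u: u for u in participants}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]   # path compression
            u = parent[u]
        return u

    for (a, b), n in pair_count.items():
        if n >= min_count and a in parent and b in parent:
            parent[find(a)] = find(b)

    clusters = {}
    for u in participants:
        clusters.setdefault(find(u), []).append(u)
    return sorted(sorted(c) for c in clusters.values())

# Made-up history for the demo: 3, 4, 5 and 6, 9 meet repeatedly.
history = [(3, 4, 5), (3, 4, 5), (6, 9), (6, 9), (4, 6)]
print(predict_groups(history, participants=[3, 4, 5, 6, 9, 7]))
```

Users with no strong past link, such as 7 here, end up as singletons, which you may want to handle separately.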

Thanks for your reply. This makes lots of sense. I'll give it a try. Waiting to see other answers. – user50711 – 2018-04-18T14:24:28.593