Feature engineering is the name of the game when it comes to these cases. I ran into a similar problem a few years ago, and it can be baffling when a model does not generalize well across all cases. However, one model per user is never the way to go: in many cases you have only one data point for that particular user. Additionally, you will never be able to make predictions when a new user appears.
Therefore, you need one model for as many users as possible. There are many more approaches, but these are the ones worth considering in my opinion.
- You can cluster users by location: city, neighborhood, region. People in the same area usually frequent the same places. If one model cannot generalize across all users, splitting the data into clusters of similar users and training one model per cluster is a reasonable approach.
- If a model generalizes well for most cases but there is a big cluster where it does not, use a second model for the hard cases.
- Basic feature engineering: users living near each other tend to go to similar places from their current location and to take the same routes. Similar users share a lot of information. Use the starting location as a feature as well.
- Advanced feature engineering: use past targets from the same user, together with time features, as training features to predict future targets. Make sure each training row only uses targets observed before the moment you are predicting for.
- Use user IDs: modern machine learning algorithms can use user IDs to predict targets. Encode them as a sparse one-hot matrix and think of them as words in NLP. Some will be useful and others won't, but let the algorithm do its work. Look for algorithms that support sparse matrices if you take this approach.
- Research GIS (Geographic Information System) forecasting: that field has tricks and feature engineering methods that do not apply to other ML problems but are useful for forecasting this kind of problem.
- Try different targets. Predicting an exact longitude/latitude is often not possible, but a rough estimate of the region can be made. You can treat the problem as classification, preferably multilabel (not multiclass), with each region/neighborhood/block being one label. Remember, algorithms can handle many targets, as in image classification; research those approaches.
- Consider simpler methods if you don't have enough data: if user Bob goes to spot A every Monday, then Bob will probably be at A next Monday as well.
- Use dates/days of the week/holidays as features. People change transit behaviors during holidays.
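
Several of the feature ideas above (user IDs as sparse tokens, starting location, past targets, weekday and holiday flags) can be combined into one feature row per trip. Here is a minimal sketch in plain Python. The record layout `(user_id, trip_date, start_region, destination_region)`, the `HOLIDAYS` set and the function name `build_rows` are my own assumptions for illustration, not something from your data.

```python
from datetime import date

# Assumption: a hand-maintained set of holiday dates; replace with a real calendar.
HOLIDAYS = {date(2024, 1, 1)}

def build_rows(trips, n_lags=2):
    """Turn trips into (feature_dict, target) pairs, one shared table for all users.

    trips: iterable of (user_id, trip_date, start_region, destination_region).
    """
    history = {}  # user_id -> list of that user's past destinations
    rows = []
    # Process in chronological order so lag features only look backwards.
    for user_id, day, start, dest in sorted(trips, key=lambda t: t[1]):
        past = history.setdefault(user_id, [])
        features = {
            "user=" + user_id: 1,               # user ID as a sparse token
            "start=" + start: 1,                # starting location
            "weekday=" + str(day.weekday()): 1, # 0 = Monday
            "is_holiday": int(day in HOLIDAYS),
        }
        # Lag features: the user's k-th most recent past destination.
        for k in range(1, n_lags + 1):
            if len(past) >= k:
                features[f"prev{k}=" + past[-k]] = 1
        rows.append((features, dest))
        past.append(dest)  # the current target becomes history only afterwards
    return rows

example_trips = [
    ("bob", date(2024, 1, 1), "north", "A"),
    ("amy", date(2024, 1, 2), "south", "B"),
    ("bob", date(2024, 1, 8), "north", "A"),
]
rows = build_rows(example_trips)
```

Each row is a dict with only the non-zero entries, which is already a sparse representation. Dicts like these could be fed to something like scikit-learn's `DictVectorizer` to get a sparse matrix for a single model trained on all users.
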
tl;dr: Find a way to use many users in one model that generalizes well. Training one model per person never works, because users often have just one data point and algorithms need large amounts of data to generalize well.
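
For the "simpler methods" idea (Bob goes to spot A every Monday), a frequency baseline is often worth having before any real model. This is a rough sketch under my own assumptions: visits are `(user_id, visit_date, spot)` tuples, and `fit_baseline` and `predict` are hypothetical names. It predicts the user's most frequent spot for that weekday and falls back to counts pooled over all users, so a brand-new user still gets a prediction.

```python
from collections import Counter, defaultdict
from datetime import date

def fit_baseline(visits):
    """Count spots per (user, weekday) and per weekday pooled over all users."""
    per_user = defaultdict(Counter)     # (user_id, weekday) -> spot counts
    per_weekday = defaultdict(Counter)  # weekday -> spot counts across users
    for user_id, day, spot in visits:
        weekday = day.weekday()  # 0 = Monday
        per_user[(user_id, weekday)][spot] += 1
        per_weekday[weekday][spot] += 1
    return per_user, per_weekday

def predict(model, user_id, day):
    """Most frequent spot for this user and weekday, else the pooled one, else None."""
    per_user, per_weekday = model
    weekday = day.weekday()
    # .get avoids creating empty entries in the defaultdicts.
    for counts in (per_user.get((user_id, weekday)), per_weekday.get(weekday)):
        if counts:
            return counts.most_common(1)[0][0]
    return None

visits = [
    ("bob", date(2024, 1, 1), "A"),  # Monday
    ("bob", date(2024, 1, 8), "A"),  # Monday
    ("bob", date(2024, 1, 2), "B"),  # Tuesday
    ("amy", date(2024, 1, 1), "C"),  # Monday
]
model = fit_baseline(visits)
known_user_guess = predict(model, "bob", date(2024, 1, 15))  # a Monday
new_user_guess = predict(model, "zed", date(2024, 1, 15))    # unseen user
```

If a more complex model cannot beat this baseline on held-out data, that is usually a sign the features are not adding information yet.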