Spatial Data Mining and ATM banking transactions


I have the following dataset. It contains ATM banking transactions at different locations over 30 days. These ATMs are located in different cities and provinces. The problem I need to solve is to develop a model that can detect which ATMs are in the same city or in the same province. The only thing I know about this problem is that it is related to spatial data mining, but I have no idea which techniques, algorithms, and tools I should use to solve it.

In this dataset, the PS ID column corresponds to the ATM IDs, and the AMOUNT column corresponds to the amount of money transferred.

My question is: which techniques, algorithms, and tools should I use to solve this problem?

Should I use clustering, or should I use other techniques?

[Image: screenshot of the dataset]

iMuhammad

Posted 2016-11-18T18:40:14.540

Reputation: 1

Answers


This sounds like an unsupervised learning problem, since you are trying to group observations according to some common association rather than trying to predict a target. My first impression is that you are facing an uphill battle here, as your data set doesn't look comprehensive enough. As a starting point, this is what I would try:

You could assume that ATMs in the same time zone would exhibit similar seasonality patterns on both daily and weekly scales. For example, on a typical weekday there would probably be a consistent up-tick in usage around, say, 8 a.m., probably peaking during the lunch break, with another peak following the end of the workday (these are just assumptions; I haven't done any analysis on this). So you could convert your TIME column into a factor with 24 levels (one for each hour of the day). You would, however, need to ensure that the times reported are in one consistent time zone rather than in the local time of the ATM they come from: in local time, every ATM would show its morning peak at 8 a.m. wherever it is, so ATMs in different time zones would look identical.
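A minimal Python sketch of that conversion, assuming each transaction is available as an (atm_id, timestamp, amount) tuple with the timestamp already expressed in one reference time zone such as UTC. The tuple layout and the function name hourly_profiles are hypothetical. Each ATM gets a 24-bin profile (or 168 bins with weekly=True, to also capture the weekly pattern), normalised so ATMs are compared by the shape of their usage rather than by volume.

```python
from collections import defaultdict
from datetime import datetime


def hourly_profiles(transactions, weekly=False):
    """Build a normalised usage profile per ATM (hypothetical helper).

    transactions: iterable of (atm_id, timestamp, amount) tuples, where
    timestamp is a datetime ASSUMED to be in one reference time zone (e.g. UTC).
    weekly=False -> 24 bins (hour of day);
    weekly=True  -> 168 bins (day of week x hour of day).
    """
    n_bins = 7 * 24 if weekly else 24
    counts = defaultdict(lambda: [0] * n_bins)
    for atm_id, ts, _amount in transactions:
        # Treat the hour (optionally combined with the weekday) as a categorical level.
        idx = ts.weekday() * 24 + ts.hour if weekly else ts.hour
        counts[atm_id][idx] += 1
    profiles = {}
    for atm_id, bins in counts.items():
        total = sum(bins)  # at least 1, since the ATM appeared in the data
        # Share of this ATM's transactions falling in each bin.
        profiles[atm_id] = [c / total for c in bins]
    return profiles


if __name__ == "__main__":
    sample = [
        ("ATM1", datetime(2016, 11, 1, 8, 5), 200.0),
        ("ATM1", datetime(2016, 11, 1, 12, 30), 50.0),
        ("ATM2", datetime(2016, 11, 1, 12, 45), 100.0),
    ]
    profiles = hourly_profiles(sample)
    print(profiles["ATM1"][8], profiles["ATM1"][12])  # 0.5 0.5
```

With timestamps in UTC, ATMs in different time zones should show roughly the same daily shape shifted by their UTC offset, which is the kind of difference a clustering step could pick up.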

That approach may allow you to cluster on the basis of time zone, but what about city? Well, if you assume that different cities have different levels of affluence, the AMOUNT column might yield some useful associations.
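A minimal sketch of turning AMOUNT into per-ATM features, again assuming transactions arrive as hypothetical (atm_id, timestamp, amount) tuples; amount_features is an illustrative name, not an existing API. Mean and median amount are just one reasonable choice of summary, and whether they actually differ by city in your data is exactly the affluence assumption that would need checking.

```python
import statistics
from collections import defaultdict


def amount_features(transactions):
    """Return {atm_id: [mean amount, median amount]} (hypothetical helper).

    transactions: iterable of (atm_id, timestamp, amount) tuples (an assumed
    layout; only atm_id and amount are used here).
    """
    amounts = defaultdict(list)
    for atm_id, _ts, amount in transactions:
        amounts[atm_id].append(amount)
    # The median is less sensitive than the mean to a handful of very large
    # transfers, so both are kept as features.
    return {
        atm_id: [statistics.mean(vals), statistics.median(vals)]
        for atm_id, vals in amounts.items()
    }
```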

So in summary, I'd try a clustering algorithm based on seasonal usage patterns and the value of the transactions. I'd be interested to hear whether you get any useful results from this, as my feeling is that your data set probably doesn't have enough useful features to be effective.
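Here is a rough sketch of the clustering step in plain Python, combining usage-pattern features with transaction-value features. It uses k-means (Lloyd's algorithm) as one possible choice; the answer doesn't name a specific clustering algorithm. It assumes two hypothetical dicts keyed by ATM ID, one holding hourly usage shares and one holding amount statistics, standardises every column so the raw amounts don't swamp the usage shares, and returns a cluster label per ATM. The number of clusters k is something you would have to supply (for example, the number of provinces, if you know it).

```python
import math
import random


def standardise(rows):
    """Z-score each column; constant columns end up as all zeros."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Population std; fall back to 1.0 for constant columns to avoid dividing by 0.
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def _nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centroids[j])),
    )


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means with random initial centroids; returns one label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [_nearest(p, centroids) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def cluster_atms(profiles, amounts, k, seed=0):
    """profiles / amounts: hypothetical {atm_id: [float, ...]} feature dicts."""
    atm_ids = sorted(profiles)
    rows = [profiles[a] + amounts[a] for a in atm_ids]
    labels = kmeans(standardise(rows), k, seed=seed)
    return dict(zip(atm_ids, labels))
```

One design caveat: after standardising, 24 hourly columns carry far more combined weight than two amount columns, so you may want to weight the two blocks differently. With real data it would also be worth trying several values of k and several seeds before trusting any grouping.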

ManChild

Posted 2016-11-18T18:40:14.540

Reputation: 11