There may be another way to go: generate the dataset that your algorithms will process. You define behaviour models for the events (both those you are looking for and those in which they are hidden), then generate the data, then analyse it.
This approach lets you control exactly what is in the processed data, so you can check that your algorithm identifies exactly what it is supposed to identify, no more, no less.
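To make the idea concrete, here is a minimal sketch in plain Python. It is not GEDIS Studio and does not use its API. The function names, the call-duration distributions, and the median/MAD threshold detector are all assumptions chosen for illustration. It generates "normal" events plus a known number of hidden ones, runs a simple detector, and scores the result against the ground truth that the generator recorded.

```python
import random
import statistics


def generate_events(n_normal=1000, n_hidden=20, seed=42):
    """Return a shuffled list of (duration_seconds, is_hidden) pairs.

    Both behaviour models are illustrative assumptions: ordinary calls
    last about 3 minutes, hidden events are unusually long calls.
    """
    rng = random.Random(seed)
    events = [(max(1.0, rng.gauss(180.0, 60.0)), False) for _ in range(n_normal)]
    events += [(rng.uniform(3600.0, 7200.0), True) for _ in range(n_hidden)]
    rng.shuffle(events)
    return events


def detect(durations, k=10.0):
    """Flag durations more than k median absolute deviations above the median."""
    med = statistics.median(durations)
    mad = statistics.median([abs(d - med) for d in durations])
    return [d > med + k * mad for d in durations]


def evaluate(flags, labels):
    """Compare detector flags with the known labels; return (precision, recall)."""
    tp = sum(f and l for f, l in zip(flags, labels))
    fp = sum(f and not l for f, l in zip(flags, labels))
    fn = sum(l and not f for f, l in zip(flags, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    events = generate_events()
    durations = [d for d, _ in events]
    labels = [hidden for _, hidden in events]
    precision, recall = evaluate(detect(durations), labels)
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

Because the generator keeps the labels, any missed or spurious detection is visible immediately, which is the "no more, no less" check described above. Making the hidden events harder to separate from the normal ones (for example, by shortening them) shows where the detector starts to fail.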
With GEDIS Studio we model event behaviour with activity profiles, and the generator produces those events. We have implemented generators for telecom CDRs, credit card usage, smart metering, etc.
They are freely available online through the evaluation account at http://www.data-generator.com
See the detailed CDR use case at http://www.gedis-studio.com/online-call-detail-records-cdr-generator.html