Detect Missing Records in Dataset



I have a dataset that contains several measures for a set of widgets, collected daily. The set of widgets is relatively stable over time, but sometimes there are legitimate reasons for one widget to disappear from the data and another to appear. Occasionally, though, a widget simply drops out, which leaves the dataset incomplete and invalidates the whole dataset for that day.

What I am looking for is a method of comparing the current set of widgets with another set of widgets to detect whether any widgets are missing. I am not trying to impute the values, just identify that they are missing. I could model each widget as a time series, but that feels like overkill for so many widgets, and there are multiple attributes on which data might be missing. I was hoping for something more set-based that accounts for the regular changes in widgets while still detecting the unusual dropouts. I am sure I just need to adjust the way I am thinking about the problem.
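To make the set-based idea concrete, here is a rough sketch of the kind of comparison I have in mind. It is only an illustration under assumptions of my own: the function name `find_unexpected_dropouts`, the trailing window of daily widget sets, and the `min_presence` threshold are all hypothetical, and the widget IDs are made-up placeholders rather than real data. A widget counts as "expected" if it appeared on at least a given fraction of recent days, so a widget that was phased out over several days stops being expected, while one that vanishes abruptly gets flagged.

```python
from collections import Counter


def find_unexpected_dropouts(history, current, min_presence=0.8):
    """Return widgets that were consistently present recently but are absent now.

    history      -- list of sets of widget IDs, one set per past day (hypothetical layout)
    current      -- set of widget IDs observed today
    min_presence -- assumed threshold: fraction of past days a widget must
                    appear in to be considered "expected"
    """
    if not history:
        return set()

    # Count on how many past days each widget appeared.
    counts = Counter(widget for day in history for widget in day)
    n_days = len(history)

    # A widget is expected if it was present on enough of the recent days.
    expected = {w for w, c in counts.items() if c / n_days >= min_presence}

    # Set difference: expected widgets that did not show up today.
    return expected - current


# Made-up example: C is retired gradually, D is introduced, B drops out abruptly.
history = [
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "B", "C", "D"},
    {"A", "B", "D"},
    {"A", "B", "D"},
]
today = {"A", "D"}

print(sorted(find_unexpected_dropouts(history, today)))  # ['B']
```

Because the elements only need to be hashable, the same comparison would work on `(widget, attribute)` tuples instead of bare widget IDs, which might cover the case where only some attributes are missing. The window length and threshold would still need tuning, and this sketch cannot tell a legitimate same-day retirement from a genuine dropout.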

Any ideas would be much appreciated.


Posted 2018-11-14T20:26:07.553

Reputation: 898

The concept is good. Do you have any sample data? – sai saran – 2018-11-15T03:49:07.460

Unfortunately, it is proprietary data, but I'll try to document a proxy of the data. – Skiddles – 2018-11-15T14:29:35.233

Something I don't get: what happens to your data when a widget "disappears"? – anymous.asker – 2018-11-17T18:47:33.177

When the widgets disappear, part of the whole is missing. – Skiddles – 2018-11-18T01:25:12.140

No answers