## How to learn from unlabeled samples but labeled groups of samples?

I'm trying to perform anomaly detection on the open data from Citi Bike. They publish bikeshare trip data for the past 30+ months, as well as monthly reports. Those reports state how many bikes were repaired each month.

I am building one sample per bike per day. I don't have labels for those samples, since I don't know which bike was repaired on which day. But if I classify each sample as normal or anomalous, I can sum the number of bikes classified as anomalous during a month and compare that number to the monthly report.

I want to know how one usually deals with this, or what it is called, so I can read research papers on the subject.

Example of samples:

bikeid, day,         feature1, feature2...
1,      2016-01-01,  0.6,      -0.2
2,      2016-01-01,  0.5,      -0.8
1,      2016-01-02,  0.7,      -0.1
2,      2016-01-02,  0.9,       1
...
1,      2016-01-31, -0.32,     -0.45
2,      2016-01-31, -0.5,      -0.8


Example of label: 3456 bicycle repairs in January.

But the shape of the data is irrelevant; what is important is that the labels are not about one sample but about a group of samples.
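To make the setup concrete, here is a minimal sketch of the comparison I have in mind. Everything in it is made up for illustration: the anomaly scores, the monthly repair counts, the candidate thresholds, and the function names `monthly_anomaly_counts` and `aggregate_error`. It also assumes that each flagged (bike, day) sample counts as one repair, which may not be the right way to count.

```python
from collections import Counter

# Hypothetical per-(bike, day) anomaly scores; in practice these would come
# from some unsupervised detector run on feature1, feature2, ...
samples = [
    # (bikeid, day, anomaly_score)
    (1, "2016-01-01", 0.10),
    (2, "2016-01-01", 0.80),
    (1, "2016-01-02", 0.30),
    (2, "2016-01-02", 0.95),
    (1, "2016-02-01", 0.70),
    (2, "2016-02-01", 0.20),
]

# Hypothetical monthly repair counts (the only labels available).
reported_repairs = {"2016-01": 2, "2016-02": 1}


def monthly_anomaly_counts(samples, threshold):
    """Count the samples per month whose score exceeds the threshold."""
    counts = Counter()
    for _bikeid, day, score in samples:
        if score > threshold:
            counts[day[:7]] += 1  # the "YYYY-MM" prefix identifies the month
    return counts


def aggregate_error(samples, reported, threshold):
    """Sum of absolute differences between predicted and reported counts."""
    counts = monthly_anomaly_counts(samples, threshold)
    return sum(abs(counts[month] - n) for month, n in reported.items())


# Pick the candidate threshold whose monthly totals best match the reports.
candidates = [0.1, 0.25, 0.5, 0.75, 0.9]
best_threshold = min(
    candidates, key=lambda t: aggregate_error(samples, reported_repairs, t)
)
print(best_threshold, aggregate_error(samples, reported_repairs, best_threshold))
```

The only supervision used here is the per-month total, never a per-sample label. Tuning a single threshold is just the simplest way I can think of to exploit it.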

It's a little unclear to me, can you give some small samples of data? – Jan van der Vegt – 2016-07-08T08:50:20.360

@Jan van der Vegt : I edited to give you samples of data but I don't care about this problem in particular, I just don't know how to describe my problem to Google – Borbag – 2016-07-08T09:00:20.660

maybe aggregation is the word you are looking for.. – Valentas – 2016-07-08T16:52:03.500

Aggregate Output Learning was the right term, thanks @Valentas. – Borbag – 2016-07-15T09:37:42.990

http://www.cs.cmu.edu/~jfolson/documents/musicantd-AggregateLearning.pdf Here is the paper giving the definition – Borbag – 2016-07-15T09:38:17.867