From your comment, I understand that you are trying to solve a binary classification problem using your aggregated data, and that you are getting very poor results when you simply use the mean of each time series.
Depending on the specifics of your data and the shape of your time series, there are several alternatives you could try. Note that you might need (significantly) more than a single number per time series to solve your problem.
- In addition to the mean, you could use quantiles or other summary statistics, such as the standard deviation, minimum, or maximum.
- You could try sampling the data, i.e., instead of taking the entire time series, pick only values that are minutes, hours, or days apart, or only the mid-day values. The right sampling frequency depends on your data.
- Or just pre-aggregate by calculating averages for every hour, day, month, etc.
- Additionally, you could calculate the periodicity of your time series and use it as a new feature.
- Or calculate trends, e.g. the slope of a linear fit over time.
- Try fitting standard time series models to your data, e.g. ARIMA, and use the fitted coefficients as features.
- Last but not least, use domain knowledge about which features could be relevant to your classification problem: the biggest jump (maximum first-order difference), a change of regime, etc.
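A minimal sketch of the first three bullets (extra summary statistics, sampling, and pre-aggregation), assuming each time series is a 1-D NumPy array of evenly spaced readings. The function names, the hourly example, and the step sizes are hypothetical; choose the sampling step and block length to match your data.

```python
import numpy as np


def summary_features(series: np.ndarray) -> dict:
    """Summary statistics beyond the mean for one 1-D time series."""
    q10, q25, q50, q75, q90 = np.quantile(series, [0.10, 0.25, 0.50, 0.75, 0.90])
    return {
        "mean": float(np.mean(series)),
        "std": float(np.std(series)),
        "min": float(np.min(series)),
        "max": float(np.max(series)),
        "q10": float(q10),
        "q25": float(q25),
        "median": float(q50),
        "q75": float(q75),
        "q90": float(q90),
    }


def sample_every(series: np.ndarray, step: int, offset: int = 0) -> np.ndarray:
    """Keep every `step`-th value starting at `offset` (e.g. only the mid-day reading)."""
    return series[offset::step]


def block_means(series: np.ndarray, block: int) -> np.ndarray:
    """Pre-aggregate: average over consecutive, non-overlapping blocks of `block` values.

    Trailing values that do not fill a whole block are dropped.
    """
    n_full = len(series) // block
    return series[: n_full * block].reshape(n_full, block).mean(axis=1)


# Hypothetical example: 3 days of hourly readings (72 values).
hourly = np.arange(72, dtype=float)
features = summary_features(hourly)
midday = sample_every(hourly, step=24, offset=12)  # one value per day, at hour 12
daily_means = block_means(hourly, block=24)        # one average per day
```

If your series carry timestamps, pandas' `Series.resample` (e.g. `s.resample("D").mean()`) does the same pre-aggregation by calendar period.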
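The periodicity, trend, model-coefficient, and biggest-jump features could be sketched as follows, again assuming evenly spaced 1-D NumPy arrays. Two simplifications to flag: the period is taken from the strongest peak of the FFT spectrum after removing a linear trend, and `ar_coefficients` fits a plain AR(p) model by ordinary least squares instead of a full ARIMA model. All function names and the example series are hypothetical.

```python
import numpy as np


def trend_slope(series: np.ndarray) -> float:
    """Slope of a least-squares straight line fitted against the time index."""
    t = np.arange(len(series))
    slope, _intercept = np.polyfit(t, series, deg=1)
    return float(slope)


def dominant_period(series: np.ndarray) -> float:
    """Period (in samples) of the strongest non-zero frequency in the spectrum."""
    t = np.arange(len(series))
    # Remove the linear trend first; otherwise it swamps the lowest frequencies.
    detrended = series - np.polyval(np.polyfit(t, series, deg=1), t)
    power = np.abs(np.fft.rfft(detrended)) ** 2
    freqs = np.fft.rfftfreq(len(series))  # cycles per sample
    k = 1 + int(np.argmax(power[1:]))     # skip the zero-frequency bin
    return float(1.0 / freqs[k])


def ar_coefficients(series: np.ndarray, p: int = 2) -> np.ndarray:
    """AR(p) coefficients phi_1..phi_p fitted by ordinary least squares with an intercept.

    A simplified stand-in for a full ARIMA fit.
    """
    x = np.asarray(series, dtype=float)
    # Row i holds the lagged values x[i+p-1], ..., x[i] used to predict x[i+p].
    lags = np.column_stack([x[p - j - 1 : len(x) - j - 1] for j in range(p)])
    design = np.column_stack([np.ones(len(lags)), lags])
    coef, *_ = np.linalg.lstsq(design, x[p:], rcond=None)
    return coef[1:]  # drop the intercept


def max_jump(series: np.ndarray) -> float:
    """Biggest jump: the largest absolute first-order difference."""
    return float(np.max(np.abs(np.diff(series))))


# Hypothetical example: 10 days of hourly data with an upward trend and a daily cycle.
t = np.arange(240)
series = 0.05 * t + np.sin(2 * np.pi * t / 24)
slope = trend_slope(series)       # close to 0.05
period = dominant_period(series)  # about 24 samples, i.e. one day
jump = max_jump(series)
phi = ar_coefficients(series, p=2)
```

For a proper ARIMA fit, statsmodels provides `statsmodels.tsa.arima.model.ARIMA`; the fitted result's `params` could be used as features in place of `ar_coefficients`.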
I'd generate at least 10-20 features per time series as described above and then apply logistic regression with a LASSO (L1) penalty, or even XGBoost.
Once you have 10-20 features per time series, you could also try PCA to reduce the dimensionality.
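As a sketch of the modeling step, an L1-penalized (LASSO) logistic regression in scikit-learn could look like the following. The feature matrix and labels are synthetic stand-ins for your engineered features (one row per time series), and `C=0.5` is an arbitrary choice that you would normally tune with cross-validation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical feature matrix: 200 series x 15 engineered features.
n_series, n_features = 200, 15
X = rng.normal(size=(n_series, n_features))
# Synthetic labels that depend only on the first two features, plus noise.
y = (X[:, 0] - 0.5 * X[:, 1] + 0.3 * rng.normal(size=n_series) > 0).astype(int)

model = make_pipeline(
    StandardScaler(),  # the L1 penalty is scale-sensitive, so standardize first
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5),
)
model.fit(X, y)

coef = model[-1].coef_.ravel()
selected = np.flatnonzero(coef)  # indices of features with non-zero coefficients
print("kept features:", selected)
print("training accuracy:", model.score(X, y))
```

Features whose coefficients are driven to exactly zero are effectively dropped, which is what makes the L1 penalty useful for sifting through many candidate features. If you try XGBoost instead, `xgboost.XGBClassifier` follows the same `fit`/`predict` interface.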
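A minimal PCA sketch in plain NumPy (standardize the features, then take the SVD), assuming a feature matrix with one row per time series and no constant columns, since those would make the scaling divide by zero. The synthetic data is hypothetical: 12 features driven by 2 hidden factors, so the first two components should capture almost all of the variance.

```python
import numpy as np


def pca(X: np.ndarray, n_components: int):
    """Project rows of X onto the top principal components of the standardized data.

    Returns (scores, explained_variance_ratio) for the first n_components.
    """
    # Standardize so that features measured on large scales do not dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # The rows of Vt are the principal axes, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]


# Hypothetical data: 100 series, 12 features driven by 2 latent factors plus small noise.
rng = np.random.default_rng(1)
latent = rng.normal(size=(100, 2))
X = latent @ rng.normal(size=(2, 12)) + 0.01 * rng.normal(size=(100, 12))
scores, ratio = pca(X, n_components=3)
print(scores.shape, ratio.round(3))
```

The signs of the components are arbitrary. `sklearn.decomposition.PCA` applied after `StandardScaler` gives the same projection (up to sign) and reports `explained_variance_ratio_`, if you prefer to stay in scikit-learn.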