How can one generate future forecasts from probabilistic events?



I have an event "whether an item sold will be returned or not" which I can predict with a certain probability based on information gathered at the time that the purchase occurs (product features, customer information, time and place, etc). So:

P(Return | transaction information) = x% for a specific unit sold

I also have a historical time series of total units sold for that item, and a future forecast of sales of that item over the next few weeks. Assuming I have the relevant transaction data for each historical sale that occurred, is there a way to generate a future forecast from the return probability, so that I can state with some confidence that I will get 15% total returns on the item next week, 10% the week after, etc.?

Alex S Kinman

Posted 2016-11-14T20:54:07.970

Reputation: 477

I cannot comment because of low reputation, but how about the ARIMA models in R's forecast package? – user4959 – 2017-04-15T05:53:36.213

I fail to understand how the number of items sold as a group will impact the probability of individual items being returned? The product features won't change, perhaps your forecast will contain the time and place of purchase, but what about the customer information? – Valentin Calomme – 2017-11-13T21:20:26.140


An entire branch of statistics is devoted to this kind of problem: survival analysis. – Elias Strehle – 2018-02-09T11:05:27.637



If you have historical data on the percentage of units returned in each of the first few weeks after a sale, I believe you can do it. It becomes a multiple-output regression problem, i.e. you formulate the target values as an output matrix in which the %-return in the first week is in column one, the %-return in the second week is in column two, and so on. After this, plug in all your input variables and fit a multi-output regression algorithm. scikit-learn provides one as sklearn.multioutput.MultiOutputRegressor.
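As a rough illustration of the multi-output formulation described above, here is a minimal sketch. The feature columns, the synthetic data, and the choice of Ridge as the base estimator are all assumptions made for the example; only MultiOutputRegressor itself comes from the answer.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)

# Hypothetical training data: one row per (item, sales week) cohort.
# The three feature columns are placeholders (e.g. price, discount, online share).
X = rng.random((200, 3))

# Y[:, k] is the fraction of the cohort's units returned in week k+1 after the sale.
# The data is synthetic: a made-up linear relationship plus a little noise.
true_weights = np.array([[0.10, 0.05, 0.02],
                         [0.04, 0.03, 0.01],
                         [0.02, 0.01, 0.01]])
Y = X @ true_weights + rng.normal(scale=0.005, size=(200, 3))

# MultiOutputRegressor fits one independent copy of the base estimator per column of Y.
model = MultiOutputRegressor(Ridge(alpha=1.0))
model.fit(X, Y)

# Features for four future cohorts (again made up); clip so the output is a valid fraction.
X_future = rng.random((4, 3))
pred = np.clip(model.predict(X_future), 0.0, 1.0)
print(pred.shape)  # one row per future cohort, one column per week after the sale
```

Each row of pred can then be multiplied by the forecast sales volume of that cohort to obtain expected return counts per week.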


Posted 2016-11-14T20:54:07.970

Reputation: 256


You may wish to aggregate the data first. For each week and each item, aggregate all of its transaction records so that you know how many units were sold and how many were returned; keep the transaction-level features available as well, so that a model can infer those return counts from them. Then formulate the problem as a time series with weekly time steps. Preprocess the data by computing the relative return rate (returned units divided by units sold) rather than absolute counts, apply z-score normalization, and feed the sequence to an RNN-like model that predicts the value at timestep t+1 given the previous t timesteps.
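The preprocessing steps above (weekly aggregation, relative return rate, z-score normalization, sliding windows) could be sketched as follows. The transaction records and the helper name make_windows are hypothetical, and the RNN itself is left out; the windows are simply the (input, target) pairs such a model would be trained on.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical transaction records for one item: (week of sale, was the unit returned?)
transactions = [
    (1, False), (1, True), (1, False), (1, False),
    (2, True), (2, True), (2, False), (2, False),
    (3, False), (3, False), (3, False), (3, True), (3, False),
    (4, True), (4, False),
    (5, False), (5, False), (5, True), (5, True),
]

# Aggregate per week: units sold and units returned.
sold = defaultdict(int)
returned = defaultdict(int)
for week, was_returned in transactions:
    sold[week] += 1
    returned[week] += int(was_returned)

# Relative return rate per week instead of absolute counts.
weeks = sorted(sold)
rates = [returned[w] / sold[w] for w in weeks]

# Z-score normalization (population standard deviation over the whole series).
mu, sigma = mean(rates), pstdev(rates)
z = [(r - mu) / sigma for r in rates]

def make_windows(series, t):
    """Pair each run of t consecutive values with the value that follows it."""
    return [(series[i:i + t], series[i + t]) for i in range(len(series) - t)]

windows = make_windows(z, t=3)
for inputs, target in windows:
    print(inputs, "->", target)
```

In practice the mean and standard deviation would be computed on the training portion of the series only, so that no information from the forecast period leaks into the inputs.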

Fadi Bakoura

Posted 2016-11-14T20:54:07.970

Reputation: 848


One potential idea might be to approach this as a classification problem: label each historical sale as returned or not returned and train a logistic regression model that outputs a probability. Now you can give a probabilistic answer.
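To turn per-unit probabilities into a weekly figure, one option is to sum them: the expected number of returns is the sum of the individual probabilities, and if returns are assumed independent, the variance is the sum of p(1 - p). The sketch below assumes per-unit probabilities are already available for the units forecast to be sold; the function name weekly_return_forecast and the probability values are made up for illustration.

```python
from math import sqrt

def weekly_return_forecast(probabilities):
    """Return (expected return rate, standard deviation of that rate).

    Assumes each unit is returned independently with its own probability.
    """
    n = len(probabilities)
    expected_returns = sum(probabilities)
    variance = sum(p * (1 - p) for p in probabilities)
    return expected_returns / n, sqrt(variance) / n

# Hypothetical classifier outputs for the units expected to sell next week.
next_week = [0.10, 0.20, 0.15, 0.25, 0.05, 0.15]
rate, rate_sd = weekly_return_forecast(next_week)
print(f"expected return rate: {rate:.1%} (+/- {rate_sd:.1%})")
```

The standard deviation shrinks as the number of units grows, so the confidence statement becomes tighter for high-volume weeks. It does not account for uncertainty in the sales forecast or in the classifier itself.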


Posted 2016-11-14T20:54:07.970

Reputation: 901