How can I predict student enrolment in September based on independent data available earlier in the year?


I'd like to predict how many students will enroll in a college in September, based on independent variables which are known earlier in the annual student recruitment cycle.

For example, one independent variable would be the number of accepted offers. For each of the last five years, I know how many students had accepted an offer from the college in each week from January to August, and how many of those students actually enrolled in September (generally about 80% of students who accept an offer enroll, but this varies from year to year).

Other independent variables might include: how many students have already paid a deposit on their tuition fee, or even the current exchange rate for students travelling from abroad.

So far, I've used historical conversion rates to predict enrollment, but I am wondering whether I can do something more sophisticated which would allow me to add in other independent variables.

Any ideas of which approaches I should investigate would be very welcome. I'd be particularly interested to hear of any Python libraries which might be relevant.


I would try regression with Python's scikit-learn library to predict the September headcount from the other variables you have. Here is a basic example using a linear model. Once you have that working, you could try a more sophisticated algorithm.
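As a rough sketch of the simplest version of this idea: one row per past year, with the accepted offers on record at a fixed week as the only feature and the final September headcount as the target. The numbers below are made up purely for illustration, and the variable names are my own assumptions rather than anything from your data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up illustrative numbers, one row per past year (NOT real data):
# accepted offers on record at a fixed week, and the final September headcount.
accepted_offers = np.array([[950], [1010], [980], [1100], [1060]])
september_enrollment = np.array([770, 800, 790, 890, 840])

# Fit a straight line: enrollment ~ intercept + slope * accepted offers.
model = LinearRegression()
model.fit(accepted_offers, september_enrollment)

# Predict this year's headcount from this year's accepted offers at the same week.
offers_this_year = np.array([[1040]])
predicted = model.predict(offers_this_year)
print(f"Predicted September enrollment: {predicted[0]:.0f}")
```

With only five past years there is very little data behind the fitted line, so treat the output as a sanity check against your historical conversion rate rather than a precise forecast.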


Thanks - so would I train the model on one of my independent variables, e.g. X = week number, y = accepted offers, and forecast at week number + n? I can see how this would forecast accepted offers, but not enrolment... I was thinking I'd like to make an 'explanatory model', as described in the 'Predictor variables and time series forecasting' section...

– user2950747 – 2017-06-21T15:28:16.067

– CalZ – 2017-06-22T16:02:38.773

– user2950747 – 2017-06-23T12:33:30.607


y would be final enrollment and X would contain every other variable, which gives you a multiple linear regression. The example I sent you was univariate, so maybe that was confusing. Here's another example:
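A minimal sketch of that setup, assuming scikit-learn and entirely synthetic data: each row is a (year, week) snapshot with the week number, accepted offers so far, deposits paid so far, and an exchange rate, and the target is that year's final September enrollment. The column choice, the value ranges, and the leave-one-year-out check are my own assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT real figures): one row per (year, week) snapshot.
rows, targets, years = [], [], []
for year in range(2012, 2017):
    final_enrollment = rng.integers(780, 900)   # September headcount for that year
    exchange_rate = rng.uniform(1.2, 1.6)       # hypothetical rate, held fixed per year
    for week in range(1, 35):                   # roughly January to August
        progress = week / 34
        accepted = final_enrollment / 0.8 * progress + rng.normal(0, 15)
        deposits = 0.6 * accepted + rng.normal(0, 10)
        rows.append([week, accepted, deposits, exchange_rate])
        targets.append(final_enrollment)
        years.append(year)

X = np.array(rows)        # columns: week, accepted offers, deposits paid, exchange rate
y = np.array(targets)     # final September enrollment
groups = np.array(years)  # used to hold out whole years together

# Hold out one whole year at a time so the model is always scored on an unseen cycle.
model = LinearRegression()
held_out_pred = cross_val_predict(model, X, y, groups=groups, cv=LeaveOneGroupOut())
mean_abs_error = np.mean(np.abs(held_out_pred - y))
print(f"Leave-one-year-out mean absolute error: {mean_abs_error:.1f} students")

# Fit on all past years, then predict from this year's snapshot (made-up values).
model.fit(X, y)
this_year = np.array([[20, 640, 380, 1.4]])
print(f"Predicted September enrollment: {model.predict(this_year)[0]:.0f}")
```

Grouping the cross-validation by year matters here because all the weekly rows from one year share the same target; a random split would leak that year's answer into the training set. A plain linear model also cannot capture that the same number of accepted offers means different things in week 5 and week 30, so fitting a separate model per week or adding an interaction term may be worth trying next.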

