How can I predict student enrolment in September based on independent data available earlier in the year?


I'd like to predict how many students will enroll in a college in September, based on independent variables which are known earlier in the annual student recruitment cycle.

For example, one independent variable would be the number of accepted offers; I know how many students have accepted an offer from the college for each week from January to August for the last five years, and I know how many of these students actually enrolled in September (generally about 80% of students who accept an offer actually enroll, but this varies year-on-year).

Other independent variables might include: how many students have already paid a deposit on their tuition fee, or even the current exchange rate for students travelling from abroad.

So far, I've used historical conversion rates to predict enrollment, but I am wondering whether I can do something more sophisticated which would allow me to add in other independent variables.
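For reference, the historical-conversion-rate baseline described above could look something like the minimal sketch below. It assumes one accepted-offers count per past year, taken at the same week of the cycle. The figures and the `history` / `conversion_forecast` names are hypothetical placeholders, not real data.

```python
from statistics import mean

# Hypothetical placeholder data (NOT real figures): for each past year,
# accepted offers counted at the same week of the cycle, and final September enrolment.
history = {
    2012: {"accepted_at_week": 500, "enrolled": 410},
    2013: {"accepted_at_week": 520, "enrolled": 420},
    2014: {"accepted_at_week": 480, "enrolled": 380},
    2015: {"accepted_at_week": 550, "enrolled": 445},
    2016: {"accepted_at_week": 510, "enrolled": 400},
}


def conversion_forecast(history, accepted_now):
    """Forecast enrolment as this year's accepted offers multiplied by the
    mean historical ratio of enrolled / accepted at the same week."""
    rates = [yr["enrolled"] / yr["accepted_at_week"] for yr in history.values()]
    return accepted_now * mean(rates)


# Hypothetical current-year count at the same week.
print(round(conversion_forecast(history, 530)))
```

This is the single-variable baseline that a regression model would need to beat.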

Any ideas of which approaches I should investigate would be very welcome. I'd be particularly interested to hear of any Python libraries which might be relevant.


Posted 2017-06-21T09:06:25.870

Reputation: 131



I would try out regression in Python's scikit-learn library to predict the September headcount given all those other variables you have. Here is a basic example using a linear model. Once you have that working, you could try a more sophisticated algorithm.
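As an illustration (not the answerer's original example), a univariate linear model in scikit-learn for this problem might look like the sketch below. It assumes one row per past year, with X = accepted offers counted at a fixed week of the cycle and y = final September enrolment. All numbers are hypothetical placeholders.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical placeholder data (NOT real figures), one entry per past year:
# accepted offers at the same week of the cycle, and final enrolment.
accepted_at_week = np.array([500, 520, 480, 550, 510])
enrolled = np.array([410, 420, 380, 445, 400])

# scikit-learn expects a 2-D feature matrix of shape (n_samples, n_features).
X = accepted_at_week.reshape(-1, 1)
y = enrolled

model = LinearRegression()
model.fit(X, y)

# Forecast this year's enrolment from this year's count at the same week.
forecast = model.predict(np.array([[530]]))[0]
print(f"slope={model.coef_[0]:.3f}, intercept={model.intercept_:.1f}, forecast={forecast:.0f}")
```

Note that X here is the accepted-offer count at a given week, not the week number itself. With only five past years there are only five data points, so the fit will be noisy; one option is to fit a separate model like this for each week of the cycle.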


Posted 2017-06-21T09:06:25.870

Reputation: 1 548

Thanks, so would I train the model on one of my independent variables, e.g. X = week number, y = accepted offers, and forecast on week number + n? I can see how this would forecast accepted offers, but not enrolment. I was thinking I'd like to make an 'explanatory model', as described in the 'Predictor variables and time series forecasting' section...

– user2950747 – 2017-06-21T15:28:16.067

No, I meant that you would have the model forecast enrollment numbers given all the other variables. Are you asking whether you could also predict if an individual student would enroll? – CalZ – 2017-06-22T16:02:38.773

No need to predict individual students – I just need a forecast of overall enrolment. What would X and y be if I followed the linear model for just one variable (say accepted offers)? Do you mean I should train the model so that X = the week-by-week number of accepted offers and y = final enrolment? Thanks for bearing with me! – user2950747 – 2017-06-23T12:33:30.607


y would be final enrollment and X would be all the other variables, for a multivariate linear regression. I sent you a univariate example, so maybe that was confusing. Here's another example:

– CalZ – 2017-06-23T13:41:33.043
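A minimal multivariate sketch along the same lines (again, not the linked example): each row is one past year, the columns are accepted offers, deposits paid, and an exchange rate, all taken at the same week of the cycle, and y is final enrolment. All values are hypothetical placeholders.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical placeholder data (NOT real figures), one row per past year,
# all measured at the same week of the recruitment cycle.
# Columns: accepted offers, deposits paid, exchange rate (illustrative).
X = np.array([
    [500, 300, 1.10],
    [520, 310, 1.15],
    [480, 290, 1.05],
    [550, 340, 1.20],
    [510, 295, 1.12],
])
# Target: final September enrolment for each of those years.
y = np.array([410, 420, 380, 445, 400])

model = LinearRegression().fit(X, y)

# This year's values at the same week (also hypothetical).
this_year = np.array([[530, 320, 1.08]])
forecast = model.predict(this_year)[0]

for name, coef in zip(["accepted", "deposits", "fx_rate"], model.coef_):
    print(f"{name}: {coef:.3f}")
print(f"forecast: {forecast:.0f}")
```

With five years of data and three predictors, ordinary least squares has almost no spare degrees of freedom, so the coefficients will be unstable. A regularised model such as scikit-learn's `Ridge`, or more years of data, would likely be needed before trusting the forecast.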