How do I predict if it is rainy or not?



I'm building a weather station, where I'm sensing temperature, humidity, air pressure, brightness, $CO_2$, but I don't have a raindrop sensor.

Is it possible to create an AI which can say if it's raining or not, with the help of the given data above and maybe analyzing the slope from the last hour or something? Which specific technology should I use and how can I train it?


Posted 2018-11-06T18:50:04.937

Reputation: 61

I think this question would be better suited to another site, even though it may sound too basic there, given that it is about "prediction"/"inference" in general. There you will find a lot more competent people who can answer this question. But, honestly, this is the type of question that I would like to see more of on this website, because such questions are actually useful.

– nbro – 2018-11-13T18:21:54.993

Anyway, just to give you an idea: what you're sensing, e.g. the temperature and the humidity, are usually called features in the context of machine learning (and AI in general). In statistics, these are called variables. These variables can be independent or confounding. The idea is to combine these variables in a certain way so as to create an output that corresponds to the output of the raindrop sensor. Which model to use and how to deal with your variables is more technical, and sometimes the choice of model requires a little domain knowledge. – nbro – 2018-11-13T18:27:18.017



This sounds like a great project, although this exact setup limits your options somewhat. Supervised machine learning approaches are effectively ruled out because you don't have the necessary training data to develop a model (i.e. the dependent variable: whether it is raining or not). You could look at accessing similar data (from what source depends on your geographical location) that includes precipitation information, and attempt to train a model based on that, before applying it to your own data to 'predict' the current conditions.

If you want a model to determine whether it's raining or not (rather than attempting to define the type, rate or other characteristics of precipitation) then you need a binary classification model. If you can get the data to train a supervised model, there are many such models available. Which one to choose depends on a number of factors – enough factors that it is as much an art as a science (and often the best way to choose is just to try several and pick the best one). XGBoost is a popular choice for a wide range of problems, including binary classification. Support Vector Classifiers (SVCs, a subtype of Support Vector Machine, SVM) are also common. Neural networks are another option. Depending on how the dependent variable correlates with the independent ones, you may find a simpler model to be more effective. If you choose to try a neural network, I suggest keeping it simple and starting with a Multi-Layer Perceptron (MLP).
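To make "try several and pick the best one" concrete, here is a minimal sketch using scikit-learn. The data is synthetic (generated with `make_classification`) and only stands in for five sensor readings plus a 0/1 "raining" label; the helper name `compare_models` and the choice of three candidates are my own assumptions, not a recommendation for this exact dataset. XGBoost lives in a separate package, so it is left out here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for five sensor features and a binary "raining" label.
# Replace X and y with real, labelled measurements.
X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

# Candidate binary classifiers; scaling matters for the linear model and the SVC.
candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "svc": make_pipeline(StandardScaler(), SVC()),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
}


def compare_models(models, X, y, folds=5):
    """Return the mean cross-validated accuracy of each candidate model."""
    return {
        name: cross_val_score(model, X, y, cv=folds).mean()
        for name, model in models.items()
    }


scores = compare_models(candidates, X, y)
best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("best:", best)
```

Accuracy can be misleading if it rarely rains where you live, so with real data it may be worth passing a different `scoring` argument (for example `"f1"`) to `cross_val_score`.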

Alternatively, you could try an unsupervised approach, which commonly takes the form of clustering. In this instance, you would look to try clustering algorithms, such as DBSCAN, to see if the independent variables naturally form groups (clusters) which differentiate between when it is raining or not. Of course you can cluster the data using an unsupervised model, but you would still need to then correlate that with rainfall in order to make sense of the model.
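As a rough illustration of the clustering route, the sketch below runs scikit-learn's `DBSCAN` on synthetic, well-separated data and then checks how each cluster lines up with rain observations. The three features, the `eps` and `min_samples` values, and the helper `rain_rate_per_cluster` are assumptions chosen for this toy example; real sensor data will almost certainly need different settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: two well-separated groups of three-feature readings.
# observed_rain plays the role of rain observations you noted by hand.
X, observed_rain = make_blobs(
    n_samples=300,
    centers=[[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]],
    cluster_std=0.5,
    random_state=0,
)

# DBSCAN is distance based, so put all features on a comparable scale first.
X_scaled = StandardScaler().fit_transform(X)
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X_scaled)


def rain_rate_per_cluster(labels, observed_rain):
    """Fraction of rainy observations in each cluster (noise label -1 is skipped)."""
    rates = {}
    for cluster in np.unique(labels):
        if cluster == -1:
            continue
        rates[int(cluster)] = float(observed_rain[labels == cluster].mean())
    return rates


rates = rain_rate_per_cluster(labels, observed_rain)
print(rates)
```

A cluster whose rain rate is close to 1 could then be read as "raining". If every cluster has a similar rate, the clusters are separating the data on something other than rain.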

There are a number of challenges you may need to overcome. For example, depending on the frequency with which the data is sampled and stored, you may need to downsample it to a manageable size. Presumably you also have the time at which the data was collected. If not, you would also need to make use of the brightness sensor to differentiate between night and day, which may confound or over-power any cooling effect from rainfall. (You could look at clustering the data, based on temperature and brightness, in order to create a new feature which indicates night and day.)
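Here is a small pandas sketch of that preparation step, under some assumptions: readings arrive once per minute, the column names `pressure_hpa` and `brightness_lux` are made up, and a fixed brightness threshold is used for day/night as a simpler stand-in for the clustering idea above. It also adds the one-hour pressure change, in the spirit of the "slope from the last hour" mentioned in the question.

```python
import numpy as np
import pandas as pd

# Synthetic one-day log at one reading per minute (column names are hypothetical).
index = pd.date_range("2018-11-06", periods=1440, freq="min")
minutes = np.arange(1440)
raw = pd.DataFrame(
    {
        "pressure_hpa": 1013.0 - 0.01 * minutes,  # steadily falling pressure
        "brightness_lux": np.where((minutes >= 360) & (minutes < 1080), 500.0, 0.0),
    },
    index=index,
)


def build_features(raw, rule="10min", day_threshold_lux=50.0):
    """Downsample to `rule`, then add a 1-hour pressure change and a day flag."""
    df = raw.resample(rule).mean()
    steps_per_hour = int(pd.Timedelta("1h") / pd.Timedelta(rule))
    # Change relative to the value one hour earlier (NaN for the first hour).
    df["pressure_change_1h"] = df["pressure_hpa"].diff(periods=steps_per_hour)
    # Threshold is an assumed value; tune it for your sensor.
    df["is_day"] = df["brightness_lux"] > day_threshold_lux
    return df


features = build_features(raw)
print(features.tail(3))
```

The same `diff` pattern works for temperature or humidity if you want their hourly trends as features too.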

To summarise, there are numerous approaches you could take, all of which will, at some point, require data on whether it is/was raining or not in order to train a model. If you have suitably localised weather data available, you may be able to train a model using that, synchronised via time and/or the other data you'll be collecting. The problem, as stated, is a binary classification one (as opposed to a time series prediction one, i.e. weather forecasting, for example). So look into classification and clustering models for supervised and unsupervised machine learning respectively. scikit-learn for Python or caret for R are commonly used packages for the types of models I've described.
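For the "synchronised via time" part, one possible approach is pandas' `merge_asof`, which pairs each of your readings with the nearest external observation in time. The tiny tables, the column names, and the 15-minute tolerance below are all invented for illustration; the external source would be whatever localised weather data you can get.

```python
import pandas as pd

# Your own sensor log (values invented for illustration).
sensors = pd.DataFrame(
    {
        "time": pd.to_datetime(
            ["2018-11-06 12:00", "2018-11-06 12:10", "2018-11-06 12:20", "2018-11-06 14:00"]
        ),
        "humidity_pct": [71.0, 78.0, 90.0, 65.0],
    }
)

# Observations from a hypothetical nearby station that does report rain.
station = pd.DataFrame(
    {
        "time": pd.to_datetime(["2018-11-06 11:58", "2018-11-06 12:18"]),
        "raining": [0, 1],
    }
)


def attach_rain_labels(sensors, station, max_gap="15min"):
    """Label each sensor row with the nearest station observation within max_gap."""
    merged = pd.merge_asof(
        sensors.sort_values("time"),
        station.sort_values("time"),
        on="time",
        direction="nearest",
        tolerance=pd.Timedelta(max_gap),
    )
    # Rows with no station observation close enough stay unlabelled; drop them.
    return merged.dropna(subset=["raining"])


labelled = attach_rain_labels(sensors, station)
print(labelled)
```

The 14:00 reading is dropped because no station observation falls within 15 minutes of it. The remaining rows form a labelled training set for a classifier.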


Posted 2018-11-06T18:50:04.937

Reputation: 121

So I have to manually write that it is raining in my database, and once I got enough training data I could try it automatically with an AI? – Ribisl – 2018-11-06T20:46:51.200

Effectively, yes, if you want it to be able to declare that it is raining. Otherwise you could attempt to cluster the data into two groups and get it to declare which group it is in. The problem is, if you wanted it to be vaguely accurate, you'd have to prepare the data, cluster it, let it run for a while, making note of whether the (somewhat arbitrarily) labeled clusters correlate to whether it's raining or not, and if it wasn't sufficiently accurate for you, re-cluster the data and see how that did. – Chris – 2018-11-06T20:50:51.830

OK, I think I have to work with easier AIs first; I'm a complete beginner. – Ribisl – 2018-11-06T20:54:31.200

Gathering the right data, understanding it, cleaning it and performing feature engineering (generating new features, such as a label for night and day in this case) are often the largest parts of a given machine learning or AI task. I'd be happy to help you with this problem (finding data to add to the database, selecting and training a model, etc.) via chat if you'd like? Alternatively, have a look at for some good 'toy' machine learning problems to get started. – Chris – 2018-11-06T21:01:23.107

The question asked clearly for a decision tree which is used in the literature to model raining scenarios. The only problem is how the data mining process works, so that the decision tree can predict future states. – Manuel Rodriguez – 2018-11-06T21:01:32.893

@ManuelRodriguez, I'm not sure it did. The question does not mention models used in literature, decision trees or even prediction. Instead it asks for AI to classify conditions to indicate whether it is currently raining. I agree a decision tree would be well suited to the task though. – Chris – 2018-11-06T21:04:22.163

The first rule in group moderation is to never underestimate the asker. Eventually he knows more about machine learning than you and I combined. – Manuel Rodriguez – 2018-11-06T21:10:20.633

@Chris yeah that would be awesome, where can we chat? – Ribisl – 2018-11-06T21:13:22.683

@Ribisl, I've set up a public chat room here:

– Chris – 2018-11-06T21:15:08.453