What is the difference between the value -99 and NaN in a data column?


I am new to data science. I was looking into some datasets and saw values like -99, which I later discovered indicate a missing value. Does this mean the same thing as NaN? If so, why do we use -99 instead of NaN?


Maybe the dataset's values are never or only rarely negative, and even when they are, abs(num_val) is fairly small. If so, -99 is a sort of numerical identifier for 'wrong' values. Meanwhile, you can refer to this wiki page.

– Continue2Learn – 2019-07-18T02:14:59.020



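No, they are not the same. -99 is an ordinary number that happens to be used as a missing-value marker in that particular dataset, whereas NaN is a dedicated floating-point value that libraries recognize as missing. Don't assume -99 means "missing" in other datasets.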

I'd recommend you replace that magic number with actual NaNs, and then try to find the best possible way of filling in the missing values.
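As a minimal sketch of that replacement, assuming pandas and NumPy are available: the column names and values below are made up for illustration, and median imputation is only one possible fill strategy, not necessarily the best one for your data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data in which -99 marks a missing value.
df = pd.DataFrame({
    "age": [23, -99, 41, 35],
    "income": [52000.0, 61000.0, -99.0, 48000.0],
})

# Left as-is, the sentinel is treated as a real number and skews statistics.
mean_with_sentinel = df["age"].mean()

# Replace the sentinel with NaN so pandas treats those cells as missing.
cleaned = df.replace(-99, np.nan)

# NaN is skipped by default in aggregations such as mean().
mean_without_sentinel = cleaned["age"].mean()

# Count the missing values per column before deciding how to fill them.
missing_per_column = cleaned.isna().sum()

# One simple option (an assumption, not a rule): fill with each column's median.
filled = cleaned.fillna(cleaned.median())
```

If the data comes from a CSV file, passing `na_values=[-99]` to `pd.read_csv` does the same conversion at load time. Either way, only do this when you are sure -99 cannot occur as a legitimate value in the affected columns.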


