ValueError: Input contains NaN, infinity or a value too large for dtype('float32')



I got a ValueError when predicting on test data with a RandomForest model.

My code:

clf = RandomForestClassifier(n_estimators=10, max_depth=6, n_jobs=1, verbose=2)
clf.fit(X_fit, y_fit)

X_test = df_test.values  
y_pred = clf.predict(X_test)

The error:

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

How do I find the bad values in the test dataset? Also, I do not want to drop these records, can I just replace them with the mean or median?



Posted 2016-05-26T04:13:04.033

Reputation: 2 045

You can use numpy's np.isfinite() to create a boolean mask and then index your data with that mask – till Kadabra – 2020-02-01T10:37:50.720

Check whether you have taken log() of any feature that contains zero values. – DataFramed – 2020-03-04T10:17:44.950



With np.isnan(X) you get a boolean mask back with True for positions containing NaNs.

With np.where(np.isnan(X)) you get back a tuple with i, j coordinates of NaNs.

Finally, with np.nan_to_num(X) you "replace nan with zero and inf with finite numbers".

Alternatively, you can use:

  • sklearn.impute.SimpleImputer for mean / median imputation of missing values, or
  • pandas' pd.DataFrame(X).fillna(), if you need something other than filling it with zeros.
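A minimal sketch tying these together (the array X is illustrative; since SimpleImputer only handles NaN, inf values are converted to NaN first):

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, np.nan],
              [np.inf, 4.0]])

mask = np.isnan(X)                  # boolean mask, True where NaN
rows, cols = np.where(np.isnan(X))  # i, j coordinates of the NaNs
X_zeroed = np.nan_to_num(X)         # NaN -> 0, inf -> large finite number

# SimpleImputer only treats NaN as missing, so convert inf to NaN first
X[np.isinf(X)] = np.nan
X_imputed = SimpleImputer(strategy="mean").fit_transform(X)
```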



I prefer the identity check for NaN: if x != x, then x is NaN. np.isnan(x) has failed for me many times; I do not remember the reason – Itachi – 2018-07-19T06:43:22.140

It is not advisable to replace NaN values with zeros. NaN values may carry significance in being missing, and imputing them with zeros is probably the worst imputation method you can use. Not only will you be introducing zeros arbitrarily, which might skew your variable, but 0 might not even be an acceptable value in your variable, i.e. the variable might not have a true zero. – hussam – 2019-05-09T20:04:40.537

I realized that I did not provide any guidance. If you want to impute your data, either use a rolling average via .rolling() to replace each missing value with the mean of a rolling window, or, for something more robust, use the missingpy module: its MissForest gives a random-forest-based imputation. – hussam – 2019-05-09T21:32:44.973
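A sketch of the rolling-window idea with pandas (the series and the window size are illustrative choices):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, np.nan, 4.0, np.nan])

# replace each NaN with the mean of a trailing window of valid values
filled = s.fillna(s.rolling(window=2, min_periods=1).mean())
```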

Wonderful methods to examine existing dataframes. In my case, when I formed my X vector, extra blank rows were added for some reason, so what fit thought were NaNs really should not have been in the dataframe in the first place – demongolem – 2020-04-24T15:07:54.607


For anybody happening across this, to modify the original in place:

X_test.fillna(X_train.mean(), inplace=True)

Or, to overwrite the original by reassignment:

X_test = X_test.fillna(X_train.mean())

(Also check whether you are working on a copy or a view of the original DataFrame.)


While this is technically true, it's wrong in practice. You can't fill the X_test NAs with the X_test mean, because in real life you won't have the X_test mean when you're predicting a sample. You should use the X_train mean, because that is the only data you actually have in hand (in 99% of scenarios) – Omri374 – 2018-06-17T11:56:57.733
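A small sketch of that point, with illustrative X_train / X_test frames: the fill values come from the training data only.

```python
import numpy as np
import pandas as pd

X_train = pd.DataFrame({"a": [1.0, 2.0, 3.0]})
X_test = pd.DataFrame({"a": [np.nan, 5.0]})

# impute test-set gaps with statistics computed on the training set
X_test = X_test.fillna(X_train.mean())
```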


Assuming X_test is a pandas DataFrame, you can use DataFrame.fillna to replace the NaN values with the mean.




X_test is a numpy array. I just updated df_test in the original question and still got the same error ... – Edamame – 2016-05-26T14:56:07.243


I faced a similar problem and saw that numpy handles NaN and Inf differently.
In case your data has Inf, try this:

np.where(x.values >= np.finfo(np.float64).max)

where x is a pandas DataFrame. This gives back a tuple with the locations of the infinite values.

In case your data has NaN, try this:

np.where(np.isnan(x.values))
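Both checks together as a self-contained sketch (the DataFrame is illustrative):

```python
import numpy as np
import pandas as pd

x = pd.DataFrame({"a": [1.0, np.inf], "b": [np.nan, 2.0]})

inf_locs = np.where(x.values >= np.finfo(np.float64).max)  # positions of +inf
nan_locs = np.where(np.isnan(x.values))                    # positions of NaN
```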


Prakash Vanapalli



Don't forget

np.isnan(X)

which returns a boolean mask indicating np.nan values.

np.argwhere(np.isnan(X))

which returns the positions where np.nan appears. Then by simple indexing you can flag all of your points that are np.nan.





Do not forget to check for inf values as well. The only thing that worked for me:

df.fillna(df.mean(), inplace=True)

And even better, if you are using sklearn:

from sklearn.impute import SimpleImputer

def replace_missing_value(df, number_features):

    imputer = SimpleImputer(strategy="median")
    df_num = df[number_features]
    X = imputer.fit_transform(df_num)
    res_def = pd.DataFrame(X, columns=df_num.columns)
    return res_def

where number_features is a list of the numeric feature labels, for example:

number_features = ['median_income', 'gdp']
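A self-contained sketch of this approach (the column names are the example ones; SimpleImputer replaced the old Imputer in recent sklearn versions):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

def replace_missing_value(df, number_features):
    imputer = SimpleImputer(strategy="median")
    df_num = df[number_features]
    X = imputer.fit_transform(df_num)  # learn the medians, then transform
    return pd.DataFrame(X, columns=df_num.columns)

df = pd.DataFrame({"median_income": [1.0, np.nan, 3.0],
                   "gdp": [10.0, 20.0, np.nan]})
res = replace_missing_value(df, ["median_income", "gdp"])
```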




In most cases, getting rid of infinite and null values solves this problem.

Get rid of infinite values:

df.replace([np.inf, -np.inf], np.nan, inplace=True)

Get rid of null values however you like: a specific value such as 999, the mean, or your own function to impute missing values:

df.fillna(999, inplace=True)

or

df.fillna(df.mean(), inplace=True)
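The two steps chained, as a small sketch on an illustrative column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.inf, np.nan, 3.0]})

df.replace([np.inf, -np.inf], np.nan, inplace=True)  # inf -> NaN
df.fillna(df.mean(), inplace=True)                   # NaN -> column mean
```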

Natheer Alabsi



Here is how to "replace NaN with zero and infinity with large finite numbers" using numpy.nan_to_num:

df[:] = np.nan_to_num(df)

Also see fernando's answer.

Domi W



If your values are too large for float32, try running a scaler first. It would be rather unusual for real data to have a spread exceeding the float32 range.
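For instance, a standard scaler brings even huge values into a float32-safe range (the data here is illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1e40], [2e40], [3e40]])  # values far beyond float32's range

X_scaled = StandardScaler().fit_transform(X)
max_abs = float(np.max(np.abs(X_scaled)))  # now on the order of 1
```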

Piotr Rarus



You can list the columns that contain NaN values, and then fill those NaN values in your dataset file (csv or excel).
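One way to list those columns (a sketch; the DataFrame is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan], "b": [2.0, 3.0]})

# columns that contain at least one NaN
nan_cols = df.columns[df.isna().any()].tolist()
```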

Busra Dogan
