How to use the fillna method in a for loop

0

I am working on a housing dataset. In a list of columns (Garage, Fireplace, etc.), I have values called NA, which just means that the particular house in question does not have that feature (Garage, Fireplace). It doesn't mean that the value is missing/unknown. However, pandas interprets this as NaN, which is wrong. To get around this, I want to replace the value NA with XX so that it can be distinguished from genuine NaN values. Because there is a whole list of these columns, I want to use a for loop to accomplish this in a few lines of code:

na_data = ['Alley', 'BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2', 'FireplaceQu', 'GarageType',
           'GarageFinish', 'GarageQual', 'GarageCond', 'PoolQC', 'Fence', 'MiscFeature']

for i in range(len(na_data)):
    train[i] = train[i].fillna('XX')

I know this isn't the correct way of doing it as it is giving me a KeyError: 0. This is kinda like a pseudocode way of doing it to visualize what I'm trying to accomplish. What is the way to automate fillna('XX') on this list of columns?

Andros Adrianopolos

Posted 2019-07-03T06:18:58.273

Reputation: 322

Answers

2

What you are looking for is replace().

You also don't need to write out all the columns; you can simply iterate over the column names.

for col in train:
    # Assign the result back; inplace=True on a selected column may act on a temporary copy
    train[col] = train[col].replace("NA", "XX")

You can do it on all the dataset in one line:

train.replace("NA","XX", inplace=True)

Or on specific columns:

for col in na_data:
    train[col] = train[col].replace("NA", "XX")
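
If the NA entries were already parsed as NaN when the file was loaded, as the question describes, replace("NA", ...) has no string left to match, and fillna on the listed columns is the more direct route. A minimal sketch, assuming a small made-up DataFrame standing in for train (only the column names Alley and Fence come from the question; LotArea and all values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the housing data.
train = pd.DataFrame({
    "Alley": ["Grvl", np.nan, "Pave"],
    "Fence": [np.nan, "MnPrv", np.nan],
    "LotArea": [8450.0, np.nan, 11250.0],  # a genuinely missing value, left alone
})
na_data = ["Alley", "Fence"]

# Fill NaN only in the listed columns; looping over na_data and
# assigning train[col] = train[col].fillna("XX") does the same thing.
train[na_data] = train[na_data].fillna("XX")
```

Columns outside na_data keep their NaN values, so real missing data stays distinguishable from "feature not present".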

vico

Posted 2019-07-03T06:18:58.273

Reputation: 138

Yeah, but it's not every column in train. It is a specific list of columns organized in na_data – Andros Adrianopolos – 2019-07-03T08:12:36.273

I edited my response – vico – 2019-07-03T08:16:04.373

How would you do this for two instances (strings) e.g. replace "Unknown" with "Nan" and also replace "Not Found" with "Nan"? – TokyoToo – 2020-12-19T07:28:27.803
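
For the two-string case asked about here, one option is to pass a list of values to replace. A minimal sketch on a made-up Series (the data is hypothetical, for illustration only):

```python
import pandas as pd

# Hypothetical data, only for illustration.
s = pd.Series(["Attchd", "Unknown", "Not Found", "Detchd"])

# A list of values to match, all replaced by the same single value.
cleaned = s.replace(["Unknown", "Not Found"], "Nan")
```

Values that are not in the list are left unchanged.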

1

While replace is a valid approach, it can be inefficient and slow on a large scale - see this question.

You should instead use map to encode NA as XX - perhaps something like this:

na_data = ['Alley', ...,'Fence', 'MiscFeature']
for col in na_data:
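    # map() returns NaN for values missing from the dict, so fall back to the original value
    train[col] = train[col].map({'NA': 'XX'}).fillna(train[col])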

bradS

Posted 2019-07-03T06:18:58.273

Reputation: 1 387

How would you do this for two instances (strings) e.g. replace "Unknown" with "Nan" and also replace "Not Found" with "Nan"? – TokyoToo – 2020-12-19T07:28:32.447

You could extend the dict to include other instances - e.g. { 'Unknown':'Nan','Not Found':'Nan'}. Beware of the behaviour when constructing your dict - any value that is not a key in the dict will be mapped to NaN. – bradS – 2020-12-19T13:19:14.563
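
The caveat above can be seen on a small example. This is a sketch on a made-up Series (hypothetical data), showing the extended dict and one possible way to keep the values that are not in it:

```python
import pandas as pd

# Hypothetical data, only for illustration.
s = pd.Series(["Attchd", "Unknown", "Not Found", "Detchd"])
mapping = {"Unknown": "Nan", "Not Found": "Nan"}

# Plain dict mapping: values that are not keys in the dict become NaN.
lossy = s.map(mapping)

# Falling back to the original Series keeps everything not in the dict.
kept = s.map(mapping).fillna(s)
```

Here lossy has lost "Attchd" and "Detchd", while kept only changes the two mapped strings.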