The following explanation is based on `fit_transform` of the `Imputer` class, but the idea is the same for `fit_transform` of other scikit-learn classes like `MinMaxScaler`.

`transform` replaces the missing values with a number. By default, this number is the mean of the corresponding column in some data that you choose.
Consider the following example:

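```
import numpy as np
# Imputer was removed in scikit-learn 0.22; newer versions provide sklearn.impute.SimpleImputer
from sklearn.preprocessing import Imputer

imp = Imputer()
# calculating the means
imp.fit([
    [1, 3],
    [np.nan, 2],
    [8, 5.5]
])
```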

Now the imputer has learned to use the mean (1+8)/2 = 4.5 for the first column and the mean (3+2+5.5)/3 = 3.5 for the second column. If we apply it to two-column data:

```
X = [[np.nan, 11],
[4, np.nan],
[8, 2],
[np.nan, 1]]
print(imp.transform(X))
```

we get

```
[[4.5, 11],
[4, 3.5],
[8, 2],
[4.5, 1]]
```
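What `fit` and `transform` do in this example can be sketched with plain NumPy. This is only an illustration of the idea, not scikit-learn's actual implementation; the variable names `train`, `means` and `filled` are made up for the sketch.

```python
import numpy as np

# Data used to learn the means (same values as in the fit call above)
train = np.array([[1, 3], [np.nan, 2], [8, 5.5]])
# Data to which the learned means are applied
X = np.array([[np.nan, 11], [4, np.nan], [8, 2], [np.nan, 1]])

# "fit": learn one mean per column, ignoring missing values
means = np.nanmean(train, axis=0)

# "transform": fill each missing entry with the learned mean of its column
filled = np.where(np.isnan(X), means, X)

print(means)
print(filled)
```

Note that the means come only from `train`; the values in `X` never influence them.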

So by `fit` the imputer calculates the means of columns from some data, and by `transform` it applies those means to some data (which is just replacing missing values with the means). If both these data are the same (i.e. the data for calculating the means and the data that the means are applied to), you can use `fit_transform`, which is basically a `fit` followed by a `transform`.
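A quick way to check this equivalence is sketched below. It assumes a scikit-learn version that ships `SimpleImputer` in `sklearn.impute` (the successor of `Imputer`), so it is a substitute for the class discussed above rather than the same class.

```python
import numpy as np
from sklearn.impute import SimpleImputer  # assumed stand-in for the removed Imputer

data = np.array([[1, 3], [np.nan, 2], [8, 5.5]])

# Two explicit steps: learn the column means, then fill the gaps
two_steps = SimpleImputer(strategy="mean").fit(data).transform(data)

# One combined step on the same data
one_step = SimpleImputer(strategy="mean").fit_transform(data)

print(np.array_equal(two_steps, one_step))
print(one_step)
```

Both calls fill the single missing value in the first column with 4.5, the mean of 1 and 8.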

Now your questions:

Why might we need to transform data?

"For various reasons, many real world datasets contain missing values, often encoded as blanks, NaNs or other placeholders. Such datasets however are incompatible with scikit-learn estimators which assume that all values in an array are numerical" (source)

What does it mean to fit on training data and transform test data?

The `fit` of an imputer has nothing to do with `fit` used in model fitting.
So using the imputer's `fit` on training data just calculates the mean of each column of the training data. Using `transform` on test data then replaces missing values of the test data with the means that were calculated from the training data.
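The same fit-on-train, transform-on-test pattern applies to the other transformers mentioned at the top. Here is a minimal sketch with `MinMaxScaler`, using made-up `train` and `test` arrays: `fit` learns each column's minimum and maximum from the training data, and `transform` rescales with those learned values.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data, chosen only for illustration
train = np.array([[1, 3], [4, 2], [8, 5.5]])
test = np.array([[8, 2], [15, 9]])

scaler = MinMaxScaler()
scaler.fit(train)                 # learns min and max of each training column
scaled_test = scaler.transform(test)  # applies (x - min) / (max - min) with those values

print(scaler.data_min_, scaler.data_max_)
print(scaled_test)
```

Because the minimum and maximum come from the training data only, test values outside the training range end up outside [0, 1]; here 15 in the first column maps to (15 - 1) / (8 - 1) = 2.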

See also "what is the difference between 'transform' and 'fit_transform' in sklearn". – sds – 2018-11-30T17:10:39.170

@sds The answer there gives the link to this question. – Kaushal28 – 2019-05-02T13:20:27.307

We apply `fit` on the training dataset and use the `transform` method on both the training dataset and the test dataset. – Prakash Kumar – 2019-06-14T11:35:59.017

`fit_transform()` is equivalent to applying `fit()` and then `transform()`. Sometimes the former is faster than the latter. – Dr Nisha Arora – 2020-09-10T00:36:20.657