4

1

There is one behavior of `LabelBinarizer` that I don't understand:

```
import numpy as np
from sklearn import preprocessing
lb = preprocessing.LabelBinarizer()
lb.fit(np.array([[0, 1, 1], [1, 0, 0]]))
lb.classes_
```

The output is `array([0, 1, 2])`. Why is there a 2 there?

3

I think the documentation is fairly self-explanatory here. `fit` takes an array of size `n_samples`, in which each element is the class of that data point; if a data point can belong to multiple classes, the input is instead of size `n_samples x n_classes`. That is what you passed in your example: a 2 x 3 array, so each point can belong to any of three classes. That is why you get `[0, 1, 2]` as the classes. So, as mentioned in the documentation, if you try

```
>>> print(lb.transform([0, 1, 2, 0]))
[[1 0 0]
 [0 1 0]
 [0 0 1]
 [1 0 0]]
```

and if you try a class that was not seen during fit, like

```
>>> print(lb.transform([0, 1, 2, 1000]))
[[1 0 0]
 [0 1 0]
 [0 0 1]
 [0 0 0]]
```

No class named `1000` exists, so the encoding for `1000` is simply `[0, 0, 0]`. Hope this helps.
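To make the difference between the two input shapes concrete, here is a minimal sketch that fits one `LabelBinarizer` on the 2-D array from the question and another on a 1-D list of labels. The variable names (`indicator`, `lb_2d`, `lb_1d`) are only illustrative, and the behavior shown is what I would expect from a reasonably recent scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# A 2-D array of 0s and 1s is read as a multilabel indicator matrix:
# rows are samples, columns are classes.
indicator = np.array([[0, 1, 1], [1, 0, 0]])
lb_2d = LabelBinarizer().fit(indicator)
classes_2d = lb_2d.classes_  # one class per column

# A 1-D sequence is read as one raw label per sample.
lb_1d = LabelBinarizer().fit([0, 1, 1, 0])
classes_1d = lb_1d.classes_  # the distinct values that appear

print(classes_2d)  # [0 1 2]
print(classes_1d)  # [0 1]

# The binarizer fitted on the 2-D input still encodes single labels.
print(lb_2d.transform([0, 1, 2, 0]))
```

The 2 in `classes_` is therefore just the index of the third column, not a value that appears in the data.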

1

Because in `lb.fit` you feed in a 2-by-3 array, which means 2 samples, each of which can belong to any of 3 classes. That is why you get `0, 1, 2` here. See:

```
         class0  class1  class2
sample1       0       1       1
sample2       1       0       0
```

However, I think the `LabelBinarizer` encoder has one characteristic that sets it apart from other encoders. Usually we pass the raw labels to the `encoder.fit()` function; for example:

```
>>> from sklearn import preprocessing
>>> le = preprocessing.LabelEncoder()
>>> le.fit([1, 2, 2, 6])
LabelEncoder()
>>> le.classes_
array([1, 2, 6])
```

and we expect `encoder.transform()` to yield the required format for new raw labels, i.e.,

```
>>> le.transform([1, 1, 2, 6])
array([0, 0, 1, 2]...)
```

But for `LabelBinarizer`, I think what we pass to `lb.fit()` is actually the already-encoded format. The true raw labels would look like `[[1, 2], 0]`, which does not seem to be a format that `sklearn` can handle here, since the dimension varies. Here is the paradox: in the documentation, we see this example:

```
>>> lb.transform([0, 1, 2, 1])
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1],
       [0, 1, 0]])
```

All samples in `[0, 1, 2, 1]` carry exactly one label each, and if you try `lb.transform([[1, 2], 2])` to indicate that the first sample has multiple labels, you get an error. That is, multilabel raw labels have to be in exactly the same format as the output of `lb`.
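For what it's worth, scikit-learn does ship a separate encoder, `MultiLabelBinarizer`, that accepts label sets of varying length. A minimal sketch, assuming the raw labels are written as one list per sample (so the single label `0` becomes `[0]`); the names `raw_labels` and `encoded` are just for illustration:

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Each sample is a collection of raw labels; the lengths may differ.
raw_labels = [[1, 2], [0]]

mlb = MultiLabelBinarizer()
encoded = mlb.fit_transform(raw_labels)

print(mlb.classes_)  # [0 1 2]
print(encoded)
# [[0 1 1]
#  [1 0 0]]
```

The encoded result is the same 2-by-3 indicator matrix that was passed to `lb.fit` in the question.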

You mean you have three classes for the first code snippet? Actually, I guess the documentation was so brief that I didn't understand it when I read it. – Media – 2018-01-28T18:38:34.963

@Media The first code snippet is a continuation of the example the OP quoted from the sklearn documentation. Either way, it has three classes. – Kiritee Gak – 2018-01-28T18:41:46.977