For string data, use `get_dummies()` (from Pandas). `to_categorical()` takes integers as input.

There are two important differences between Keras' `to_categorical()` and Pandas' `get_dummies()`.

**Keras:** `to_categorical()`

- `to_categorical()` takes integers as input (no strings allowed).
- `to_categorical()` generates dummies starting at 0 by default!
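If your labels are strings, one common workaround (a sketch, not part of the original answer) is to map them to integers first, e.g. with `pd.factorize`, and only then call `to_categorical()`:

```python
import pandas as pd

labels = ["A", "A", "B", "B", "C", "C"]
# factorize assigns each distinct string an integer code starting at 0
codes, uniques = pd.factorize(labels)
print(codes)          # [0 0 1 1 2 2]
print(list(uniques))  # ['A', 'B', 'C']
# `codes` is now valid integer input for to_categorical()
```

Keeping `uniques` around lets you map the one-hot columns back to the original string labels later.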

Looking at the built-in help:

```
help(to_categorical)
```

Says:

```
to_categorical(y, num_classes=None, dtype='float32')
Converts a class vector (integers) to binary class matrix.
E.g. for use with categorical_crossentropy.
# Arguments
y: class vector to be converted into a matrix
(integers from 0 to num_classes).
num_classes: total number of classes.
dtype: The data type expected by the input, as a string
(`float32`, `float64`, `int32`...)
...
```

So if your data is numeric (int), you can use `to_categorical()`. You can check whether your data is an `np.array` by looking at `.dtype` and/or `type()`.

```
import numpy as np
npa = np.array([2,2,3,3,4,4])
print(npa.dtype, type(npa))
print(npa)
```

Result:

```
int32 <class 'numpy.ndarray'>
[2 2 3 3 4 4]
```

Now you can use `to_categorical()`:

```
from keras.utils import to_categorical
cat1 = to_categorical(npa)
print(cat1.dtype, type(cat1))
print(cat1)
```

Which yields a matrix:

```
float32 <class 'numpy.ndarray'>
[[0. 0. 1. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 1.]]
```

**Note that the matrix contains five columns** (running from zero up to four, the max value in the `np.array`). The first two columns (representing 0 and 1 in the original data) are 0 throughout the matrix, because neither value occurs in the original data.
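The `num_classes` argument shown in the help text above lets you fix the matrix width yourself. Conceptually, the result is just an identity-matrix lookup, which can be sketched in plain NumPy (an illustration of the idea, not Keras' actual implementation):

```python
import numpy as np

npa = np.array([2, 2, 3, 3, 4, 4])
# one-hot encode with an explicit width of 6 columns (classes 0..5)
num_classes = 6
one_hot = np.eye(num_classes, dtype="float32")[npa]
print(one_hot.shape)  # (6, 6)
# to_categorical(npa, num_classes=6) yields the same matrix
```

Passing `num_classes` is useful when a batch of data happens not to contain the highest class, so the matrix width stays consistent.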

`to_categorical()` also accepts input that is not explicitly defined as an `np.array`. For instance, the statements below would also be legal.

```
alt1 = to_categorical([0,0,1,1,2,2])
print(alt1.dtype, type(alt1))
print(alt1)
alt2 = to_categorical((0,0,1,1,2,2))
print(alt2.dtype, type(alt2))
print(alt2)
```

Because the range of values is now 0 to 2, the result looks like:

```
[[1. 0. 0.]
 [1. 0. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 0. 1.]
 [0. 0. 1.]]
```

**Pandas:** `get_dummies()`

When you have a Pandas `df`, you can convert a column to dummies using `get_dummies()`, regardless of the data type in the column. So it is also possible to convert a column of strings to dummies.

```
import pandas as pd
df = pd.DataFrame(data={'col1': ["A", "A", "B", "B", "C", "C"]})
alt3 = pd.get_dummies(df['col1'])
print(type(alt3))
print(alt3)
```

This gives:

```
<class 'pandas.core.frame.DataFrame'>
   A  B  C
0  1  0  0
1  1  0  0
2  0  1  0
3  0  1  0
4  0  0  1
5  0  0  1
```
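As a side note (not in the original example), `get_dummies()` can also be applied to a whole `DataFrame`: the `columns` argument restricts encoding to the named columns, while numeric columns pass through untouched.

```python
import pandas as pd

df = pd.DataFrame({'col1': ["A", "A", "B", "B", "C", "C"],
                   'col2': [1, 2, 3, 4, 5, 6]})
# only col1 is encoded; the new columns are prefixed with the column name
out = pd.get_dummies(df, columns=['col1'])
print(list(out.columns))  # ['col2', 'col1_A', 'col1_B', 'col1_C']
```

The automatic `col1_` prefix keeps the dummy columns traceable to their source column when several columns are encoded at once.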

Note that the result is (again) a Pandas `df`. So we need to convert it to an `np.array`.

```
alt3 = alt3.to_numpy()
print(alt3.dtype, type(alt3))
print(alt3)
```

This yields:

```
uint8 <class 'numpy.ndarray'>
[[1 0 0]
 [1 0 0]
 [0 1 0]
 [0 1 0]
 [0 0 1]
 [0 0 1]]
```

So it is ready to be used with Keras.

Note that the matrix generated here does not (!) start at zero. Instead, each distinct value in the chosen Pandas column gets its own column in the dummy matrix.
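To see the difference directly, feed the earlier integer array to `get_dummies()`: you get one column per distinct value (three columns, for 2, 3 and 4), whereas `to_categorical()` produced five columns starting at 0. A small sketch:

```python
import pandas as pd

s = pd.Series([2, 2, 3, 3, 4, 4])
# one column per distinct value -> 3 columns, labelled 2, 3 and 4
dummies = pd.get_dummies(s)
print(dummies.shape)          # (6, 3)
print(list(dummies.columns))  # [2, 3, 4]
```

So with `get_dummies()` the column labels carry the original values, while `to_categorical()` implicitly encodes them as column positions 0..max.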
