Use sparse categorical crossentropy when your classes are mutually exclusive (i.e. each sample belongs to exactly one class) and categorical crossentropy when one sample can have multiple classes or the labels are soft probabilities (like [0.5, 0.3, 0.2]).

The formula for categorical crossentropy ($S$ - samples, $C$ - classes, $N = |S|$ - number of samples, $s \in c$ - sample $s$ belongs to class $c$) is:

$$ -\frac{1}{N} \sum_{s \in S} \sum_{c \in C} \mathbf{1}_{s \in c} \log p(s \in c) $$

When the classes are mutually exclusive, you don't need to sum over them: for each sample the only non-zero term is $-\log p(s \in c)$ for the true class $c$.

This saves time and memory. Consider 10,000 mutually exclusive classes: you compute just one log instead of summing 10,000 terms for each sample, and you store just one integer label instead of 10,000 floats.
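To put rough numbers on the memory claim, here is a small NumPy sketch. The sample count and the dtypes (`int32` for integer labels, `float32` for one-hot labels) are my assumptions for illustration; a real framework may store labels differently.

```python
import numpy as np

# Assumed sizes for illustration: 1,000 samples, 10,000 exclusive classes.
n_samples, n_classes = 1000, 10_000

# Sparse labels: one integer class index per sample.
sparse_labels = np.zeros(n_samples, dtype=np.int32)

# One-hot labels: n_classes floats per sample.
one_hot_labels = np.zeros((n_samples, n_classes), dtype=np.float32)

sparse_bytes = sparse_labels.nbytes
one_hot_bytes = one_hot_labels.nbytes
ratio = one_hot_bytes // sparse_bytes

print(f"sparse:  {sparse_bytes} bytes")
print(f"one-hot: {one_hot_bytes} bytes ({ratio}x larger)")
```

With equal-width dtypes the ratio is simply the number of classes, which is the "one integer instead of 10,000 floats" point.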

The formula is the same in both cases, so there should be no impact on accuracy.
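You can check this numerically. The sketch below is plain NumPy, not the TensorFlow or PyTorch implementation; the function names and the random softmax predictions are made up for illustration. It computes the full double sum on one-hot labels and the single-term version on integer labels for the same data.

```python
import numpy as np

def categorical_crossentropy(one_hot, probs):
    """Full formula: mean over samples of -sum_c 1[s in c] * log p(s in c)."""
    return -np.mean(np.sum(one_hot * np.log(probs), axis=1))

def sparse_categorical_crossentropy(labels, probs):
    """Exclusive-class shortcut: mean of -log p(s in c) for the true class c only."""
    picked = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(picked))

rng = np.random.default_rng(0)
n_samples, n_classes = 1000, 50

# Hypothetical predictions: softmax over random logits, so each row sums to 1.
logits = rng.normal(size=(n_samples, n_classes))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# The same ground truth in both encodings.
labels = rng.integers(0, n_classes, size=n_samples)  # integer class indices
one_hot = np.eye(n_classes)[labels]                  # one-hot rows

dense_loss = categorical_crossentropy(one_hot, probs)
sparse_loss = sparse_categorical_crossentropy(labels, probs)

print(dense_loss, sparse_loss, np.isclose(dense_loss, sparse_loss))
```

The two losses should agree up to floating-point rounding, since every term the sparse version skips is multiplied by zero in the dense version.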

Do they impact the accuracy differently, for example on the MNIST digits dataset? – Master M – 2018-12-01T08:47:25.047

Mathematically there is no difference. If there is a significant difference in the values computed by implementations (say TensorFlow or PyTorch), then this sounds like a bug. A simple comparison on random data (1,000 classes, 10,000 samples) shows no difference. – frenzykryger – 2018-12-01T14:20:11.337

Dear frenzykryger, I guess you forgot a minus for the one sample case only: "for each sample only non-zero value is just -log(p(s $\in$ c))". For the rest, nice answer. – Nicg – 2019-09-13T12:48:32.723

You're right. Thanks! – frenzykryger – 2019-09-14T13:54:41.227

@frenzykryger I am working on a multi-output problem. I have 3 separate outputs `o1,o2,o3` and each one has `167,11,7` classes respectively. I've read your answer that it'll make no difference, but is there any difference if I use `sparse__` or not? Can I go for `categorical` for the last 2 and `sparse` for the first one, as there are 167 classes in the first output? – Deshwal – 2020-01-08T04:58:06.500