I am using TensorFlow mainly for experiments with neural networks. Although I have done quite a few experiments by now (the XOR problem, MNIST, some regression tasks, ...), I struggle with choosing the "correct" cost function for specific problems because, overall, I could be considered a beginner.

Before coming to TensorFlow I coded some fully-connected MLPs and some recurrent networks on my own with Python and NumPy, but mostly I had problems where a simple squared error and simple gradient descent were sufficient.

However, since TensorFlow offers quite a lot of cost functions itself, as well as the ability to build custom ones, I would like to know whether there is some kind of tutorial specifically about cost functions for neural networks. (I've already done about half of the official TensorFlow tutorials, but they don't really explain **why** specific cost functions or learners are used for specific problems - at least not for beginners.)

**To give some examples:**

```
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y_output, labels=y_train))
```

I guess it applies the softmax function on both inputs so that the sum of each vector equals 1. But what exactly is cross entropy with logits? I thought it sums up the values and calculates the cross entropy... so some kind of distance metric?! Wouldn't this be pretty much the same as normalizing the output, summing it up and taking the squared error?
Additionally, why is this used, e.g., for MNIST (or even much harder problems)? When I want to classify, say, 10 or maybe even 1000 classes, doesn't summing up the values completely destroy any information about *which* class actually was the output?
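To make my confusion concrete, here is a NumPy sketch of what I *understand* the op to compute for a single example (this is my reading, not TensorFlow's actual implementation) - notably, with a one-hot label the sum keeps only the log-probability of the true class:

```python
import numpy as np

def softmax(logits):
    # shift by the max for numerical stability; the result sums to 1
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def softmax_cross_entropy(logits, labels):
    # with a one-hot label, only the log-probability of the true class
    # survives the sum, so the class identity is not lost
    probs = softmax(logits)
    return -np.sum(labels * np.log(probs), axis=-1)

logits = np.array([2.0, 1.0, 0.1])   # raw, unnormalized network outputs
label = np.array([1.0, 0.0, 0.0])    # one-hot: the true class is class 0

loss = softmax_cross_entropy(logits, label)
```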

`cost = tf.nn.l2_loss(vector)`

What is this for? I thought the L2 loss is pretty much the squared error, but TensorFlow's API says its input is just one tensor. I don't get the idea at all?!
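My current reading of the docs, sketched in NumPy (which may be wrong), is that `tf.nn.l2_loss(t)` computes `sum(t ** 2) / 2` over a single tensor, so a squared-error cost would come from passing in the *difference* of two tensors:

```python
import numpy as np

def l2_loss(t):
    # my reading of the docs: tf.nn.l2_loss(t) = sum(t ** 2) / 2
    return np.sum(t ** 2) / 2.0

y_output = np.array([0.9, 0.1, 0.2])
y_train = np.array([1.0, 0.0, 0.0])

# squared error would then come from the difference of the two tensors:
sq_err = l2_loss(y_output - y_train)
```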

Besides, I saw this for **cross entropy** pretty often:

```
cross_entropy = -tf.reduce_sum(y_train * tf.log(y_output))
```

...but why is this used? Isn't the cross-entropy loss mathematically this:

```
-1/n * sum(y_train * log(y_output) + (1 - y_train) * log(1 - y_output))
```

Where is the `(1 - y_train) * log(1 - y_output)` part in most TensorFlow examples? Isn't it missing?
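For comparison, here is a NumPy sketch of the two formulas side by side. My (possibly wrong) understanding is that the two-term version is the *binary* cross entropy, intended for independent sigmoid outputs, while the one-term version used in the examples is the *categorical* cross entropy for softmax outputs that sum to 1:

```python
import numpy as np

y_train = np.array([0.0, 1.0, 0.0])    # one-hot label
y_output = np.array([0.2, 0.7, 0.1])   # softmax output, sums to 1

# categorical cross entropy (the form in most TensorFlow examples);
# only the true class contributes to the sum:
categorical_ce = -np.sum(y_train * np.log(y_output))

# binary cross entropy with the (1 - y) * log(1 - y_hat) term;
# this treats every output unit as an independent sigmoid yes/no:
binary_ce = -np.sum(y_train * np.log(y_output)
                    + (1 - y_train) * np.log(1 - y_output))
```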

**Answers:** I know this question is quite open, but I do not expect ten pages listing every single problem/cost function in detail. I just need a short summary about when to use which cost function (in general or in TensorFlow, it doesn't matter much to me) and some explanation of the topic. And/or some source(s) for beginners ;)

Good question. Welcome to the site :) – Dawny33 – 2016-01-19T11:49:57.497

Usually, MSE is taken for regression and cross entropy for classification. Classification Figure of Merit (CFM) was introduced in "A novel objective function for improved phoneme recognition using time delay neural networks" by Hampshire and Waibel. If I remember it correctly, they also explain why they designed CFM like they did. – Martin Thoma – 2016-01-20T13:26:34.447

I think reduce_sum(y_train*tf.log(y_output)) is used a lot because it's a fairly common "simple case" example. It will sum each batch's error, which means the cost (and the magnitude of the gradient) doubles if your batch size doubles. Making the simple change to reduce_mean will at the very least make debugging and playing with settings more understandable, in my opinion. – neuron – 2016-01-21T13:24:45.390
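The scaling point in the comment above can be checked in a few lines with NumPy stand-ins for `reduce_sum` / `reduce_mean` (a sketch, not the TensorFlow ops themselves):

```python
import numpy as np

per_example_loss = np.array([0.5, 0.5])            # batch of 2
doubled = np.concatenate([per_example_loss] * 2)   # batch of 4, same errors

# a reduce_sum-style cost grows with the batch size ...
sum_small, sum_big = np.sum(per_example_loss), np.sum(doubled)

# ... while a reduce_mean-style cost stays the same
mean_small, mean_big = np.mean(per_example_loss), np.mean(doubled)
```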