Bias of 1 in fully connected layers introduced dying ReLU problem


While implementing AlexNet (model-code), one of the things I needed to do was initialize the biases of the convolutional and fully connected layers.

Normally we initialize biases with 0s, but the paper says:

We initialized the neuron biases in the second, fourth, and fifth convolutional layers, as well as in the fully-connected hidden layers, with the constant 1.
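For concreteness, here is a minimal sketch of that initialization scheme. PyTorch is an assumption on my side (the post doesn't depend on any particular framework), and torchvision's AlexNet stands in for my model-code, so the layer indices below are specific to that layout:

    import torch.nn as nn
    import torchvision.models as models

    model = models.alexnet()           # stand-in for my model-code

    # conv layer indices in torchvision's `features`: 0, 3, 6, 8, 10 = conv1..conv5;
    # the paper wants bias 1 in conv2, conv4, conv5 and bias 0 in conv1, conv3.
    bias_one_convs = {3, 8, 10}
    for idx, layer in enumerate(model.features):
        if isinstance(layer, nn.Conv2d):
            nn.init.constant_(layer.bias, 1.0 if idx in bias_one_convs else 0.0)

    # fully connected hidden layers (classifier indices 1 and 4) get bias 1,
    # the output layer (index 6) gets bias 0, as in the paper.
    for idx, layer in enumerate(model.classifier):
        if isinstance(layer, nn.Linear):
            nn.init.constant_(layer.bias, 1.0 if idx in (1, 4) else 0.0)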

So I went ahead and initialized the biases to 1 as the paper says, but then the network didn't learn at all. Basically, the last fully connected layer was producing mostly 0s, which is otherwise known as the dying ReLU problem: out of 4096 neurons, only 40 or 50 were producing non-zero outputs.
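A minimal sketch of how the dead units can be counted (again PyTorch assumed; torchvision's AlexNet stands in for my model, and classifier[5] is the ReLU after the last 4096-unit hidden layer in that layout):

    import torch
    import torchvision.models as models

    model = models.alexnet()                     # stand-in for my model-code
    activations = {}
    hook = model.classifier[5].register_forward_hook(
        lambda module, inputs, output: activations.update(fc=output.detach())
    )

    x = torch.randn(8, 3, 224, 224)              # dummy batch, illustration only
    _ = model(x)
    alive = (activations["fc"] > 0).any(dim=0).sum().item()
    print(f"{alive} of {activations['fc'].shape[1]} neurons produced non-zero output")
    hook.remove()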

After a lot of debugging, I realized that if I set the fully connected layers' biases to 0 instead, the network learns just fine and the loss decreases nicely.
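Sketched the same way, the change that made training work for me is just zeroing the fully connected biases while leaving the conv biases at the paper's values:

    import torch.nn as nn
    import torchvision.models as models

    def zero_fc_biases(model: nn.Module) -> None:
        # Set every fully connected (nn.Linear) bias to 0; conv biases are untouched.
        for module in model.modules():
            if isinstance(module, nn.Linear):
                nn.init.constant_(module.bias, 0.0)

    model = models.alexnet()   # stand-in for my model-code
    zero_fc_biases(model)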

Now I'm wondering:

  • How does the bias play a role in the dying ReLU problem here?
  • Can every dying ReLU problem be corrected by searching over bias initializations?

Abhisek

Posted 2018-08-22T17:15:47.270

Reputation: 111

No answers