Normally we initialize biases with 0s, but the paper says:
> We initialized the neuron biases in the second, fourth, and fifth convolutional layers, as well as in the fully-connected hidden layers, with the constant 1.
So I went ahead and initialized the biases to 1 as the paper says, but the network didn't learn at all. The last fully-connected layer was producing mostly zeros, which is otherwise known as the dying-ReLU problem: out of 4096 neurons, only 40 or 50 were producing non-zero activations.
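To make "producing non-zeros" concrete, here is roughly how I counted dead units. This is a minimal pure-Python sketch; `count_dead_neurons` is my own hypothetical helper, and the layer sizes and weight standard deviation are illustrative, much smaller than AlexNet's actual 4096-unit layers:

```python
import random

random.seed(0)

def relu(x):
    return x if x > 0.0 else 0.0

def count_dead_neurons(weights, biases, inputs):
    """Count units whose ReLU output is zero for every input in the batch."""
    dead = 0
    for j in range(len(biases)):
        alive = False
        for x in inputs:
            # Pre-activation: dot product of input with unit j's weights, plus bias.
            pre = sum(xi * wij for xi, wij in zip(x, weights[j])) + biases[j]
            if relu(pre) > 0.0:
                alive = True
                break
        if not alive:
            dead += 1
    return dead

# Illustrative layer: 64 units, Gaussian weights with std 0.01 (as in the paper).
n_in, n_out, batch = 32, 64, 16
weights = [[random.gauss(0.0, 0.01) for _ in range(n_in)] for _ in range(n_out)]
inputs = [[random.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(batch)]

dead_b0 = count_dead_neurons(weights, [0.0] * n_out, inputs)
dead_b1 = count_dead_neurons(weights, [1.0] * n_out, inputs)
print(f"dead with bias 0: {dead_b0}, dead with bias 1: {dead_b1}")
```

Note that at initialization a constant bias of 1 actually keeps units alive (pre-activations are pushed positive); the dying I observed happened during training, so I ran a check like this on the activations after a few updates.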
After a lot of debugging, I realized that if I set the fully-connected layers' biases to 0, they learn nicely and the loss decreases as expected.
Now I'm wondering:
- How does the bias play a role in the dying-ReLU problem here?
- Can every case of the dying-ReLU problem be corrected by searching over bias initializations?