I know that Residual Networks (ResNet) made He normal initialization popular. In ResNet, He normal initialization is used, while the first layer uses He uniform initialization.
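For concreteness, here is a minimal NumPy sketch of the two variants (the function names are mine, not any framework's API). Both are scaled so the weight variance is 2/fan_in; only the shape of the distribution differs:

```python
import numpy as np

def he_normal(fan_in, shape, rng):
    # He normal: weights ~ N(0, sqrt(2 / fan_in)). Some frameworks
    # (e.g. Keras) additionally truncate the normal at two stddevs.
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=shape)

def he_uniform(fan_in, shape, rng):
    # He uniform: weights ~ U(-limit, limit) with limit = sqrt(6 / fan_in).
    # Var(U(-a, a)) = a**2 / 3, which is again 2 / fan_in, so the two
    # initializers match in variance and differ only in distribution shape.
    limit = np.sqrt(6.0 / fan_in)
    return rng.uniform(-limit, limit, size=shape)

rng = np.random.default_rng(0)
w_n = he_normal(256, (256, 128), rng)
w_u = he_uniform(256, (256, 128), rng)
print(w_n.var(), w_u.var())  # both approximately 2/256 = 0.0078
```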
I've looked through the ResNet paper and the "Delving Deep into Rectifiers" paper (the He initialization paper), but I haven't found any mention of normal init vs. uniform init.
The Batch Normalization paper's abstract states:

> Batch Normalization allows us to use much higher learning rates and be less careful about initialization.
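To see what "less careful about initialization" means in practice, here is a small toy demo of my own (not from the paper; it uses training-mode batch statistics and omits the learned gamma/beta parameters): after batch-normalizing the pre-activations, their statistics come out the same no matter how badly the weights are scaled.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 100))            # one mini-batch of inputs

for scale in (0.01, 1.0, 100.0):           # deliberately bad init scales
    w = scale * rng.normal(size=(100, 64))
    h = x @ w                              # pre-activations
    # Batch Normalization: normalize each feature over the batch.
    h_bn = (h - h.mean(axis=0)) / (h.std(axis=0) + 1e-5)
    # Mean ~0 and std ~1 regardless of the weight scale.
    print(f"scale={scale}: mean={h_bn.mean():+.3f}, std={h_bn.std():.3f}")
```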
Yet ResNet itself still seems to care about when to use normal init vs. uniform init, rather than just going with uniform init everywhere.
- When should (He or Glorot) normal-distributed initialization be used over uniform initialization?
- What are the effects of normal-distributed initialization when combined with Batch Normalization?
- Pairing normal init with Batch Normalization feels natural, but I haven't found any paper to back this up.
- I know that ResNet uses He init over Glorot init because He init performs better on deep networks.
- I understand the difference between Glorot init and He init.
- My question is specifically about normal vs. uniform init (see the snippet below).
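In framework terms, the choice I'm asking about looks like this (the string identifiers are real Keras initializers; the two layers are just a toy illustration):

```python
from tensorflow import keras

# Two otherwise-identical layers, differing only in normal vs. uniform He init.
dense_normal = keras.layers.Dense(64, activation="relu",
                                  kernel_initializer="he_normal")
dense_uniform = keras.layers.Dense(64, activation="relu",
                                   kernel_initializer="he_uniform")
```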