I have a few doubts about backpropagation.
1.) My first doubt: in principle we could write out the cost function in terms of every weight and activation from all the layers, and then differentiate it with respect to each weight directly. But that would take a lot of time, and that is why we use the chain rule instead. Is my understanding right?
2.) To calculate the derivative of the cost function with respect to a particular weight, we take the derivative of the cost with respect to the neuron's weighted input, multiplied by the derivative of that weighted input with respect to the weight. Is this right?
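To make this concrete, here is the chain-rule product I have in mind (my own notation, possibly off: $w^l_{jk}$ is the weight from neuron $k$ in layer $l-1$ to neuron $j$ in layer $l$, and $z^l_j = \sum_k w^l_{jk} a^{l-1}_k + b^l_j$ is the weighted input):

```latex
\frac{\partial C}{\partial w^l_{jk}}
  = \frac{\partial C}{\partial z^l_j} \cdot \frac{\partial z^l_j}{\partial w^l_{jk}}
  = \delta^l_j \, a^{l-1}_k
```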
3.) This doubt is crucial. The delta term, as I understand it, is nothing but the derivative of the cost function with respect to the weighted input z of a neuron. Is my understanding right? And to calculate the deltas of a layer, we need the deltas of the next layer, and that is exactly why backpropagation lets us compute all the gradients efficiently. Right?
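In symbols, this is the definition I am assuming for delta, together with the output-layer case that starts the backward pass ($L$ is the last layer, $\sigma$ the activation, $a^L_j = \sigma(z^L_j)$ — again my own notation):

```latex
\delta^l_j \equiv \frac{\partial C}{\partial z^l_j},
\qquad
\delta^L_j = \frac{\partial C}{\partial a^L_j}\,\sigma'(z^L_j)
```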
4.) When calculating the derivative of the cost function with respect to the weighted inputs of a previous layer, why do we take a summation over the neurons of the next layer? Why?
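Here is the summation I am asking about, written out. My guess is that neuron $j$ in layer $l$ feeds every neuron $k$ in layer $l+1$, so the multivariable chain rule sums all their contributions — please correct me if my indices are wrong:

```latex
\delta^l_j
  = \frac{\partial C}{\partial z^l_j}
  = \sum_k \frac{\partial C}{\partial z^{l+1}_k}\,
           \frac{\partial z^{l+1}_k}{\partial z^l_j}
  = \Big( \sum_k w^{l+1}_{kj}\,\delta^{l+1}_k \Big)\,\sigma'\!\left(z^l_j\right)
```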
5.) My final doubt: what does delta actually mean? Does it mean how much the neuron's weighted input is affecting the cost function?
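In case code is clearer than my wording, here is a minimal sketch of how I picture the deltas flowing backward, for a tiny two-layer sigmoid network with squared-error cost (all names and shapes here are my own invention, just to illustrate the questions above):

```python
import numpy as np

# Tiny 2-layer sigmoid network, cost C = 0.5 * ||a2 - y||^2.
# Shapes: x (n0,), W1 (n1, n0), W2 (n2, n1).

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    z1 = W1 @ x + b1      # weighted input of hidden layer
    a1 = sigmoid(z1)
    z2 = W2 @ a1 + b2     # weighted input of output layer
    a2 = sigmoid(z2)
    return z1, a1, z2, a2

def backprop(x, y, W1, b1, W2, b2):
    z1, a1, z2, a2 = forward(x, W1, b1, W2, b2)
    # delta2 = dC/dz2: how much each output neuron's weighted input moves the cost
    # (sigmoid'(z) = a * (1 - a))
    delta2 = (a2 - y) * a2 * (1 - a2)
    # delta1 = dC/dz1: the summation over the next layer (question 4)
    # is the W2.T @ delta2 matrix product
    delta1 = (W2.T @ delta2) * a1 * (1 - a1)
    # dC/dW = delta times the previous layer's activation (question 2)
    dW2 = np.outer(delta2, a1)
    dW1 = np.outer(delta1, x)
    return dW1, dW2
```

I checked this sketch against a numerical finite-difference gradient and the numbers matched, which is why I think my understanding of delta is at least close.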
Thanks! I am a beginner, so please let me know whether my fundamentals are clear or whether I am confused somewhere.