
I am training a neural network with one sigmoid hidden layer and a linear output layer. The network simply approximates a cosine function. The weights are initialized according to Nguyen-Widrow initialization and the biases are initialized to 1. I am using MATLAB as a platform.

Running the network a number of times without changing any parameters, I am getting results (mean squared error) ranging from 0.5 to 0.5*10^-6. I cannot understand how the results can vary that much; I would have expected at least a narrower, more consistent window of errors.

What could be causing such a large variance?
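To make the variance concrete, here is a minimal reproduction outside MATLAB — a NumPy sketch, with plain full-batch gradient descent standing in for the toolbox's actual training algorithm. Different random starting weights on the same cosine fit settle at very different final errors:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mse(seed, H=4, steps=3000, lr=0.05):
    """Fit a 1-H-1 sigmoid/linear MLP to cos(x) with full-batch
    gradient descent from a random start; return the final MSE."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-np.pi, np.pi, 200)
    t = np.cos(x)
    W1 = rng.normal(size=H)            # input-to-hidden weights
    b1 = rng.normal(size=H)
    w2 = rng.normal(size=H)            # hidden-to-output weights
    b2 = 0.0
    for _ in range(steps):
        h = sigmoid(np.outer(x, W1) + b1)    # (200, H) hidden activations
        err = h @ w2 + b2 - t                # (200,) output error
        dh = np.outer(err, w2) * h * (1 - h)
        W1 -= lr * (dh.T @ x) / len(x)
        b1 -= lr * dh.mean(axis=0)
        w2 -= lr * (h.T @ err) / len(x)
        b2 -= lr * err.mean()
    h = sigmoid(np.outer(x, W1) + b1)
    return float(np.mean((h @ w2 + b2 - t) ** 2))

mses = [train_mse(s) for s in range(5)]
print(["%.3g" % m for m in mses])      # final MSE differs run to run
```

Each seed starts the descent in a different basin of the error surface, so the spread of final errors is a property of the non-convex loss, not a bug in the training code.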

I agree. I primarily tested with the weights initialized to 1 (I have my reasons) and it's how I first noticed the large variance. Then I initialized the weights using NW to check if the results would be better. There was some improvement but the large variance was still present. – edgaralienfoe – 2015-03-12T13:28:36.473

If you repeat training with the same set of initial weights, I would expect the same result (unless you have some sort of asynchronous processing going on). One other point: the difference in MSE between 0.5 and 0.5*10^-6 is only about 0.5, which isn't necessarily a large difference, depending on your training set size, number of outputs, and initial MSE. – bogatron – 2015-03-12T15:00:01.067

The variance was still being observed with the weights and biases all initialized to 1, which is why I was confused when this happened. The data set is also consistent and contains 10,000 values. There is less variance now even with such initialization, but every now and then it still happens, which seems very strange. Is it possible that the error surface being created is different every time, and hence the network sometimes gets stuck in a local minimum by chance? – edgaralienfoe – 2015-03-12T15:44:52.797
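One concrete source of a run-to-run difference even with fixed initial weights: a random train/validation/test division draws a fresh split each call, so the training set itself — and hence the error surface — changes between runs unless the RNG is seeded first (in MATLAB, calling rng with a fixed seed before training). A NumPy sketch of the same idea (the helper name is mine, not a library function):

```python
import numpy as np

def random_split(seed, n=10000, frac=0.7):
    """Randomly choose a training subset, dividerand-style."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    return np.sort(idx[: int(frac * n)])

a = random_split(0)
b = random_split(0)   # same seed: identical training set
c = random_split(1)   # new seed: different training set, new error surface
print(np.array_equal(a, b), np.array_equal(a, c))   # True False
```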

What makes you say that 0.5 vs. 0.5*10^-6 is a large variance? What is the MSE at the start of training? Also, I'm suspicious about all weights being initialized to 1. If all the weights in an MLP are initialized to the same value, then I would expect all final weights for a given layer to converge to a common value. – bogatron – 2015-03-12T15:53:26.390

I have one output, since the network is approximating a one-dimensional cosine function, and the training set contains 10,000 values. The initial MSE (after the first epoch of Levenberg-Marquardt backpropagation) is 14.6268. The network has one hidden layer with 4 neurons and is approximating cos(Pi). All the weights and biases are initialized to 1. It might be worth noting that the data division happens at random every time (I'm calling dividerand in MATLAB). Again, the dataset is consistent, and the points are evenly spaced (that is, the cosine function is sampled with a fixed step). – edgaralienfoe – 2015-03-12T19:12:01.543
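The symmetry point above can be checked directly. If every weight and bias starts at 1, all hidden units compute the same activation, so they receive identical gradients at every step and remain clones forever — a NumPy sketch with plain gradient descent (an assumption; the thread's actual trainer is MATLAB's Levenberg-Marquardt, but the symmetry argument is the same):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

H = 4
x = np.linspace(-np.pi, np.pi, 200)
t = np.cos(x)

# Every weight and bias set to 1, as described in the thread.
W1 = np.ones(H); b1 = np.ones(H)     # hidden layer (single input)
w2 = np.ones(H); b2 = 1.0            # linear output layer
lr = 0.05

for _ in range(500):                 # full-batch gradient descent on MSE
    h = sigmoid(np.outer(x, W1) + b1)    # (200, H): all columns identical
    err = h @ w2 + b2 - t
    dh = np.outer(err, w2) * h * (1 - h)
    W1 -= lr * (dh.T @ x) / len(x)
    b1 -= lr * dh.mean(axis=0)
    w2 -= lr * (h.T @ err) / len(x)
    b2 -= lr * err.mean()

# Identical starts gave identical gradients at every step, so the
# hidden units are still clones of one another:
print(np.allclose(W1, W1[0]), np.allclose(w2, w2[0]))   # True True
```

With all four hidden units locked together, the network effectively has one hidden neuron, which is another reason an all-ones initialization behaves oddly compared to Nguyen-Widrow.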