Why does a neuron in a multi-layer network need several input connections?


For example, if I have the following architecture:

3 layer neural network

  • Each neuron in the hidden layer has a connection from each one in the input layer.
  • 3 x 1 input matrix and a 4 x 3 weight matrix (for backpropagation we of course have the transposed version, 3 x 4)
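For concreteness, here is a minimal NumPy sketch of that forward pass (the input values are made up):

```python
import numpy as np

# 3 inputs, a hidden layer of 4 neurons: one 4 x 3 weight matrix.
x = np.array([[0.5], [0.1], [0.9]])   # 3 x 1 input
W = np.random.randn(4, 3)             # 4 x 3 weights

h = W @ x                             # 4 x 1 hidden pre-activations

# In backpropagation, the transposed matrix W.T (3 x 4) carries the
# gradient back from the hidden layer to the input:
grad_h = np.ones((4, 1))              # some upstream gradient
grad_x = W.T @ grad_h                 # 3 x 1
```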

But I still don't understand the point of a neuron having 3 inputs (in the hidden layer of the example). It seems it would work the same way if I only adjusted one weight of the 3 connections.

As it is, the information just flows distributed over several "channels", but what is the point of that?

With backpropagation, in some cases the weights are simply adjusted proportionally based on the error.

Or is it just done that way because everything is then easier to implement mathematically (with matrix multiplication and so on)?

Either my question is stupid or I have an error in my thinking and am starting from wrong assumptions. Can someone please help me with the interpretation?

In TensorFlow Playground, for example, when I cut the connections (by setting the weight to 0), the network just compensated by changing the other still-existing connections a bit more: TensorflowImage


Posted 2020-07-29T13:29:55.170

Reputation: 11



There are a few reasons I can think of, though I have not read an explicit description of why it is done this way. It's likely that people just started doing it this way because it's the most logical, and people who have tried your method of reduced connections have seen a performance hit, so no change was made.

The first reason is that if you allow all nodes in one layer to connect to all nodes in the next, the network will optimise unnecessary connections away: essentially, their weights will go to 0. This does not mean you can trim those connections, however. A connection that is irrelevant at the current local minimum might become really important later in training. As such, you can never truly know whether a connection between one layer and the next is necessary, so it's better to leave it in, in case it helps improve network performance.
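A small NumPy sketch of that point (weights and input made up): zeroing a weight in the dense matrix is exactly equivalent to cutting the connection, so the fully connected layer subsumes every trimmed topology.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 1))   # 3 inputs
W = rng.standard_normal((4, 3))   # fully connected 4 x 3 layer

# "Cut" the connection from input 2 to hidden neuron 0 by zeroing its weight.
W[0, 2] = 0.0
h = W @ x

# Computing neuron 0 with that connection removed by hand gives the same value:
h0_trimmed = W[0, 0] * x[0, 0] + W[0, 1] * x[1, 0]
assert np.isclose(h[0, 0], h0_trimmed)
```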

The second reason is it's just simpler mathematically. Networks are implemented specifically so it's very easy to apply a series of matrix calculations to perform all computations. Trimming connections means either:

  • A matrix must contain 0 values, wasting computation time
  • A custom script must be written to compute this network's structure, which in the real world can take a very long time to develop, as it must be implemented at the GPU level using something like CUDA (making it very complicated)

Overall, it's just a lot simpler to have all nodes connected between layers, rather than one connection per node.
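In practice, frameworks that do support trimmed connections typically emulate them with a binary mask over a dense matrix, which illustrates the first bullet above: the matrix multiply still runs at full dense cost, with the zeros multiplied in. A hypothetical sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((3, 1))
W = rng.standard_normal((4, 3))

# Binary mask: 0 means "this connection was trimmed".
mask = np.array([[1, 1, 0],
                 [0, 1, 1],
                 [1, 0, 1],
                 [1, 1, 1]])

# The dense matrix multiply is performed anyway; the trimmed entries
# just contribute 0 to each sum, wasting those multiply-adds.
h = (W * mask) @ x
```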


Posted 2020-07-29T13:29:55.170

Reputation: 800


It doesn't.

Whether or not this is useful is another story, but it is totally fine to build the neural net you describe with just one input value. Perhaps you choose one pixel of the photo and make your classification based on the intensity of that one pixel (I'm assuming a black-and-white photo), or you have some method to condense an entire photograph into one value that summarizes it. Then each neuron in the hidden layer has only one input connection.

Likewise, you are allowed to decide that the top neuron in the hidden layer should have only one input connection; just drop the other two.

Again, this might not give useful results, but they're still neural networks.


Posted 2020-07-29T13:29:55.170

Reputation: 131

So in short, was it done this way so that it could be better represented mathematically? As a unified model? Isn't there some document or something that takes up my question and maybe even explains the meaning? – iwab – 2020-07-29T16:19:18.293

I don't follow how you get from my answer to "better represented mathematically" and "a unified model". Please explain. – Dave – 2020-07-29T16:22:49.003

Ahh, I read your answer a bit wrong at first, but you actually say that it would also work with one input. I would still like to know why someone came up with this architecture, though. – iwab – 2020-07-29T16:26:32.207

Which architecture? – Dave – 2020-07-29T16:27:50.410

the architecture that each neuron in one layer is connected to each in the next. – iwab – 2020-07-29T16:29:05.603

It seems like you're the one who came up with it. – Dave – 2020-07-29T16:29:57.423

Let us continue this discussion in chat.

– Dave – 2020-07-29T16:31:25.830


If you adopt a slightly different point-of-view, then a neural network of this static kind is just a big function with parameters, $y=F(x,P)$, and the task of training the network is a non-linear fit of this function to the data set.

That is, training the network means reducing all of the residuals $y_k-F(x_k,P)$ simultaneously. This is a balancing act: tuning just one weight to fix one residual will in general worsen some other residuals. Even when that is taken into account, methods that adjust one variable at a time are usually much slower than methods that adjust all variables simultaneously along some gradient or Newton direction.
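A toy illustration of that balancing act, with a made-up model $F(x,P)=p_0+p_1 x$ and three data points: fixing the last residual by tuning only $p_1$ makes the middle residual worse.

```python
import numpy as np

# Made-up data and a two-parameter model F(x, P) = p0 + p1 * x.
xs = np.array([0.0, 1.0, 2.0])
ys = np.array([0.0, 1.0, 1.5])

def residuals(P):
    return ys - (P[0] + P[1] * xs)

P = np.array([0.0, 1.0])
r_before = residuals(P)            # [0, 0, -0.5]: only the last point is off

# Tune p1 alone so that the last residual becomes 0 ...
P_tweaked = np.array([0.0, 0.75])
r_after = residuals(P_tweaked)     # [0, 0.25, 0]: the middle residual got worse
```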

The usual backpropagation algorithm implements gradient descent on the sum of squared residuals. Better variants improve this to a Newton-like method via some estimate of the Hessian of this sum of squares, or by following the idea of the Gauss-Newton method.
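For reference, these are the standard update rules being described: plain gradient descent on the sum of squared residuals, and the Gauss-Newton step, which uses $J^\top J$ (with $J$ the Jacobian of $F$ with respect to $P$) as an approximation of the Hessian:

$$P \leftarrow P - \eta\,\nabla_P \sum_k \big(y_k - F(x_k,P)\big)^2, \qquad J^\top J\,\Delta P = J^\top r, \quad r_k = y_k - F(x_k,P), \quad P \leftarrow P + \Delta P.$$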

Lutz Lehmann

Posted 2020-07-29T13:29:55.170

Reputation: 101

But I don't see why tuning just one weight worsens some other residuals. If I have a layer with 3 neurons, I just tune 3 weights (not 9) and end up with the same results. Why should that take more time? It would just look like this: https://i.imgur.com/HvhaBrV.jpg

– iwab – 2020-07-30T05:03:45.110