## Extra output layer in a neural network (Decimal to binary)


I'm working through an exercise from Michael Nielsen's online book *Neural Networks and Deep Learning*.

I can understand that if the additional output layer had 5 output neurons, I could probably set the bias at 0.5 and a weight of 0.5 for each neuron in the previous layer. But the question asks for a new layer of four output neurons, which is more than enough to represent the 10 possible outputs, since $$2^{4} = 16$$.

Can someone walk me through the steps involved in understanding and solving this problem?

The exercise question:

There is a way of determining the bitwise representation of a digit by adding an extra layer to the three-layer network above. The extra layer converts the output from the previous layer into a binary representation, as illustrated in the figure below. Find a set of weights and biases for the new output layer. Assume that the first 3 layers of neurons are such that the correct output in the third layer (i.e., the old output layer) has activation at least 0.99, and incorrect outputs have activation less than 0.01.


The question is asking you to make the following mapping between old representation and new representation:

| Digit | Old representation (one-hot) | New representation (binary) |
|-------|------------------------------|-----------------------------|
| 0     | 1 0 0 0 0 0 0 0 0 0          | 0 0 0 0                     |
| 1     | 0 1 0 0 0 0 0 0 0 0          | 0 0 0 1                     |
| 2     | 0 0 1 0 0 0 0 0 0 0          | 0 0 1 0                     |
| 3     | 0 0 0 1 0 0 0 0 0 0          | 0 0 1 1                     |
| 4     | 0 0 0 0 1 0 0 0 0 0          | 0 1 0 0                     |
| 5     | 0 0 0 0 0 1 0 0 0 0          | 0 1 0 1                     |
| 6     | 0 0 0 0 0 0 1 0 0 0          | 0 1 1 0                     |
| 7     | 0 0 0 0 0 0 0 1 0 0          | 0 1 1 1                     |
| 8     | 0 0 0 0 0 0 0 0 1 0          | 1 0 0 0                     |
| 9     | 0 0 0 0 0 0 0 0 0 1          | 1 0 0 1                     |


Because the old output layer has a simple form, this is quite easy to achieve. Each old output neuron should have a positive weight to each new output neuron that should be on in its binary representation, and a negative weight to each new output neuron that should be off. The contributions should combine to be large enough to cleanly switch the new neurons on or off, so I would use largish weights, such as +10 and -10.

If you have sigmoid activations here, the bias is not that relevant. You simply want to saturate each neuron towards on or off. The question allows you to assume very clear signals in the old output layer.

So, taking the example of representing a 3, and using zero-indexing for the neurons in the order I am showing them (these choices are not set in the question), I might have weights going from the activation of old output $i=3$, $A_3^{Old}$, to the logits of the new outputs $Z_j^{New}$, where $Z_j^{New} = \sum_{i=0}^{9} W_{i,j} \, A_i^{Old}$, as follows:

$$W_{3,0} = -10, \quad W_{3,1} = -10, \quad W_{3,2} = +10, \quad W_{3,3} = +10$$

This should clearly produce output close to 0 0 1 1 when only the old output layer's neuron representing a "3" is active. In the question, you can assume 0.99 activation of one neuron and <0.01 for competing ones in the old layer. So, if you use the same magnitude of weights throughout, the relatively small values of ±0.1 (0.01 × 10) coming from the other old-layer activations will not seriously affect the ±9.9 value, and the outputs in the new layer will be saturated very close to either 0 or 1.
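As a quick numerical check of that claim (a minimal Octave sketch; the ±10 weights and 0.99/0.01 activations are the ones assumed above, and the worst case assumed here has all nine competing neurons pulling the wrong way):

```octave
% Worst-case logit for a bit that should be ON:
% +10 from the correct neuron at 0.99, minus 10*0.01 from each of the other nine
z_on  =  10*0.99 - 9*10*0.01;   % =  9.0
z_off = -10*0.99 + 9*10*0.01;   % = -9.0
sigmoid = @(z) 1 ./ (1 + exp(-z));
sigmoid(z_on)                   % ~0.9999, saturated on
sigmoid(z_off)                  % ~0.0001, saturated off
```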

@NeilSlater, I can see how (1 * 10) * 1 = +10, which is given in your explanation; however, I cannot understand how you get to -10 instead of 0. I would have thought that (0 * 10) * 1 = 0 would be correct. Please excuse the basic nature of this question, as I have just begun going through the book mentioned. – rrz0 – 2018-02-09T16:04:03.447

@Rrz0: Because I am assuming a sigmoid layer on output, as it is a binary classification - the bit is either on or off. So in your example you get sigmoid((0 * 10) * 1), which is 0.5. By choosing suitably large numbers, you ensure either a very high or low output before the sigmoid, which will then output very close to 0 or 1. This is more robust IMO than the linear output assumed in FullStack's answer, but ignoring that, essentially our two answers are the same. – Neil Slater – 2018-02-09T19:13:06.033
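For concreteness, the sigmoid values being discussed in this exchange (plain arithmetic, not specific to this network):

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));
sigmoid(0)     % 0.5      -- a zero weight leaves the bit undecided
sigmoid(10)    % ~0.99995 -- +10 switches the bit cleanly on
sigmoid(-10)   % ~0.00005 -- -10 switches it cleanly off
```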

@NeilSlater, thanks for getting back, and for the explanation. I think I have understood. So, when for example 3 is activated, assuming ±10 weights, you get (+10*1) or (-10*1) depending on whether you want to output a 1 or 0 after the sigmoid(). In this case we would do: (-10*1), (-10*1), (+10*1), (+10*1), which represents 0011. Is the reasoning here correct or am I missing something? Thanks again – rrz0 – 2018-02-09T20:03:43.423

@Rrz0: That's correct. – Neil Slater – 2018-02-09T20:33:17.630

@NeilSlater would this solution by Sebastian Bensusan also be acceptable: https://s33.postimg.cc/p6r2f6uof/Screen_Shot_2018-06-06_at_10.25.39_PM.png – Pablo – 2018-06-07T02:26:34.467

@Pablo: Yes that looks almost identical to FullStack's answer here, with the bit order reversed. – Neil Slater – 2018-06-07T06:50:33.333

Thank you. I couldn't quite follow this part, would you mind elaborating further please? "I might have weights going from the activation of old output $i=3$, $A_3^{Old}$, to the logits of the new outputs $Z_j^{New}$, where $Z_j^{New} = \sum_{i=0}^{9} W_{i,j} \, A_i^{Old}$, as follows: $W_{3,0} = -10$, $W_{3,1} = -10$, $W_{3,2} = +10$, $W_{3,3} = +10$" – Victor Yip – 2015-08-02T12:10:38.450

@VictorYip: The equation is just the normal feed-forward network equation, but to use it I had to define my terms carefully (since you have no reference maths in your question). The "logit" Z value is the value calculated at the neuron before activation functions have been applied (and generally $A_i = f( Z_i )$ where $f$ is e.g. the sigmoid function). The example weights are the values I would use for connecting new output layer neurons to old ones, but just the ones that connect the 4 neurons in the new output layer to one of the neurons in the old output layer (the one for output "3") – Neil Slater – 2015-08-02T15:20:18.047

So if you want to see this working in Octave / Matlab, you can do something like this: – learningMachine – 2020-03-06T06:26:17.020
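(The snippet itself did not survive with the comment; here is a minimal Octave sketch of what such a check could look like, using the ±10 sigmoid weights described in this answer — the variable names are my own.)

```octave
% Build the 4x10 weight matrix from the binary codes of 0..9:
% column d+1 connects the old neuron for digit d to the four new neurons
bits = dec2bin(0:9) - '0';          % 10x4 matrix of bits, MSB first
W = 10 * (2*bits - 1)';             % map {0,1} -> {-10,+10}, transpose to 4x10

% Old-layer activations, as the exercise assumes: column d+1 has
% 0.99 for the correct digit d and 0.01 everywhere else
A_old = 0.01 + 0.98*eye(10);

% Feed all ten patterns through at once; each column saturates to its code
A_new = 1 ./ (1 + exp(-W * A_old));
disp(round(A_new'))                 % rows read 0000, 0001, ..., 1001
```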

@NeilSlater - will your example weights work for the outputs that are not 3? I don't see that they will. Please elaborate. Thanks. – FullStack – 2015-10-23T06:59:18.913

@FullStack: Yes it will work, because if $A_3^{old}$ is not active (activation 0), then none of the weights in the example have any impact. You have to construct similar maps for connections from each output neuron in the old layer - each one is associated very simply with its binary representation in the new layer, and they are all independent. – Neil Slater – 2015-10-23T08:31:29.337


The code below from SaturnAPI answers this question. See and run the code at https://saturnapi.com/artitw/neural-network-decimal-digits-to-binary-bitwise-conversion

```octave
% Exercise from http://neuralnetworksanddeeplearning.com/chap1.html
% (see the exercise statement quoted above)

% Inputs from 3rd layer: column d+1 is the one-hot activation for digit d
xj = eye(10,10)

% Weights matrix: after the transpose, row d+1 holds the four bits of digit d
wj = [0 0 0 0 0 0 0 0 1 1 ;
      0 0 0 0 1 1 1 1 0 0 ;
      0 0 1 1 0 0 1 1 0 0 ;
      0 1 0 1 0 1 0 1 0 1 ]';

% Results: row d+1 is the binary code of digit d
% (xj*wj, since wj is 10x4 after the transpose)
xj * wj

% Confirm results
integers = 0:9;
dec2bin(integers)
```


What is meant by the inputs eye(10,10)? – rrz0 – 2018-02-09T15:53:42.273

Yes, it indeed works like a charm, just tried it in Octave Online and confirmed, thanks!! P.S.: A little bit of explanation would also be good, should someone be stuck :) – Anaximandro Andrade – 2018-03-14T01:12:04.967

@Rrz0 it's a Matlab/Octave function for creating an identity matrix (with ones on the main diagonal) – Anaximandro Andrade – 2018-03-14T01:13:01.987

Note that this implements a set of weights for a linear output layer. In contrast, my answer assumes sigmoid activation in the output layer. Otherwise the two answers are equivalent. – Neil Slater – 2015-10-23T11:21:57.820


Pythonic proof for the above exercise:

"""
NEURAL NETWORKS AND DEEP LEARNING by Michael Nielsen

Chapter 1

http://neuralnetworksanddeeplearning.com/chap1.html#exercise_513527

Exercise:

There is a way of determining the bitwise representation of a digit by adding an extra layer to the three-layer network above. The extra layer converts the output from the previous layer into a binary representation, as illustrated in the figure below. Find a set of weights and biases for the new output layer. Assume that the first 3 layers of neurons are such that the correct output in the third layer (i.e., the old output layer) has activation at least 0.99, and incorrect outputs have activation less than 0.01.

"""
import numpy as np

def sigmoid(x):
return(1/(1+np.exp(-x)))

def new_representation(activation_vector):
a_0 = np.sum(w_0 * activation_vector)
a_1 = np.sum(w_1 * activation_vector)
a_2 = np.sum(w_2 * activation_vector)
a_3 = np.sum(w_3 * activation_vector)

return a_3, a_2, a_1, a_0

def new_repr_binary_vec(new_representation_vec):
sigmoid_op = np.apply_along_axis(sigmoid, 0, new_representation_vec)
return (sigmoid_op > 0.5).astype(int)

w_0 = np.full(10, -1, dtype=np.int8)
w_0[[1, 3, 5, 7, 9]] = 1
w_1 = np.full(10, -1, dtype=np.int8)
w_1[[2, 3, 6, 7]] = 1
w_2 = np.full(10, -1, dtype=np.int8)
w_2[[4, 5, 6, 7]] = 1
w_3 = np.full(10, -1, dtype=np.int8)
w_3[[8, 9]] = 1

activation_vec = np.full(10, 0.01, dtype=np.float)
# correct number is 5
activation_vec[3] = 0.99

new_representation_vec = new_representation(activation_vec)
print(new_representation_vec)
# (-1.04, 0.96, -1.0, 0.98)
print(new_repr_binary_vec(new_representation_vec))
# [0 1 0 1]

# if you wish to convert binary vector to int
b = new_repr_binary_vec(new_representation_vec)
print(b.dot(2**np.arange(b.size)[::-1]))
# 5



A little modification to FullStack's answer, taking Neil Slater's comments into account, using Octave:

```octave
% gzanellato
% Octave

% 3rd layer: column d+1 is the one-hot activation for digit d
A = eye(10,10);

% Weights matrix:
fprintf('\nSet of weights:\n\n')

wij = [-10 -10 -10 -10 -10 -10 -10 -10  10  10;
       -10 -10 -10 -10  10  10  10  10 -10 -10;
       -10 -10  10  10 -10 -10  10  10 -10 -10;
       -10  10 -10  10 -10  10 -10  10 -10  10]

% Any bias strictly between -10 and +10 works
bias = 5

Z = wij*A + bias;

% Sigmoid function, rounded to 0 or 1 via the int32 cast:
for j = 1:10
  for i = 1:4
    Sigma(i,j) = int32(1/(1 + exp(-Z(i,j))));
  end
end

fprintf('\nBitwise representation of digits:\n\n')

disp(Sigma')
```



It follows from all the previous ideas as a pure mathematical formula:

$$W_{j,k} = 10 \cdot (-1)^{1 - (\lfloor j/2^k \rfloor \bmod 2)}$$

where $W_{j,k}$ is the weight pointing from the old output neuron for decimal digit $j$ to the new output neuron for the binary digit in position $k$ (counting $k = 0$ as the least significant bit).

Example: for $j = 3$, $k = 1$, we have $\lfloor 3/2 \rfloor \bmod 2 = 1$, so $W_{3,1} = 10 \cdot (-1)^0 = +10$; for $k = 2$, $\lfloor 3/4 \rfloor \bmod 2 = 0$, so $W_{3,2} = 10 \cdot (-1)^1 = -10$.
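A quick sketch to verify that the formula reproduces a full ±10 weight matrix (Octave; note this formula's bit order puts the least significant bit at $k=0$, the reverse of the table above):

```octave
W = zeros(10, 4);
for j = 0:9          % decimal digit (old output neuron)
  for k = 0:3        % binary position, k=0 is the least significant bit
    W(j+1, k+1) = 10 * (-1)^(1 - mod(floor(j / 2^k), 2));
  end
end
disp(W)              % row j+1 holds the weights from digit j to the four bits
```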