Can a neural network compute $y = x^2$?



In the spirit of the famous TensorFlow Fizz Buzz joke and the XOR problem, I started to wonder: is it possible to design a neural network that implements the function $y = x^2$?

Given some representation of a number (e.g. as a vector of binary digits, so that the number 5 is represented as [1,0,1,0,0,0,0,...]), the neural network should learn to return its square (25 in this case).
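To make the input/output representation concrete, here is one possible encoding (a sketch; the bit width of 8 and least-significant-bit-first ordering are arbitrary choices of mine):

```python
def to_binary_vector(n, width=8):
    """Encode a non-negative integer as a fixed-width bit vector,
    least-significant bit first, so 5 -> [1, 0, 1, 0, 0, 0, 0, 0]."""
    return [(n >> i) & 1 for i in range(width)]

def from_binary_vector(bits):
    """Decode a bit vector back into the integer it represents."""
    return sum(bit << i for i, bit in enumerate(bits))
```

The network would then map `to_binary_vector(x)` to `to_binary_vector(x * x)` (with a wider output, since the square needs roughly twice as many bits).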

If I could implement $y=x^2$, I could probably implement $y=x^3$ and, more generally, any polynomial in $x$; then, using Taylor series, I could approximate $y=\sin(x)$, which would solve the Fizz Buzz problem: a neural network that can find the remainder of a division.

Clearly, the linear part of a NN alone cannot perform this task, so if multiplication is possible at all, it must come from the activation function.
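To see how the non-linearity can carry the job, here is a hand-constructed (not learned) one-hidden-layer ReLU "network" that reproduces the piecewise-linear interpolant of $x^2$ on $[0,1]$; the construction and all names are mine, not from any library:

```python
def relu(z):
    return max(z, 0.0)

def make_relu_square(n):
    """One hidden layer of n ReLU units whose output equals the
    piecewise-linear interpolant of x**2 at the knots k/n on [0, 1]."""
    knots = [k / n for k in range(n)]
    # Slope of the interpolant on the segment [k/n, (k+1)/n] is (2k + 1)/n.
    slopes = [(2 * k + 1) / n for k in range(n)]
    # Each ReLU unit contributes the *increment* in slope at its knot.
    weights = [slopes[0]] + [slopes[k] - slopes[k - 1] for k in range(1, n)]
    def net(x):
        return sum(w * relu(x - b) for w, b in zip(weights, knots))
    return net
```

With $n$ units the worst-case error on $[0,1]$ is $1/(4n^2)$, so accuracy improves quadratically with width; note, though, that the construction only works inside the interval it covers.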

Can you suggest any ideas or reading on the subject?

Boris Burkov

Posted 2019-03-22T13:02:40.397

Reputation: 255



Neural networks are often called universal function approximators, which is based on the universal approximation theorem. It states that:

In the mathematical theory of artificial neural networks, the universal approximation theorem states that a feed-forward network with a single hidden layer containing a finite number of neurons can approximate continuous functions on compact subsets of $\mathbb{R}^n$, under mild assumptions on the activation function.

Meaning an ANN with a non-linear activation function can map the function relating the input to the output. The function $y = x^2$ can easily be approximated with a regression ANN.
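As a sketch of that regression setup (plain NumPy rather than TensorFlow, with architecture and hyperparameters chosen ad hoc by me):

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data: x in [-1, 1], target y = x**2
X = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
Y = X ** 2

# One hidden layer of 16 tanh units, scalar linear output
W1 = rng.normal(scale=0.5, size=(1, 16))
b1 = np.zeros(16)
W2 = rng.normal(scale=0.5, size=(16, 1))
b2 = np.zeros(1)

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

# Full-batch gradient descent on the mean-squared error
lr = 0.05
for _ in range(5000):
    h, pred = forward(X)
    g2 = 2.0 * (pred - Y) / len(X)      # d(MSE)/d(pred)
    dW2, db2 = h.T @ g2, g2.sum(axis=0)
    g1 = (g2 @ W2.T) * (1.0 - h ** 2)   # tanh'(z) = 1 - tanh(z)**2
    dW1, db1 = X.T @ g1, g1.sum(axis=0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

mse = float(np.mean((forward(X)[1] - Y) ** 2))
```

On $[-1,1]$ the fit is typically good after a few thousand full-batch steps; outside that interval the network has seen no data, which is exactly the caveat raised in the next answer.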

You can find an excellent lesson here with a notebook example.

Also, because of such ability ANN could map complex relationships for example between an image and its labels.

Shubham Panchal

Posted 2019-03-22T13:02:40.397

Reputation: 1 792

Thank you very much, this is exactly what I was asking for! – Boris Burkov – 2019-03-22T13:23:28.587

Although true, it is a very bad idea to learn that. I fail to see where any generalization power would arise. NNs shine when there is something to generalize, like CNNs for vision that capture patterns, or RNNs that can capture trends. – Jeffrey – 2019-03-22T15:21:12.363


I think the answer of @ShubhamPanchal is a little bit misleading. Yes, it is true that by Cybenko's universal approximation theorem a feed-forward network with a single hidden layer containing a finite number of neurons can approximate continuous functions such as $f(x)=x^2$ on compact subsets of $\mathbb{R}^n$, under mild assumptions on the activation function.

But the main problem is that the theorem has a very important limitation: the function needs to be defined on a compact subset of $\mathbb{R}^n$ (compact subset = bounded + closed subset). Why is this problematic? When training the function approximator you will always have a finite data set, so you will only approximate the function inside a compact subset of $\mathbb{R}^n$, and outside it we can always find a point $x$ at which the approximation will probably fail. That being said, if you only want to approximate $f(x)=x^2$ on a compact subset of $\mathbb{R}$, then we can answer your question with yes. But if you want to approximate $f(x)=x^2$ for all $x\in \mathbb{R}$, then the answer is no (I exclude the trivial case in which you use a quadratic activation function).
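Any ReLU network, for example, computes a piecewise-linear function, so however well it fits $x^2$ on a compact interval, it eventually grows linearly while $x^2$ grows quadratically. A tiny self-contained illustration (a hand-built interpolant rather than a trained network; the construction is mine):

```python
def relu(z):
    return max(z, 0.0)

# Piecewise-linear interpolant of x**2 at the knots k/n on [0, 1],
# written as one hidden layer of n ReLU units.
n = 20
slopes = [(2 * k + 1) / n for k in range(n)]
weights = [slopes[0]] + [slopes[k] - slopes[k - 1] for k in range(1, n)]

def approx_square(x):
    return sum(w * relu(x - k / n) for k, w in zip(range(n), weights))

# Accurate on the compact set [0, 1] it was built for...
inside_err = max(abs(approx_square(i / 100) - (i / 100) ** 2) for i in range(101))
# ...but it extrapolates linearly, so far outside it fails badly.
outside_err = abs(approx_square(10.0) - 10.0 ** 2)
```

Here `inside_err` is below $10^{-3}$, while at $x=10$ the approximation is off by dozens, because past $x=1$ the output grows with constant slope instead of quadratically.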

Side remark on Taylor approximation: you always have to keep in mind that a Taylor approximation is only a local approximation. If you only want to approximate a function in a predefined region, then you should be able to use Taylor series. But approximating $\sin(x)$ by its Taylor expansion around $x=0$ will give you horrible results for $x\to 10000$ if you don't use enough terms in your Taylor expansion.
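A quick illustration of that locality, using the standard Maclaurin series for $\sin$:

```python
import math

def sin_taylor(x, terms):
    """Partial sum of the Maclaurin series:
    sin(x) = sum_k (-1)^k * x^(2k+1) / (2k+1)!"""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(terms))

near = abs(sin_taylor(0.5, 5) - math.sin(0.5))   # tiny near the expansion point
far = abs(sin_taylor(30.0, 5) - math.sin(30.0))  # enormous far away
```

With the same five terms, the error near $x=0.5$ is below $10^{-10}$, while at $x=30$ the partial sum is off by millions.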


Posted 2019-03-22T13:02:40.397

Reputation: 1 254

Nice catch! "compact set". – Esmailian – 2019-03-22T17:14:40.497

Many thanks, mate! Eye-opener! – Boris Burkov – 2019-03-22T17:23:54.770

In $\mathbb{R}^n$, compact subsets are exactly the sets that are closed and bounded (such as $[0,1]\subset \mathbb{R}$). This is called the Heine-Borel theorem. – Dave – 2020-05-25T12:54:30.180


Yes, theoretically speaking they can approximate any function.

Other answers have been very detailed and thorough. However, let me add one interesting aspect: when you try to approximate very simple functions, do not despair if most of the models you train fail. Simple functions require simple models, and simple models (i.e. models with very few parameters) are extremely sensitive to the random initialization of their weights. So it is perfectly possible that, initially, 9 out of 10 networks you train will fail at comparatively simple tasks.

Fortunately, $y = x^2$ is a convex optimization problem, which means that, for gradient descent, initialization does not matter.


Posted 2019-03-22T13:02:40.397

Reputation: 4 928

Please consider that $y=x^2$ is what you want to find; it is the cost function that should be convex. If you work out the problem you'll find it is convex, but the function is approximated over a limited range of numbers, so it will perform poorly on data far outside the training range. – Media – 2020-05-25T11:32:14.767

Yeah... I wrote just above that y = x**2 is a convex problem... it's right there in my comment... – Leevo – 2020-05-25T12:34:05.187