What is the proof that any given unitary matrix can be converted as above?

Let $U$ be an arbitrary $2\times 2$ unitary matrix. Unitarity is equivalent to the rows (equivalently, the columns) of $U$ forming an orthonormal system.

Let us write a generic $U$ as
$$U=\begin{pmatrix}a&b\\c&d\end{pmatrix}.$$
The constraints imposed on the coefficients $a,b,c,d$ by the requirement of $U$ being unitary are
$$|a|^2+|c|^2=1,\qquad |b|^2+|d|^2=1,\qquad a^* b+c^* d=0.$$
A pair of complex numbers $a,b\in\mathbb C$ satisfying $|a|^2+|b|^2=1$ can always be parametrized as
$$a=e^{i\alpha_{11}}\cos\theta,\qquad b=e^{i\alpha_{12}}\sin\theta,$$
for some real coefficients $\alpha_{ij},\theta\in\mathbb R$.

It follows that, using only the normalization constraints on the rows and columns (without yet taking orthogonality into account), we can parametrize $U$ as

$$U=\begin{pmatrix}e^{i\alpha_{11}}\cos\theta& e^{i\alpha_{12}}\sin\theta\\
e^{i\alpha_{21}}\sin\theta & e^{i\alpha_{22}}\cos\theta
\end{pmatrix}.$$

Requiring the columns to be orthogonal then gives the additional relation

$$e^{i(\alpha_{11}-\alpha_{12})}+e^{i(\alpha_{21}-\alpha_{22})}=0,$$
that is, $\alpha_{11}=\alpha_{12}+\alpha_{21}-\alpha_{22}+\pi$.

We conclude that $U$ is parametrized by *four* real parameters, here denoted $\theta,\alpha_{12},\alpha_{21},\alpha_{22}$.
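As a quick numerical sanity check of this parametrization, here is a minimal NumPy sketch. The helper names (`unitary_from_params`, `params_from_unitary`, `random_unitary`) are made up for this illustration, and the read-off step assumes a generic $U$ with no vanishing entries, so that all the phases are well defined.

```python
import numpy as np

def unitary_from_params(theta, a12, a21, a22):
    """Build U from the four real parameters; alpha_11 is fixed by orthogonality."""
    a11 = a12 + a21 - a22 + np.pi
    c, s = np.cos(theta), np.sin(theta)
    return np.array([
        [np.exp(1j * a11) * c, np.exp(1j * a12) * s],
        [np.exp(1j * a21) * s, np.exp(1j * a22) * c],
    ])

def params_from_unitary(U):
    """Read (theta, alpha_12, alpha_21, alpha_22) off a generic 2x2 unitary."""
    theta = np.arctan2(abs(U[1, 0]), abs(U[0, 0]))
    return theta, np.angle(U[0, 1]), np.angle(U[1, 0]), np.angle(U[1, 1])

def random_unitary(rng):
    """Illustrative helper: the Q factor of a random complex matrix is unitary."""
    z = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
    q, _ = np.linalg.qr(z)
    return q

rng = np.random.default_rng(0)
U = random_unitary(rng)
V = unitary_from_params(*params_from_unitary(U))

# V is unitary, and it reproduces U from only four real numbers.
print(np.allclose(V.conj().T @ V, np.eye(2)))
print(np.allclose(U, V))
```

Both checks should succeed for a generic unitary: the first confirms that the relation fixing $\alpha_{11}$ enforces orthogonality of the columns, the second that nothing beyond the four parameters is needed to recover $U$.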

To get the form you show, you simply need to change variables as follows:

\begin{align}
\theta&=\gamma/2, \\
\alpha_{12} &= \alpha-\beta/2+\delta/2+\pi,\\
\alpha_{21} &= \alpha+\beta/2-\delta/2, \\
\alpha_{22} &= \alpha+\beta/2+\delta/2.
\end{align}
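One way to convince yourself that this change of variables loses nothing is to check that it can be inverted. The sketch below writes the three phase relations as a linear map plus a constant shift and inverts it numerically; the names `M`, `offset`, `to_alphas` and `from_alphas` are only illustrative. The relation $\theta=\gamma/2$ is trivially invertible and is left out.

```python
import numpy as np

# Rows: alpha_12, alpha_21, alpha_22; columns: alpha, beta, delta.
M = np.array([
    [1.0, -0.5,  0.5],
    [1.0,  0.5, -0.5],
    [1.0,  0.5,  0.5],
])
offset = np.array([np.pi, 0.0, 0.0])  # the constant +pi appearing in alpha_12

def to_alphas(alpha, beta, delta):
    """Map (alpha, beta, delta) to (alpha_12, alpha_21, alpha_22)."""
    return M @ np.array([alpha, beta, delta]) + offset

def from_alphas(a12, a21, a22):
    """Invert the map by solving the linear system."""
    return np.linalg.solve(M, np.array([a12, a21, a22]) - offset)

print(np.linalg.det(M))                          # nonzero, so the map is invertible
print(from_alphas(*to_alphas(0.3, 1.1, -0.7)))   # recovers the input triple
```

A nonzero determinant means the two sets of phases are in one-to-one correspondence (as real numbers, before identifying angles modulo $2\pi$).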

Like why there should be only 4 variables and not more or less?

While the above already proves this, it can be useful to know that this is a special case of a more general result.

A generic unitary $n\times n$ matrix is specified by $n^2$ real parameters (see wiki page on the unitary group for more details).

An easy way to see this is again to remember that unitary matrices are characterised by their columns/rows forming an orthonormal system.
This amounts to $n$ real constraints (requiring each of the $n$ columns to be normalized), plus $\binom{n}{2}=n(n-1)/2$ additional *complex* constraints (requiring each pair of columns to be orthogonal). Each complex constraint amounts to two real constraints, so this sums up to a total of
$$n+2\binom{n}{2}=n+n(n-1)=n^2$$
real, *independent* constraints.

A generic $n\times n$ matrix is characterised by $n^2$ complex numbers, that is, $2n^2$ real numbers.

We conclude that the number of free parameters of a generic $n\times n$ unitary matrix is:
$$2n^2-n^2=n^2.$$
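The counting can also be checked numerically. The sketch below (my own illustration, with made-up helper names) linearizes the constraint $U^\dagger U=I$ around a random unitary and computes the rank of the resulting real Jacobian: the rank is the number of independent real constraints, and what is left of the $2n^2$ real directions is the number of free parameters.

```python
import numpy as np

def random_unitary(n, rng):
    """Illustrative helper: the Q factor of a random complex matrix is unitary."""
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, _ = np.linalg.qr(z)
    return q

def constraint_rank(n, rng):
    """Rank of the real Jacobian of F(U) = U^dagger U at a random unitary U."""
    U = random_unitary(n, rng)
    columns = []
    for j in range(n):
        for k in range(n):
            for unit in (1.0, 1.0j):  # real and imaginary direction of entry (j, k)
                H = np.zeros((n, n), dtype=complex)
                H[j, k] = unit
                # Directional derivative of U^dagger U along H.
                dF = H.conj().T @ U + U.conj().T @ H
                columns.append(np.concatenate([dF.real.ravel(), dF.imag.ravel()]))
    J = np.array(columns).T
    return np.linalg.matrix_rank(J)

rng = np.random.default_rng(1)
for n in range(1, 5):
    rank = constraint_rank(n, rng)
    # n, independent real constraints, remaining free parameters
    print(n, rank, 2 * n**2 - rank)
```

For each $n$ tried, both the rank and the number of remaining directions should come out equal to $n^2$, in agreement with the count above.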

Going back to the simple $2\times 2$ case above, you can see how we get back the previous result because $2^2=4$ (you might also notice that to count parameters in the special case $2\times 2$ I used a different *ad-hoc* strategy, rather than the one shown here to get the count in the general case).

## Yet another way to count parameters

Another method I like is to think entirely in terms of orthonormal systems.
The question is: how many parameters need to be given to specify an orthonormal basis in an $n$-dimensional complex vector space?

Let us start with the first vector. The only constraint here is that we want the vector to be normalized.
A vector in $n$ complex dimensions has $2n$ real components, so after imposing normalization the number of real parameters needed to specify it is $d_1=2n-1$.

Let us now add another vector. Now we have to impose both the normalization of this additional vector (one real constraint), and the orthogonality of this additional vector to the initial one (two real constraints).
The additional parameters are therefore $d_2= 2n-3$.

A third vector will need to be normalized (one real constraint), and orthogonal to the first two vectors ($2\times 2$ real constraints), thus $d_3=2n-5$.

Iterate this reasoning until you get to the last vector, which will be specified by a single real parameter.

The total number of parameters is therefore:
$$\sum_k d_k=(2n-1)+(2n-3)+\cdots+3+1=\sum_{k=0}^{n-1} (2k+1).$$
In other words, the number of parameters is given by the sum of the first $n$ odd integers, which is again readily shown to equal $n^2$.
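The same bookkeeping can be written out in a few lines of Python. This is only a sketch of the counting argument, with hypothetical function names: the $k$-th vector starts with $2n$ real components, loses one to normalization and two for each of the $k-1$ vectors it must be orthogonal to.

```python
def params_for_vector(n, k):
    """Real parameters added by the k-th orthonormal vector (k = 1, ..., n)."""
    return 2 * n - 1 - 2 * (k - 1)

def total_params(n):
    """Total number of real parameters of an orthonormal basis of C^n."""
    return sum(params_for_vector(n, k) for k in range(1, n + 1))

for n in range(1, 7):
    per_vector = [params_for_vector(n, k) for k in range(1, n + 1)]
    # n, the list d_1, ..., d_n, and their sum (to be compared with n**2)
    print(n, per_vector, total_params(n))
```

The list for each $n$ is just the first $n$ odd integers in decreasing order, and the closed form follows from $\sum_{k=0}^{n-1}(2k+1)=n(n-1)+n=n^2$.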

Awesome. Thanks for the proof. Also, it's OK to use those simultaneous equations because they are linearly independent so we get a 3 variable solution, right? – Tech Solver – 2019-01-14T10:06:54.870

@user2508039 yes. You can also just check that the system can be inverted and therefore you have a bijection between the two sets of parameters – glS – 2019-01-14T10:23:02.347