You can also understand this logic from the viewpoint of constrained optimisation.
Introduce a Lagrange function:
$$
\mathcal{L} = \operatorname{Tr} (w^{T} X X^{T} w) - \lambda \left(w^{T} w - 1\right)
$$
And take the derivative with respect to $w$:
$$
\frac{\partial \mathcal{L}}{\partial w} = 2 \left(X X^{T} - \lambda I\right) w
$$
In the general case, where we reduce to $k \geqslant 1$ dimensions, $w$ is a matrix whose columns are vectors, $w = (w_1, w_2, \ldots, w_k)$, with one multiplier $\lambda_i$ per column.
The gradient with respect to each column $w_i$ vanishes when $w_i$ is an eigenvector of $XX^{T}$ with eigenvalue $\lambda_i$. In other words, the stationary points are the eigenvectors of $X X^{T}$.
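As a quick numerical sanity check (a NumPy sketch; the matrix sizes here are arbitrary), we can confirm that the gradient $2(XX^{T} - \lambda I)w$ does vanish at every eigenvector of $XX^{T}$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 100))   # 5 features, 100 samples
C = X @ X.T                          # the matrix X X^T (symmetric PSD)

# Eigen-decomposition; eigh is appropriate since C is symmetric
eigvals, eigvecs = np.linalg.eigh(C)

# Each eigenvector is a stationary point of the Lagrangian:
# the gradient 2 (X X^T - lambda I) w vanishes there.
for lam, w in zip(eigvals, eigvecs.T):
    grad = 2.0 * (C - lam * np.eye(C.shape[0])) @ w
    assert np.allclose(grad, 0.0, atol=1e-8)
```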

The condition $w_i^{T} w_i = 1$ normalizes the eigenvectors, and eigenvectors of the symmetric matrix $X X^{T}$ belonging to distinct eigenvalues are orthogonal. Indeed, going back to the initial functional, one sees that $w_i^{T} X X^{T} w_j = \lambda_j\, w_i^{T} w_j = 0$ for $i \neq j$. Therefore, at a stationary point we finally have:
$$
\mathcal{L} = \sum_{i=1}^{k} \lambda_i
$$
which, for any $k \geq 1$, is maximized by taking the $k$ largest eigenvalues.
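The conclusion can also be checked numerically (a sketch, assuming arbitrary dimensions): the trace objective evaluated at the top-$k$ eigenvectors equals the sum of the $k$ largest eigenvalues, and no other orthonormal $k$-frame does better.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 200))
C = X @ X.T
k = 2

eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
W_top = eigvecs[:, -k:]                # columns for the k largest eigenvalues

def objective(W):
    """Tr(W^T X X^T W), for W with orthonormal columns."""
    return np.trace(W.T @ C @ W)

# The optimum equals the sum of the k largest eigenvalues ...
assert np.isclose(objective(W_top), eigvals[-k:].sum())

# ... and beats any random orthonormal k-frame.
for _ in range(100):
    Q, _ = np.linalg.qr(rng.standard_normal((6, k)))
    assert objective(Q) <= objective(W_top) + 1e-9
```

This is exactly the (Ky Fan) statement that the top-$k$ eigenvectors maximize the trace over orthonormal frames.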

To me, this question "How to solve the math in the case we reduce the number of dimension is greater than 1?" is unclear. Can you clarify that? – nbro – 2020-09-01T11:16:46.693

Related: https://math.stackexchange.com/q/3736092

– Rodrigo de Azevedo – 2020-09-01T22:51:33.927