## Low-dimensional data and quantum machine learning


Ewin Tang says not to expect exponential speed-ups from quantum machine learning on low-dimensional data because, in such cases, quantum analogues of classical algorithms will not outperform them.

What exactly, then, is the definition of low-dimensional data in classical machine learning for an array with $$N$$ rows and $$K$$ columns? Can such an array even be considered high-dimensional, and if so, in which direction ($$N\gg K$$ or $$K\gg N$$)? Or does high-dimensionality require that the array possess a third dimension ($$N$$, $$K$$ and $$M$$)?


## The Context

The algorithm Ewin Tang originally examined and dequantized was the quantum recommendation system algorithm by Kerenidis & Prakash.

Many QML algorithms, including the quantum recommendation system algorithm, exploit the quantum linear systems algorithm (QLSA), which was posted on arXiv in 2008 by Harrow, Hassidim, and Lloyd (which is why it's also sometimes known as the HHL algorithm). The QLSA solves a quantum linear system of equations (i.e. the input is quantum information) via matrix inversion. Formally, given a matrix $$\mathbf{A} \in \mathbb{R}^{N \times N}$$ and a vector $$\vec{b} \in \mathbb{R}^N$$, the task is to solve for $$\vec{x} \in \mathbb{R}^N$$ in the system $$\mathbf{A} \vec{x}=\vec{b}$$. Note that the original QLSA didn't include a data loading or measurement procedure (important to keep in mind when considering its complexity).
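To make the problem concrete, here is a minimal classical sketch of the same task the QLSA targets, using a small made-up matrix and right-hand side (the specific numbers are arbitrary example data, not from any paper):

```python
import numpy as np

# The linear-systems problem A x = b, solved classically for comparison.
# A and b are made-up example data.
rng = np.random.default_rng(seed=0)
N = 4
A = rng.standard_normal((N, N))
b = rng.standard_normal(N)

x = np.linalg.solve(A, b)     # classical dense solve (LU decomposition)
assert np.allclose(A @ x, b)  # verify the solution satisfies A x = b
```

The classical routine returns the full solution vector $$\vec{x}$$; the QLSA instead prepares a quantum state proportional to $$\vec{x}$$, which is part of why comparing the two complexities requires care.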

The QLSA outputs a solution with complexity $$O(d^4\kappa^2\log(N)/\epsilon)$$, where $$d$$ is the sparsity of $$\mathbf{A}$$, $$\kappa$$ is its condition number, and $$\epsilon$$ is the desired precision.

Focusing on $$d$$ and $$\kappa$$, we can define them in plain terms as:

• $$d$$ is the maximum number of non-zero entries among the rows of $$\mathbf{A}$$
• $$\kappa$$ is the magnitude of the ratio between the largest and smallest eigenvalues of the matrix $$\mathbf{A}$$
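Both quantities are easy to compute for a small example. The sketch below uses a made-up symmetric $$3\times 3$$ matrix (for a symmetric matrix, the eigenvalue ratio above coincides with the usual condition number):

```python
import numpy as np

# Toy matrix (made up) to illustrate the two quantities in the QLSA bound.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 0.0],
              [0.0, 0.0, 2.0]])

# d: maximum number of non-zero entries among the rows of A
d = int(np.max(np.count_nonzero(A, axis=1)))  # densest row has 2 non-zeros

# kappa: |lambda_max / lambda_min|
eigvals = np.linalg.eigvals(A)
kappa = np.abs(eigvals).max() / np.abs(eigvals).min()

print(d)      # 2
print(kappa)  # ~2.31
```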

Over the years, several improvements have been made to the QLSA – so this complexity should be read as an upper bound rather than the best currently known.

## And on Dimensionality...

First off, we typically talk about the dimensionality of a data matrix in terms of its columns – that is, an $$m\times n$$ matrix would be considered high-dimensional when $$n \gg m$$, where $$m$$ is the number of rows (samples) and $$n$$ is the number of columns (features). You could think of this in the context of the QLSA, where $$\mathbf{A}$$ is the data matrix for a linear regression.
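The rows-as-samples, columns-as-features convention can be made concrete with a made-up data matrix (the shapes here are arbitrary illustration):

```python
import numpy as np

# m = 5 samples (rows), n = 100 features (columns): n >> m, so this data
# would typically be called high-dimensional under the usual convention.
m, n = 5, 100
X = np.zeros((m, n))

print(X.shape)  # (5, 100) -> each sample lives in a 100-dimensional space
```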

I'd argue that Tang is making a general statement about algorithmic complexity. Specifically, you need a large input (matrix) to get a meaningful speedup, given the complexity of the algorithm relative to a comparable classical approach like Gaussian elimination (which is $$O(N^3)$$) or one of Tang's dequantized (classical) algorithms, which (as I understand them) end up having polynomial complexity, though with quite high-order polynomials in some terms.

That said, another point is that given the polynomial terms $$d$$ and $$\kappa$$ and the $$\log(N)/\epsilon$$ term, if $$N$$ is relatively small while $$d$$ and $$\kappa$$ are large, the latter are going to have a greater influence on the complexity.
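A rough numerical illustration of that point (the parameter values are my own toy numbers, not from Tang's work):

```python
import math

# Plugging toy numbers into the QLSA bound d^4 * kappa^2 * log(N) / eps
# to show that the polynomial factors in d and kappa dominate log(N).
def qlsa_cost(N, d, kappa, eps=0.01):
    return d ** 4 * kappa ** 2 * math.log(N) / eps

small_N = qlsa_cost(N=100, d=50, kappa=100)    # log(100)  ~ 4.6
large_N = qlsa_cost(N=10 ** 9, d=50, kappa=100)  # log(1e9) ~ 20.7

# Growing N by a factor of 10^7 raises the cost only ~4.5x,
# whereas doubling d alone multiplies the cost by 2^4 = 16.
print(large_N / small_N)
print(qlsa_cost(100, 100, 100) / qlsa_cost(100, 50, 100))
```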

If a high-dimensional matrix has $n$ columns $\gg m$ rows, is it still high-dimensional if you transpose it, causing $n \ll m$? – develarist – 2019-11-19T00:20:46.490

Yes, in the data model, we think of features as dimensions of the data; even if you transpose it the features are still the features. – Greenstick – 2019-11-19T01:11:05.813