How can we prove this inequality, related to the generalization error, without using Rademacher complexity?



This is an inequality on page 36 of the book Foundations of Machine Learning, but the author only states it without proof. $$ \mathbb{P}\left[\left|R(h)-\widehat{R}_{S}(h)\right|>\epsilon\right] \leq 4 \Pi_{\mathcal{H}}(2 m) \exp \left(-\frac{m \epsilon^{2}}{8}\right) $$

Here the growth function $\Pi_{\mathcal{H}}: \mathbb{N} \rightarrow \mathbb{N}$ for a hypothesis set $\mathcal{H}$ is defined by: $$ \forall m \in \mathbb{N}, \quad \Pi_{\mathcal{H}}(m)=\max _{\left\{x_{1}, \ldots, x_{m}\right\} \subseteq X}\left|\left\{\left(h\left(x_{1}\right), \ldots, h\left(x_{m}\right)\right): h \in \mathcal{H}\right\}\right| $$
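To make the definition concrete, here is a standard example (not from the book's page 36, just for illustration): take $X = \mathbb{R}$ and the class of threshold functions $$ \mathcal{H}=\left\{x \mapsto 1_{x \geq \theta}: \theta \in \mathbb{R}\right\}. $$ On any $m$ distinct points $x_1 < \cdots < x_m$, the only achievable labelings are the $m+1$ "step" patterns $(0,\ldots,0), (0,\ldots,0,1), \ldots, (1,\ldots,1)$, so $$ \Pi_{\mathcal{H}}(m) = m+1. $$ Since this grows only polynomially in $m$, the factor $4\,\Pi_{\mathcal{H}}(2m)$ is dominated by $\exp(-m\epsilon^2/8)$ and the bound above is nontrivial for large $m$.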

Given a hypothesis $h \in \mathcal{H},$ a target concept $c \in \mathcal{C},$ and an underlying distribution $\mathcal{D},$ the generalization error or risk of $h$ is defined by $$ R(h)=\underset{x \sim \mathcal{D}}{\mathbb{P}}[h(x) \neq c(x)]=\underset{x \sim \mathcal{D}}{\mathbb{E}}\left[1_{h(x) \neq c(x)}\right] $$ where $1_{\omega}$ is the indicator function of the event $\omega$.

And the empirical error or empirical risk of $h$ on a sample $S=\left(x_{1}, \ldots, x_{m}\right)$ is defined by $$ \widehat{R}_{S}(h)=\frac{1}{m} \sum_{i=1}^{m} 1_{h\left(x_{i}\right) \neq c\left(x_{i}\right)} $$
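For clarity, here is a minimal numerical sketch of the empirical risk $\widehat{R}_{S}(h)$: the threshold hypothesis, target concept, and sample below are made up for illustration and are not from the book.

```python
def empirical_risk(h, c, xs):
    """Fraction of sample points where hypothesis h disagrees with target c."""
    return sum(h(x) != c(x) for x in xs) / len(xs)

# Hypothetical threshold hypothesis and target concept on the real line.
h = lambda x: x >= 0.5   # hypothesis: label 1 iff x >= 0.5
c = lambda x: x >= 0.3   # target concept: label 1 iff x >= 0.3

# A sample S of m = 5 points; h and c disagree only at x = 0.4.
xs = [0.1, 0.2, 0.4, 0.6, 0.9]
print(empirical_risk(h, c, xs))  # 1 disagreement out of 5 -> 0.2
```

As $m$ grows, $\widehat{R}_{S}(h)$ concentrates around $R(h)$, which is exactly what the inequality in question quantifies uniformly over $\mathcal{H}$.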

In the book, the author uses Rademacher complexity to prove another inequality that differs from this one only in the constants, but he remarks that the stated inequality can be proved without Rademacher complexity. Does anyone know how to prove it?


Posted 2020-01-16T11:01:20.123

