## How can we prove this inequality, related to the generalization error, without using the Rademacher complexity?

This inequality appears on page 36 of the book *Foundations of Machine Learning*, but the author states it without proof: $$\mathbb{P}\left[\left|R(h)-\widehat{R}_{S}(h)\right|>\epsilon\right] \leq 4 \Pi_{\mathcal{H}}(2 m) \exp \left(-\frac{m \epsilon^{2}}{8}\right)$$
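To get a feel for what the right-hand side looks like, here is a small numerical sanity check of my own (not from the book): for threshold classifiers on the real line, the growth function is $$\Pi_{\mathcal{H}}(k)=k+1$$ (a standard fact), and plugging this in shows how the bound decays as $$m$$ grows.

```python
import math

def vc_bound(m, eps, growth):
    """Right-hand side of the inequality: 4 * Pi_H(2m) * exp(-m * eps^2 / 8)."""
    return 4 * growth(2 * m) * math.exp(-m * eps ** 2 / 8)

# Growth function of threshold classifiers on the line: Pi_H(k) = k + 1.
threshold_growth = lambda k: k + 1

for m in (1_000, 10_000, 100_000):
    print(m, vc_bound(m, eps=0.05, growth=threshold_growth))
```

Note that for small $$m$$ the bound exceeds 1 and is vacuous; the polynomial growth function is eventually crushed by the exponential term.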

Here the growth function $$\Pi_{\mathcal{H}}: \mathbb{N} \rightarrow \mathbb{N}$$ of the hypothesis set $$\mathcal{H}$$ is defined by: $$\forall m \in \mathbb{N}, \quad \Pi_{\mathcal{H}}(m)=\max _{\left\{x_{1}, \ldots, x_{m}\right\} \subseteq \mathcal{X}}\left|\left\{\left(h\left(x_{1}\right), \ldots, h\left(x_{m}\right)\right): h \in \mathcal{H}\right\}\right|$$
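As a concrete illustration of this definition (my own example, using threshold classifiers $$h_t(x)=1_{x \geq t}$$, which are not specific to the book's statement), one can count the distinct labelings by brute force:

```python
def growth_function_thresholds(points):
    """Count the distinct labelings of `points` realizable by
    threshold classifiers h_t(x) = 1 if x >= t else 0."""
    pts = sorted(points)
    # Only thresholds at the points themselves, plus one above the max,
    # can produce distinct labelings.
    thresholds = pts + [pts[-1] + 1.0]
    labelings = {tuple(int(x >= t) for x in pts) for t in thresholds}
    return len(labelings)

print(growth_function_thresholds([0.3, 1.7, 2.5]))  # → 4, i.e. m + 1 for m = 3
```

For thresholds, every set of $$m$$ distinct points realizes exactly $$m+1$$ labelings, so the maximum in the definition is $$\Pi_{\mathcal{H}}(m)=m+1$$.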

Given a hypothesis $$h \in \mathcal{H}$$, a target concept $$c \in \mathcal{C}$$, and an underlying distribution $$\mathcal{D}$$, the generalization error (or risk) of $$h$$ is defined by $$R(h)=\underset{x \sim \mathcal{D}}{\mathbb{P}}[h(x) \neq c(x)]=\underset{x \sim \mathcal{D}}{\mathbb{E}}\left[1_{h(x) \neq c(x)}\right],$$ where $$1_{\omega}$$ is the indicator function of the event $$\omega$$.

And the empirical error (or empirical risk) of $$h$$ on a sample $$S=\left(x_{1}, \ldots, x_{m}\right)$$ is defined by $$\widehat{R}_{S}(h)=\frac{1}{m} \sum_{i=1}^{m} 1_{h\left(x_{i}\right) \neq c\left(x_{i}\right)}.$$
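The concentration that the inequality quantifies is easy to observe empirically. Below is a quick Monte Carlo sketch (my own hypothetical setup: $$h$$ and $$c$$ are thresholds at 0.5 and 0.6, and $$\mathcal{D}$$ is uniform on $$[0,1]$$, so they disagree exactly on $$[0.5, 0.6)$$ and $$R(h)=0.1$$):

```python
import random

random.seed(0)

def h(x):  # hypothesis (illustrative choice)
    return x >= 0.5

def c(x):  # target concept (illustrative choice)
    return x >= 0.6

# True risk under D = Uniform[0,1]: disagreement region is [0.5, 0.6), so R(h) = 0.1.
def empirical_risk(m):
    sample = [random.random() for _ in range(m)]
    return sum(h(x) != c(x) for x in sample) / m

for m in (100, 10_000):
    print(m, empirical_risk(m))  # fluctuates around 0.1, more tightly as m grows
```

As $$m$$ increases, $$\widehat{R}_{S}(h)$$ concentrates around $$R(h)=0.1$$, which is exactly the deviation the stated bound controls uniformly over $$\mathcal{H}$$.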

In the book, the author proves a similar inequality, differing from this one only in the constants, using Rademacher complexity, but he remarks that the stated inequality can be proved without Rademacher complexity. Does anyone know how to prove it?