## Understanding the proof of Theorem 2.1 from the paper "Efficient reductions for imitation learning"


I am trying to understand the proof of theorem 2.1 from this paper:

Ross, Stéphane, and Drew Bagnell. "Efficient Reductions for Imitation Learning." Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS), 2010.

The cost-to-go is given as

$$J(\pi) = \sum_{t=1}^{T}\mathbb{E}_{s\sim d^t_{\pi}}\left[C_\pi(s)\right].$$

In the paper, $\hat{\pi}$ denotes the learned policy (which I write as $\pi$ below) and $\pi^*$ denotes the expert policy.

In the derivation they write

$$J(\pi)\leq \sum_{t=1}^{T}\left\{ p_{t-1}\,\mathbb{E}_{s\sim d_t}\left[C_\pi(s)\right]+(1-p_{t-1})\right\}$$ $$\leq \sum_{t=1}^{T}\left\{ p_{t-1}\,\mathbb{E}_{s\sim d_t}\left[C_{\pi^*}(s)\right]+p_{t-1}\,\mathbb{E}_{s\sim d_t}\left[\ell(s,\pi)\right]+(1-p_{t-1})\right\},$$

in which $p_{t-1}$ is the probability of $\pi$ making no error (i.e., always choosing the same action as $\pi^*$) during the first $t-1$ steps, $d_t$ is the distribution of states at time $t$ conditioned on $\pi$ having made no error in the first $t-1$ steps, and $\ell(s,\pi)$ is the 0-1 loss of $\pi$ with respect to $\pi^*$, i.e., the probability that $\pi$ picks a different action than $\pi^*$ in state $s$.
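
My attempt at the first inequality, assuming (as in the paper) that the immediate cost is bounded in $[0,1]$: it looks like the law of total expectation, conditioning at each step on whether $\pi$ has already made an error. Writing $d'_t$ for the state distribution at time $t$ given at least one error in the first $t-1$ steps,

$$\mathbb{E}_{s\sim d^t_{\pi}}\left[C_\pi(s)\right]=p_{t-1}\,\mathbb{E}_{s\sim d_t}\left[C_\pi(s)\right]+(1-p_{t-1})\,\mathbb{E}_{s\sim d'_t}\left[C_\pi(s)\right]\leq p_{t-1}\,\mathbb{E}_{s\sim d_t}\left[C_\pi(s)\right]+(1-p_{t-1}),$$

where the last step simply bounds the cost on the error branch by $1$.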
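
For the second inequality, my guess is the pointwise bound $C_\pi(s)\leq C_{\pi^*}(s)+\ell(s,\pi)$: assuming a deterministic expert, the costs agree whenever $\pi$ picks the expert's action, and when it does not (which happens with probability $\ell(s,\pi)$) the cost gap is at most $1$, so

$$C_\pi(s)-C_{\pi^*}(s)=\mathbb{E}_{a\sim\pi(s)}\left[C(s,a)-C(s,\pi^*(s))\right]\leq \Pr_{a\sim\pi(s)}\left(a\neq\pi^*(s)\right)\cdot 1=\ell(s,\pi).$$

Taking expectations over $d_t$ and multiplying by $p_{t-1}$ would then give the second line, but I am not sure this is the intended reasoning.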

The remaining steps are easy to follow, but how did they come up with these two inequalities? Is my attempt above on the right track?