There are many techniques for training an RL agent without explicitly interacting with an environment, some of which are cited in the paper you linked. Heck, even experience replay, as used in the foundational DQN paper, is a way of doing this. However, while many agents use some form of pre-training for the sake of safety or speed, there are a couple of reasons why direct interaction with an environment is also used whenever possible.
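To make the experience-replay point concrete, here is a minimal sketch of the idea: transitions get stored once and sampled later for updates, so individual learning steps don't require fresh interaction. The class name, capacity, and uniform sampling are my own illustrative choices, not any paper's exact implementation.

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal uniform-sampling replay buffer (names and defaults are illustrative)."""

    def __init__(self, capacity=10_000):
        # Oldest transitions are evicted automatically once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Updates are computed from stored transitions, not fresh interaction.
        return random.sample(self.buffer, batch_size)
```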
Eventually, your RL agent will be placed in an environment to take its own actions. This is why we train RL agents. I'm assuming that, per your question, learning does not happen during this phase.
Maybe your agent encounters a novel situation
Hopefully, the experience your agent learns from is extensive enough to cover every state-action pair $(s,a)$ that your agent will ever encounter. If it isn't, your agent never learned about those situations, and since it is no longer learning, it will perform suboptimally in them forever. Such gaps in coverage of the state-action space can easily arise from stochasticity or nonstationarity in the environment.
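As a toy illustration of the coverage problem (every state, action, and reward below is made up), here is a tabular Q-learner trained purely offline on a fixed batch of transitions. Queried in a state the batch never covered, its value estimates are just the defaults, so it has no informed way to act:

```python
from collections import defaultdict

# All states, actions, and rewards below are hypothetical.
q = defaultdict(lambda: defaultdict(float))   # q[state][action], defaults to 0.0
alpha, gamma = 0.5, 0.9
actions = ["left", "right"]

batch = [                      # fixed demonstrations covering only states 0 and 1
    (0, "left", 0.0, 0), (0, "right", 0.0, 1),
    (1, "left", 0.0, 1), (1, "right", 1.0, 0),
]
for _ in range(100):           # offline Q-learning sweeps over the fixed batch
    for s, a, r, s2 in batch:
        best_next = max(q[s2][b] for b in actions)
        q[s][a] += alpha * (r + gamma * best_next - q[s][a])

novel_state = 2                # never appeared in the batch
print({a: q[novel_state][a] for a in actions})  # both 0.0: no basis to act
```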
Maybe the teacher isn't perfect
If you don't allow your agent to learn from its own experience, it can at best perform as well as the agent that collected the demonstration data. That's an upper bound on performance that we have no reason to impose on ourselves.
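Here is a small sketch of that bound, assuming a hypothetical two-armed bandit: a pure imitator inherits the teacher's mistakes and can only match the teacher's expected return, while an agent that learns from its own pulls converges on the better arm:

```python
import random

random.seed(0)
arms = {"a": 1.0, "b": 0.2}    # hypothetical expected rewards (deterministic here)

def teacher():
    # Imperfect demonstrator: picks the better arm only 60% of the time.
    return "a" if random.random() < 0.6 else "b"

# Pure imitation: reproduce the teacher's action frequencies exactly.
demos = [teacher() for _ in range(10_000)]
p_a = demos.count("a") / len(demos)
clone_return = p_a * arms["a"] + (1 - p_a) * arms["b"]  # matches the teacher

# Learning from its own experience: epsilon-greedy finds the better arm.
est = {"a": 0.0, "b": 0.0}     # running estimates of each arm's reward
n = {"a": 0, "b": 0}
for _ in range(10_000):
    a = random.choice(list(arms)) if random.random() < 0.1 else max(est, key=est.get)
    n[a] += 1
    est[a] += (arms[a] - est[a]) / n[a]  # incremental mean of observed rewards

own_return = arms[max(est, key=est.get)]
print(f"imitator: {clone_return:.2f}  self-taught: {own_return:.2f}")  # ~0.68 vs 1.00
```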