I want to train a reinforcement learning agent in an environment with parameters (for example, wind speed, solar irradiance, etc.) that change over time. I have only a limited amount of recorded data for these time series.
Should the RL agent be trained in an environment that replays the recorded time series over and over, or should I first fit a generative model to the time series and then train the agent in an environment driven by synthetic time series sampled from that model?
On the one hand, I expect the RL algorithm to perform better with the synthetic data, because the agent sees more diverse trajectories. On the other hand, the synthetic data do not really contain more information, because the generative model is fitted to the same recordings that the RL algorithm could learn from directly.
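To make the two options concrete, here is a minimal sketch of how the episodes could be generated in each case. It assumes a single one-dimensional series, and it uses an AR(1) process fitted by least squares purely as a placeholder for "a generative model". The function names and the stand-in data are hypothetical and only for illustration.

```python
import numpy as np

def replay_episode(recorded, episode_len, rng):
    """Option 1: replay a window of the recorded series from a random start."""
    start = rng.integers(0, len(recorded) - episode_len + 1)
    return recorded[start:start + episode_len]

def fit_ar1(recorded):
    """Fit x[t] = phi * x[t-1] + noise to the mean-centred series (least squares)."""
    mean = recorded.mean()
    x = recorded - mean
    phi = np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])
    sigma = (x[1:] - phi * x[:-1]).std()
    return mean, phi, sigma

def synthetic_episode(recorded, params, episode_len, rng):
    """Option 2: sample a new series from the fitted AR(1) placeholder model."""
    mean, phi, sigma = params
    x = np.empty(episode_len)
    x[0] = rng.choice(recorded) - mean  # start from a value that was actually observed
    for t in range(1, episode_len):
        x[t] = phi * x[t - 1] + rng.normal(0.0, sigma)
    return mean + x

rng = np.random.default_rng(0)
# Hypothetical stand-in for a recorded wind-speed series (not real data).
recorded = 5.0 + np.sin(np.linspace(0.0, 20.0, 500)) + rng.normal(0.0, 0.2, 500)

params = fit_ar1(recorded)
real_ep = replay_episode(recorded, 48, rng)              # always a slice of the recording
synth_ep = synthetic_episode(recorded, params, 48, rng)  # a new path, same fitted statistics
```

In the first case the agent only ever sees slices of the recording. In the second it sees new paths, but every path is determined by the three numbers estimated from that same recording.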
Are there any papers that elaborate on this topic?