If I could recommend only one, it would be The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Hastie, Tibshirani and Friedman. It covers the math and statistics behind many commonly used techniques in data science.
For Bayesian techniques, Bayesian Data Analysis by Gelman, Carlin, Stern, Dunson, Vehtari and Rubin is excellent.
Statistical Inference by Casella and Berger is a good graduate-level textbook on the theoretical foundations of statistics. It does require a fairly high level of comfort with math. (Fully rigorous probability theory is built on measure theory, which is not trivial to understand, although the book itself works at the level of calculus rather than measure theory.)
I don't have a book to recommend on data generating processes. What I can say is that understanding the assumptions behind the techniques you use, and making sure the data were collected or generated in a way that does not violate those assumptions, goes a long way toward a good analysis.
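To make the point about assumptions concrete, here is a minimal simulation sketch (my own illustration, not taken from any of the books above). It assumes a simple AR(1) process as a stand-in for data whose observations are not independent, and checks how often the textbook 95% confidence interval for the mean, which assumes independence, actually contains the true mean. The function name `ci_coverage` and all parameter values are hypothetical choices for the example.

```python
import numpy as np

def ci_coverage(rho, n=200, trials=2000, seed=0):
    """Fraction of naive 95% intervals for the mean that contain the true mean (0).

    Data follow an AR(1) process x[t] = rho * x[t-1] + noise. With rho = 0 the
    observations are independent, which is what the naive interval assumes.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((trials, n))
    x = np.empty_like(noise)
    x[:, 0] = noise[:, 0]
    for t in range(1, n):
        x[:, t] = rho * x[:, t - 1] + noise[:, t]

    mean = x.mean(axis=1)
    # Standard error formula that is only valid for independent observations.
    se = x.std(axis=1, ddof=1) / np.sqrt(n)
    covered = np.abs(mean) <= 1.96 * se
    return covered.mean()

iid_coverage = ci_coverage(rho=0.0)        # assumption holds
dependent_coverage = ci_coverage(rho=0.8)  # assumption violated
print(f"independent data:    {iid_coverage:.2f}")
print(f"autocorrelated data: {dependent_coverage:.2f}")
```

When the independence assumption holds, the coverage should land close to the nominal 95%. With strongly autocorrelated data, the same formula should cover the true mean far less often, even though nothing about the analysis code changed.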