In machine learning tasks it is common to shuffle the data and normalize it. The purpose of normalization is clear (it brings the feature values into the same range). But after a lot of searching, I have not found a convincing reason for shuffling the data.
I have read this post here discussing when we need to shuffle data, but it is not obvious why we should shuffle the data. Furthermore, I have frequently seen that optimizers such as Adam or SGD are used with mini-batch gradient descent (the data is split into mini-batches and a batch size has to be specified). According to this post, it is vital to shuffle the data at each epoch so that each batch contains different data. So it seems the data is not just shuffled but, more importantly, the contents of each batch change from epoch to epoch.
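To make concrete what I mean by shuffling at each epoch, here is a minimal sketch of how I understand it. It is my own illustration, not taken from either post: the helper name `minibatches`, the toy data, and the batch size are all assumptions, and it uses only Python's standard `random` module rather than any particular ML library.

```python
import random

def minibatches(samples, batch_size, rng):
    """Yield mini-batches of `samples` in a freshly shuffled order.

    Hypothetical helper for illustration only.
    """
    order = list(range(len(samples)))
    rng.shuffle(order)  # new random order every time the function is called
    for start in range(0, len(order), batch_size):
        # the last batch may be smaller if batch_size does not divide len(samples)
        yield [samples[i] for i in order[start:start + batch_size]]

rng = random.Random(0)      # seeded so the run is repeatable
samples = list(range(10))   # toy "dataset" of 10 examples

for epoch in range(2):
    # calling minibatches() again reshuffles, so batch contents differ per epoch
    batches = list(minibatches(samples, batch_size=4, rng=rng))
    print(f"epoch {epoch}: {batches}")
```

Every example is still seen exactly once per epoch; only the grouping into batches and their order change.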
Why do we do this?