Supervised learning is a statistical process that draws from a finite set of examples, and therefore variance in measured accuracy is to be expected. (Recall that variance is the square of the standard deviation, so the standard deviation is in the same units as the measurement while the variance is not.) In the case of stochastic gradient descent, which improves training speed and reliability under many conditions, pseudorandom factors are introduced deliberately, which further affects the variance of accuracy measurements.
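A minimal sketch of this effect, using a toy pure-Python logistic-regression-by-SGD loop (the dataset, function names, and hyperparameters are all illustrative assumptions): the only thing that changes between runs is the pseudorandom seed, yet the measured accuracy varies.

```python
import random
import math

def make_data(n=200, seed=0):
    """Toy 2-D dataset: label is 1 when x + y > 0, with slight label noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        label = 1 if x + y + rng.gauss(0, 0.1) > 0 else 0
        data.append(((x, y), label))
    return data

def sgd_accuracy(data, seed, epochs=5, lr=0.5):
    """Train logistic regression by SGD; the seed controls init and shuffling."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in range(3)]  # two weights plus a bias
    train, test = data[:150], data[150:]
    for _ in range(epochs):
        rng.shuffle(train)  # pseudorandom example order, a deliberate SGD choice
        for (x, y), label in train:
            z = w[0] * x + w[1] * y + w[2]
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - label  # gradient of the logistic loss w.r.t. z
            w[0] -= lr * g * x
            w[1] -= lr * g * y
            w[2] -= lr * g
    correct = sum(
        1 for (x, y), label in test
        if (w[0] * x + w[1] * y + w[2] > 0) == (label == 1)
    )
    return correct / len(test)

data = make_data()
accs = [sgd_accuracy(data, seed) for seed in range(5)]
mean = sum(accs) / len(accs)
var = sum((a - mean) ** 2 for a in accs) / len(accs)
print(f"accuracies={accs}")
print(f"mean={mean:.3f}  variance={var:.6f}  std={math.sqrt(var):.6f}")
```

The data never changes between runs; the spread in the printed accuracies comes entirely from the pseudorandom initialization and shuffling.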

However, the acceptable variance in accuracy is not the primary metric of interest to system architects and corporate stakeholders. It is the **maximum** theoretical and empirical inaccuracy that is of primary concern.
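Concretely, given repeated measurements of the same model, the figure a stakeholder asks about is the worst observed run rather than the spread itself. A small sketch with hypothetical accuracy numbers:

```python
import statistics

# Hypothetical accuracies from repeated training runs of the same model.
run_accuracies = [0.991, 0.987, 0.993, 0.984, 0.990]

mean_acc = statistics.mean(run_accuracies)
std_acc = statistics.stdev(run_accuracies)    # sample standard deviation
worst_acc = min(run_accuracies)               # worst observed run
max_inaccuracy = 1.0 - worst_acc              # the figure stakeholders ask about

print(f"mean={mean_acc:.3f} +/- {std_acc:.3f}, "
      f"max empirical inaccuracy={max_inaccuracy:.3f}")
```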

In the PAC (probably approximately correct) framework, the required accuracy threshold is $1 - \epsilon$. For specific types of training objectives and a given $\epsilon$, one can determine the amount of data required to guarantee a minimum reliability, $1 - \delta$.
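As a sketch of how such a data requirement is computed, here is the standard bound for a consistent learner over a finite hypothesis class, $m \ge \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)$ (the hypothesis-class size below is an illustrative assumption, and the function name is mine):

```python
import math

def pac_sample_size(epsilon, delta, hypothesis_count):
    """Samples sufficient for a consistent learner over a finite hypothesis
    class to have error <= epsilon with probability >= 1 - delta:
        m >= (1/epsilon) * (ln|H| + ln(1/delta))
    """
    return math.ceil((math.log(hypothesis_count) + math.log(1 / delta)) / epsilon)

# e.g. target 99.1% accuracy (epsilon = 0.009) with 89.3% reliability
# (delta = 0.107), over a hypothetical class of one million hypotheses
m = pac_sample_size(epsilon=0.009, delta=0.107, hypothesis_count=10**6)
print(m)
```

Note that the requirement grows only logarithmically in $|H|$ and $1/\delta$, but linearly in $1/\epsilon$: tightening the accuracy target is far more expensive than tightening the reliability target.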

One could theoretically extend the PAC framework to consider mean, variance, and skew, but that would be of little interest to architects and stakeholders, who are accustomed to receiving assurances in a form like, "We know that with the data we have for training we can achieve 99.1% accuracy 89.3% of the time."
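The standard PAC bound for a finite hypothesis class can in fact be inverted to yield exactly this style of statement: for a consistent learner, the failure probability is at most $\delta \le |H|e^{-\epsilon m}$, so the reliability is at least $1 - |H|e^{-\epsilon m}$. A hedged sketch, where the sample count and hypothesis-class size are illustrative assumptions:

```python
import math

def reliability_lower_bound(m, epsilon, hypothesis_count):
    """Lower bound on reliability (1 - delta) for a consistent learner over a
    finite hypothesis class trained on m samples:
        delta <= |H| * exp(-epsilon * m)
    """
    delta = hypothesis_count * math.exp(-epsilon * m)
    return max(0.0, 1.0 - delta)

# Illustrative numbers: 1,784 samples, target error 0.9% (99.1% accuracy),
# and a hypothetical hypothesis class of one million functions.
r = reliability_lower_bound(m=1784, epsilon=0.009, hypothesis_count=10**6)
print(f"With this data we can achieve 99.1% accuracy "
      f"at least {100 * r:.1f}% of the time.")
```

The output is precisely the kind of sentence quoted above, which is why the plain PAC formulation, rather than an extension with higher moments, maps well onto stakeholder communication.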
