Is there a limit of minimum error for a particular training dataset in artificial Neural Network?



In error-based learning using gradient descent, if I give you a training dataset, then can you find the minimum error after training? And the minimum error should be true for all architectures of a neural network? Consider you will use MSE for calculating the error. You can choose anything you want other than my specified condition. It's like no matter how you change your network you can never cross the limit.

bipul kalita

Posted 2018-10-28T09:37:14.010

Reputation: 109

1Yes it is probably possible using the 'Universal Approximation Function' by has quite involved mathematics, but apparently it roughly states that a NN can approximate a function within an error 'epsilon' – DuttaA – 2018-10-28T09:39:35.637



In some sense yes, in some sense no. The "error" word you are using is not defined in the question. The network can have two type of errors, quite different - generalization error and loss (training error). The lower bound of loss on the dataset obviously depend on the network architecture, because loss is value produced by network as a whole, and some network will be able to produce exactly zero loss (which is likely be overfitting, wich is bad).

The generalization error is how well network generalize form training dataset to potentially infinite (or very big) amount of data samples "in the wild", so generalization error can not be calculated precisely, only estimated. The lower bound on generalization error is studied in Statistical Learning. It always depend on the complexity of the classifier (in our case architecture of network). Statistical Learning theory produce estimations like VC generalization bound which connect generalization error with size of the dataset and complexity of the classifer (netwroks in our case) space. However for specific dataset it probably don't make sense to talk about generalization bound outside of context of complexity of classifier space, because if there is no restriction we can always construct bigger network which would have lesser generalization error (for example by pretraning bigger network on bigger dataset) To put it short you have to put restriction on complexity of network space to be able estimate minimum generalization error (and that probably wouldn't be useful anyway, because it wouldn't say anything about if gradient descent would reach that error)


Posted 2018-10-28T09:37:14.010

Reputation: 547