The error can have different forms depending on the application. For example, for a simple regression we often use the sum of squared deviations between the actual output $y_n$ and the predicted output $\hat{y}(x_n)$ for the input $x_n$. The total loss $J_\text{Gauss}$, also known as the Gaussian loss, is then the sum of the squared errors over all observations.

$$J_\text{Gauss}= \sum_{n=1}^N\left[y_n-\hat{y}(x_n)\right]^2$$
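
To make this concrete, here is a minimal NumPy sketch of the Gaussian loss; the arrays `y` and `y_hat` below are hypothetical observations and predictions, chosen only for illustration.

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0])       # hypothetical observed outputs y_n
y_hat = np.array([1.1, 1.9, 3.2])   # hypothetical predictions y_hat(x_n)

J_gauss = np.sum((y - y_hat) ** 2)  # sum of squared errors
print(J_gauss)                      # ≈ 0.06
```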

If we use absolute values instead of squares, we obtain the Laplacian loss function $J_\text{Laplace}$, which is given by

$$J_\text{Laplace}=\sum_{n=1}^N\left|y_n-\hat{y}(x_n)\right|$$
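
The only change from the Gaussian case is the penalty applied to each residual. A minimal sketch, again with the same hypothetical `y` and `y_hat`:

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0])          # hypothetical observed outputs y_n
y_hat = np.array([1.1, 1.9, 3.2])      # hypothetical predictions y_hat(x_n)

J_laplace = np.sum(np.abs(y - y_hat))  # sum of absolute errors
print(J_laplace)                       # ≈ 0.4
```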

If we instead want to compare two probability distributions $p(x)$ and $q(x)$, we can use an asymmetric distance measure called the Kullback-Leibler divergence

$$D_\text{KL}(p\parallel q)=\int_{-\infty}^{\infty}p(x)\ln\frac{p(x)}{q(x)}\,dx.$$
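
Since this integral rarely has a simple closed form, one common approach is to approximate it numerically. Below is a small sketch that evaluates $D_\text{KL}$ on a grid for two hypothetical Gaussian densities (my choice of $p$ and $q$, purely for illustration); computing both directions shows the asymmetry.

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-10.0, 10.0, 10001)   # integration grid
p = norm.pdf(x, loc=0.0, scale=1.0)   # hypothetical p(x): N(0, 1)
q = norm.pdf(x, loc=0.0, scale=2.0)   # hypothetical q(x): N(0, 4)

# D_KL approximated with the trapezoidal rule, in both directions
d_kl_pq = np.trapz(p * np.log(p / q), x)
d_kl_qp = np.trapz(q * np.log(q / p), x)
print(d_kl_pq, d_kl_qp)  # ≈ 0.318 and ≈ 0.807: the measure is not symmetric
```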

For binary classification, we can use the hinge loss

$$J_\text{hinge}=\sum_{n=1}^N\max \{0, 1- t_n \hat{y}(x_n)\},$$

in which $t_n=+1$ if observation $x_n$ is from the positive class and $t_n=-1$ if it is from the negative class.
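
A short sketch of the hinge loss, assuming hypothetical labels `t` and raw model outputs `y_hat`:

```python
import numpy as np

t = np.array([+1, -1, +1])           # hypothetical labels t_n
y_hat = np.array([0.8, -1.5, -0.3])  # hypothetical raw outputs y_hat(x_n)

# zero loss for confidently correct predictions (t_n * y_hat(x_n) >= 1)
J_hinge = np.sum(np.maximum(0.0, 1.0 - t * y_hat))
print(J_hinge)  # 0.2 + 0.0 + 1.3 = 1.5
```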

For support vector regression, the $\varepsilon$-insensitive loss $J_\varepsilon$ is used. Summed over all observations, it is defined by the following equation.

$$J_\varepsilon=\sum_{n=1}^N\max\{0,|y_n-\hat{y}(x_n)|-\varepsilon\}$$

This loss acts like a threshold: a deviation is only counted as an error if it is larger than $\varepsilon$.
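
A minimal sketch of the summed $\varepsilon$-insensitive loss with hypothetical values, to show this thresholding behaviour:

```python
import numpy as np

eps = 0.5                            # hypothetical insensitivity width
y = np.array([1.0, 2.0, 3.0])        # hypothetical observed outputs y_n
y_hat = np.array([1.2, 2.9, 3.1])    # hypothetical predictions y_hat(x_n)

# deviations inside the eps-tube are ignored; only the excess is penalised
J_eps = np.sum(np.maximum(0.0, np.abs(y - y_hat) - eps))
print(J_eps)  # only |2.0 - 2.9| = 0.9 exceeds eps, contributing 0.4
```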

As you can see, there are several measures of error for comparing the predicted output and the observed output (see this Wikipedia article on loss functions used for classification).

Hm, but each of the loss functions you describe works on the difference of $y_i$ and $\hat{y}(x_i)$. Even the log loss works on the difference, if you exponentiate it back to non-log probabilities. It seems the error is still just $y_i - \hat{y}(x_i)$ and the different loss functions only do different things with the ever same basic error...? Note that I am using loss and error for different things. The loss is a function of the error. – lo tolmencre – 2019-07-29T13:49:54.677

The Kullback-Leibler divergence does not use errors when comparing different probability distributions. And if you are doing generative modelling you can try to use it for almost all problems. Additionally, the hinge loss does not work with a difference between prediction and true output. – MachineLearner – 2019-07-30T07:13:55.937