Equivariance and invariance are sometimes used interchangeably in common speech. They have ancient roots in maths and physics. As pointed out by @Xi'an, you can find previous uses (anterior to Convolutional Neural Networks) in the statistical literature, for instance on the notions of the invariant estimator and especially the Pitman estimator.
However, I would like to mention that it would be better if both terms keep separate meaning, as the prefix "in-" in invariant is privative (meaning "no variance" at all), while "equi-" in equivariant refers to "varying in a similar or equivalent proportion". In other words, one in- does not vary, the other equi- does.
Let us start from simple image features, and suppose that image $I$ has a unique maximum $m$ at spatial pixel location $(x_m,y_m)$, which is here the main classification feature. In other words: an image and all its translations are "the same".
An interesting property of classifiers is their ability to classify in the same manner some distorted versions $I'$ of $I$, for instance translations by all vectors $(u,v)$.
The maximum value $m'$ of $I'$ is invariant: $m'=m$: the value is the same. While its location will be at $(x'_m,y'_m)=(x_m-u,y_m-v)$, and is equivariant, meaning that is varies "equally" with the distortion.
The precise formulations given (in mathematical terms) for equivariance depend on the class of objects and transformations one considers: translation, rotation, scale, shear, shift, etc. So I prefer here to focus on the notion that is most often used in practice (I accept the blame from a theoretical stand-point).
Here, translations by vectors $(u,v)$ of the image (or some more generic actions) can be equipped with a structure of composition, like that of a group $G$ (here the group of translations). One specific $g$ denotes a specific element of the translation group (translational symmetry). A function or feature $f$ is invariant under the group of actions $G$ if for all images in a class, and for any $g$,
$$f(g(I)) = f(I)\,.$$
In other words: if you change the image by action $g$, the values for feature or function $f$ are the same.
It becomes equivariant if there exists another mathematical structure or action (often a group again) $G'$ that reflects
transformations (from $G$) in $I$ in a meaningful way. In other words, such that for each $g$, you have some (unique?) $g' \in G'$ such that
$$f(g(I)) = g'(f(I))\,.$$
In the above example on the group of translations, $g$ and $g'$ are the same (and hence $G'=G$): an integer translation of the image reflects as the exact same translation of the maximum location. This is sometimes refered to as "same-equivariance".
Another common definition is:
$$f(g(I)) = g(f(I))\,.$$
I however used potentially different $G$ and $G'$ because sometimes $f(.)$ and $g(.)$ do not lie in the same domain. This happens for instance in multivariate statistics (see e.g. Equivariance and invariance properties of multivariate quantile and related functions, and the role of standardisation).
But here, the uniqueness of the mapping between $g$ and $g'$ allows to get back to the original transformation $g$.
Often, people use the term invariance because the equivariance concept is unknown, or everybody else uses invariance, and equivariance would seem more pedantic.
For the record, other related notions (esp. in maths and physics) are termed covariance, contravariance, differential invariance.
In addition, translation-invariance, as least approximate, or in envelope, has been a quest for several signal and image processing tools. Notably, multi-rate (filter-banks) and multi-scale (wavelets or pyramids) transformations have been design in the past 25 years, for instance under the hood of shift-invariant, cycle-spinning, stationary, complex, dual-tree wavelet transforms (for a review on 2D wavelets, A panorama on multiscale geometric representations). The wavelets can absorb a few discrete scale variations. All theses (approximate) invariances often come with the price of redundancy in the number of transformed coefficients.
But they are more likely to yield shift-invariant, or shift-equivariant features.