I think your question deserves an answer that is as free-flowing and open-minded as the question itself. So here are my two analogies.

First, unless you're a pure mathematician, you were probably taught univariate probability and statistics first. For instance, your first OLS example was probably a model like this:
$$y_i=a+bx_i+e_i$$
Most likely, you went through deriving the estimates by actually minimizing the sum of squared errors:
$$TSS=\sum_i(y_i-\bar a-\bar b x_i)^2$$
Then you write the FOCs for the parameters and get the solution:
$$\frac{\partial TSS}{\partial \bar a}=0,\qquad \frac{\partial TSS}{\partial \bar b}=0$$

Then later you're told that there's an easier way of doing this with vector (matrix) notation:
$$y=Xb+e$$

and the TSS becomes:
$$TSS=(y-X\bar b)'(y-X\bar b)$$

The FOCs are:
$$2X'(y-X\bar b)=0$$

And the solution is
$$\bar b=(X'X)^{-1}X'y$$

If you're good at linear algebra, you'll stick to the second approach once you've learned it, because it's actually easier than writing down all the sums in the first approach, especially once you get into multivariate statistics.
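If you want to check the algebra numerically, here is a minimal sketch (numpy, with made-up simulated data) that computes the estimates both ways, through the univariate sums and through $(X'X)^{-1}X'y$, and confirms they agree:

```python
import numpy as np

# Simulated data for y_i = a + b*x_i + e_i with a=1, b=2 (made-up values).
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=100)

# "Sum" approach: closed-form solutions of the two first-order conditions.
b_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a_hat = y.mean() - b_hat * x.mean()

# Matrix approach: solve (X'X) b = X'y with an intercept column.
X = np.column_stack([np.ones_like(x), x])
coef = np.linalg.solve(X.T @ X, X.T @ y)

# Both routes give the same estimates.
print(np.allclose(coef, [a_hat, b_hat]))  # True
```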

Hence my analogy is that moving from matrices to tensors is similar to moving from vectors to matrices: once you know tensors, some things will look easier that way.

Second, where do tensors come from? I'm not sure about the whole history of the concept, but I learned them in theoretical mechanics. Certainly, we had a course on tensors, but I didn't understand what the deal was with all those fancy ways to swap indices in that math course. It all started to make sense in the context of studying tension forces.

So, in physics they also start with a simple example of pressure defined as force per unit area, hence:
$$F=p\cdot dS$$
This means you can calculate the force vector $F$ by multiplying the pressure $p$ (a scalar) by the area element $dS$ (a normal vector). That works when we have a single infinite plane surface, so there's just one perpendicular force. A large balloon would be a good example.

However, if you're studying tension inside materials, you are dealing with all possible directions and surfaces. In this case you have forces on any given surface pulling or pushing in all directions, not only perpendicular ones. Some surfaces are torn apart by tangential forces "sideways" etc. So, your equation becomes:
$$F=P\cdot dS$$
The force is still a vector $F$ and the surface area is still represented by its normal vector $dS$, but $P$ is a tensor now, not a scalar.
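A small numpy sketch of the difference (the stress tensor entries are made up): with a tensor $P$, the force on a surface is no longer parallel to the normal, and the tangential ("sideways") component is exactly what the off-diagonal entries encode.

```python
import numpy as np

# A made-up 2-D stress tensor: diagonal entries are normal stresses,
# off-diagonal entries are shear (tangential) stresses.
P = np.array([[3.0, 1.0],
              [1.0, 2.0]])

# Surface element with unit normal along x.
n = np.array([1.0, 0.0])

# Force per unit area on that surface: F = P @ n.
F = P @ n                      # -> [3., 1.]

# Unlike the scalar-pressure case, F is not parallel to n:
normal_part = (F @ n) * n      # component perpendicular to the surface
shear_part = F - normal_part   # tangential component
print(shear_part)              # prints [0. 1.]
```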

Ok, a scalar and a vector are also tensors :)

Another place where tensors show up naturally is covariance or correlation matrices. Just think of this: how do we transform one correlation matrix $C_0$ into another one $C_1$? The straight-line path $$C_\theta(i,j)=C_0(i,j)+ \theta(C_1(i,j)-C_0(i,j)),\qquad \theta\in[0,1],$$ happens to keep $C_\theta$ positive semi-definite (it's a convex combination), but an arbitrary small perturbation of the entries does not, so not every path through matrix space is admissible.

So, we'd have to find the path $\delta C_\theta$ such that $C_1=C_0+\int_\theta\delta C_\theta$, where $\delta C_\theta$ is a small disturbance to a matrix. There are many different paths, and we could search for the shortest ones. That's how we get into Riemannian geometry, manifolds, and... tensors.
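To see the positive semi-definiteness constraint bite, here is a small numpy sketch (the matrices are made up): a tiny symmetric disturbance to a valid correlation matrix already produces a negative eigenvalue, so the small steps $\delta C_\theta$ cannot point in arbitrary directions.

```python
import numpy as np

# A valid correlation matrix with strong correlation.
C0 = np.array([[1.0, 0.9],
               [0.9, 1.0]])

# An arbitrary small symmetric disturbance (made-up numbers).
delta = np.array([[0.0, 0.2],
                  [0.2, 0.0]])

C = C0 + delta   # off-diagonal correlation becomes 1.1

# Eigenvalues of C are 1 +/- 1.1, so one is negative
# and C is no longer a valid correlation matrix:
print(np.linalg.eigvalsh(C))   # ~ [-0.1, 2.1]
```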

UPDATE: **what's tensor, anyway?**

@amoeba and others got into a lively discussion of the meaning of tensor and whether it's the same as an array. So, I thought an example is in order.

Say we go to a bazaar to buy groceries, and there are two merchant dudes, $d_1$ and $d_2$. We *noticed* that if we pay $x_1$ dollars to $d_1$ and $x_2$ dollars to $d_2$, then $d_1$ sells us $y_1=2x_1-x_2$ pounds of apples, and $d_2$ sells us $y_2=-0.5x_1+2x_2$ pounds of oranges.
For instance, if we pay each of them 1 dollar, i.e. $x_1=x_2=1$, then we get 1 pound of apples and 1.5 pounds of oranges.

We can express this relation in the form of a matrix $P$:

$$P=\begin{pmatrix}2 & -1\\ -0.5 & 2\end{pmatrix}$$

Then the merchants hand over this many pounds of apples and oranges if we pay them $x$ dollars:
$$y=Px$$

This works exactly like a matrix by vector multiplication.
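A quick numpy check of the example above, using the pricing matrix from the text:

```python
import numpy as np

# The merchants' pricing matrix.
P = np.array([[2.0, -1.0],
              [-0.5, 2.0]])

# Pay each merchant one dollar.
x = np.array([1.0, 1.0])

# Pounds of apples and oranges we get back: y = P x.
y = P @ x
print(y)   # prints [1.  1.5]
```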

Now, let's say that instead of buying the goods from these merchants separately, we declare two spending bundles we'll use. We either pay both merchants 0.71 dollars each, or we pay $d_1$ 0.71 dollars and demand 0.71 dollars back from $d_2$.
As in the initial case, we go to the bazaar and spend $z_1$ on bundle 1 and $z_2$ on bundle 2.

So, let's look at an example where we spend just $z_1=1.41\approx\sqrt 2$ on bundle 1. In this case, the first merchant gets $x_1=1$ dollar, and the second merchant gets the same, $x_2=1$. Hence, we must get the same amounts of produce as in the example above, mustn't we?

Maybe, maybe not. You noticed that the $P$ matrix is not diagonal. This indicates that, for some reason, how much one merchant charges for his produce also depends on how much we paid the other merchant. They must get an idea of how much we pay them, maybe through rumors? In this case, if we start buying in bundles, they'll know for sure how much we pay each of them, because we declare our bundles to the bazaar. So how do we know that the $P$ matrix will stay the same?

Maybe with full information about our payments on the market, the pricing formulae would change too! That would change our matrix $P$, and there's no way to say how exactly.

This is where we enter tensors. Essentially, with tensors we say that the calculations do not change when we start trading in bundles instead of dealing directly with each merchant. That's the constraint that will impose transformation rules on $P$, which we'll call a tensor.

Particularly, we may notice that we have an orthonormal basis $\bar d_1,\bar d_2$, where $\bar d_i$ means a payment of 1 dollar to merchant $i$ and nothing to the other. We may also notice that the bundles form another orthonormal basis $\bar d_1',\bar d_2'$, which is simply a rotation of the first basis by 45 degrees counterclockwise. It's also a PC decomposition of the first basis. Hence, we are saying that switching to the bundles is simply a change of coordinates, and it should not change the calculations. Note that this is an outside constraint that we imposed on the model. It didn't come from the pure math properties of matrices.

Now, our shopping can be expressed as a vector $x=x_1 \bar d_1+x_2\bar d_2$. Vectors are tensors too, btw. The tensor $P$ is more interesting: it can be represented as $$P=\sum_{ij}p_{ij}\bar d_i\bar d_j,$$ and the groceries as $y=y_1 \bar d_1+y_2 \bar d_2$. For the groceries, $y_i$ means pounds of produce from merchant $i$, not dollars paid.

Now, when we change the coordinates to bundles, the tensor equation stays the same: $$y=Pz$$

That's nice, but the payment vector is now in a different basis: $$z=z_1 \bar d_1'+z_2\bar d_2',$$ while we may keep the produce vectors in the old basis $y=y_1 \bar d_1+y_2 \bar d_2$. The components of the tensor change too: $$P=\sum_{ij}p_{ij}'\bar d_i'\bar d_j'.$$ It's easy to derive how the tensor must be transformed: it's going to be $PA$, where the rotation matrix $A$ is defined by $\bar d'=A\bar d$. In our case, its entries are the coefficients of the bundles.

We can work out the formulae for the tensor transformation, and they'll yield the same result as in the examples with $x_1=x_2=1$ and $z_1=1.41,z_2=0$.
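The change of basis can be checked numerically. A numpy sketch, where the columns of $A$ hold the bundle coefficients $0.71\approx 1/\sqrt 2$; paying $x_1=x_2=1$ dollars corresponds to $z_1=\sqrt 2\approx 1.41$ units of bundle 1 and $z_2=0$:

```python
import numpy as np

# The merchants' pricing matrix from the text.
P = np.array([[2.0, -1.0],
              [-0.5, 2.0]])

# Columns are the two bundles expressed in merchant dollars:
# bundle 1 pays (0.71, 0.71); bundle 2 pays (0.71, -0.71).
s = 1 / np.sqrt(2)                 # approx 0.71
A = np.array([[s,  s],
              [s, -s]])

# Spending z on bundles means paying x = A z to the merchants.
z = np.array([np.sqrt(2), 0.0])    # approx 1.41 units of bundle 1 only
x = A @ z                          # -> [1., 1.]

# Produce computed directly, and via the transformed tensor P A:
y_direct = P @ x
y_bundles = (P @ A) @ z
print(np.allclose(y_direct, y_bundles))  # True
print(y_direct)                          # prints [1.  1.5]
```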

It seems like the only remaining feature that "big data tensors" share with the usual mathematical definition is that they are multidimensional arrays. So I'd say that big data tensors are a marketable way of saying "multidimensional array," because I highly doubt that machine learning people will care about either the symmetries or transformation laws that the usual tensors of mathematics and physics enjoy, especially their usefulness in forming coordinate-free equations. – Alex R. – 2016-02-23T19:00:21.853

@AlexR. without invariance to transformations there are no tensors – Aksakal – 2016-02-23T21:43:50.013

@Aksakal I can't tell whether or not you agree with Alex R. Do you agree that, as Alex R. suggests, the word "tensor" is often misused and that "multidimensional array" would usually be a more appropriate term (in machine learning papers)? – littleO – 2016-02-24T08:12:41.230

Putting on my mathematical hat I can say that there is no intrinsic symmetry to a mathematical tensor. Further, they are another way to say 'multidimensional array'. One could vote for using the word tensor over the phrase multidimensional array simply on grounds of simplicity. In particular if $V$ is an $n$-dimensional vector space, one can identify $V \otimes V$ with $n \times n$ matrices. – aginensky – 2016-02-24T14:47:28.017

@aginensky, I'm not a mathematician, but in physics tensors are different from arrays; they do have certain constraints that arrays don't have. Some tensors can be represented as arrays, and the operations are similar, but there are underlying symmetries in tensors. For instance, in the mechanics of tensions your tensor should be invariant to a change in the coordinate system. Without these constraints there's no point in using tensors in physics. – Aksakal – 2016-02-24T18:02:42.343

@Aksakal I'm certainly somewhat familiar with the use of tensors in physics. My point would be that the symmetries in physics tensors come from the symmetry of the physics, not something essential in the defn of tensor. – aginensky – 2016-02-24T19:54:22.033

@aginensky Saying that $V$ is a "vector space" already assumes transformation properties that Alex and Aksakal are talking about. Think of a typical ML 3D array -- e.g. a set of 1000 of 600x400 video frames. In what sense is that a "tensor"? Sure, if $V$, $W$, and $U$ are 1000-, 600-, and 400-dimensional vector spaces then an element of $V\otimes W \otimes U$ in a particular coordinate system can be represented with the same amount of numbers. But does it make sense to talk about vertical/horizontal pixels as vector spaces? Maybe it does, but it's not obvious. – amoeba – 2016-02-24T19:55:14.590

@amoeba, I'll make one more comment, feel free to reply and have the last word. The defn of a vector space makes no mention of symmetries. Like many mathematical objects, it has symmetries and one can study them. However they are not a part of the definition. For that matter a basis is not part of the definition of a vector space. So for example one can distinguish between a linear transformation and a matrix, the latter being a realization of a linear transformation wrt a specific basis. Btw, it's not always clear that the 'natural' basis is the correct one. For eg, consider pca. – aginensky – 2016-02-24T20:26:14.137

@amoeba, I haven't read the papers on tensors and video frames. However, if we're looking at two subsequent frames of the same objects shot on camera, I could argue that although the frame contents are certainly different, they represent the same object, hence there's got to be some invariance conditions on file contents. Though whether they are tensor relationships I'm not sure. – Aksakal – 2016-02-24T21:20:20.670

@aginensky If a tensor were nothing more than a multidimensional array, then why do the definitions of tensors found in math textbooks sound so complicated? From Wikipedia: "The numbers in the multidimensional array are known as the scalar components of the tensor... Just as the components of a vector change when we change the basis of the vector space, the components of a tensor also change under such a transformation. Each tensor comes equipped with a transformation law that details how the components of the tensor respond to a change of basis." In math, a tensor is not just an array. – littleO – 2016-02-25T00:55:47.470

@amoeba, I updated my answer with an example to show what differentiates a tensor – Aksakal – 2016-02-25T03:30:39.437

Just some general thoughts on this discussion: I think that, as with vectors and matrices, the actual application often becomes a much-simplified instantiation of much richer theory. I am reading this paper in more depth: http://epubs.siam.org/doi/abs/10.1137/07070111X?journalCode=siread and one thing that is really impressing me is that the "representational" tools for matrices (eigenvalue and singular value decompositions) have interesting generalizations in higher orders. I'm sure there are many more beautiful properties as well, beyond just a nice container for more indices. :) – whyyes – 2016-02-25T15:40:48.257

(FYI: The meaning of tensors in the neural network community) – Franck Dernoncourt – 2016-09-04T13:15:57.613

@aginensky "the symmetries in physics tensors come from symmetries in the physics, not something essential in the defn of tensor" - this is completely false wrt the transformation properties that tensors enjoy wrt a basis. That is a key ingredient in the mathematical definition of a tensor, independent of any physical application. Just as a matrix represents a linear map, a multi-dimensional array can represent a tensor, but it is not the tensor itself. – silvascientist – 2017-07-03T21:28:31.263

@silvascientist - please read "The unreasonable effectiveness of mathematics in physics". If after that you are still in the Potter Stewart school of definition of tensors, I'm okay with that. Allow me to suggest that I am not unfamiliar with the mathematical properties of mathematical tensors. – aginensky – 2017-07-04T02:12:08.157

@aginensky "the Potter Stewart school of definition of tensors" - what, that a tensor is defined to be a thing which transforms according to the rules of tensors? Hardly. There are several very precise ways to define tensors, all of them giving rise to equivalent notions, but probably the simplest definition that I would go with is that a tensor is simply a multilinear scalar function of several arguments in the vector space and the dual space. Given a basis, we can represent the tensor by a multidimensional array, which can express the action of the tensor by contraction with the vector(s). – silvascientist – 2017-07-04T05:01:16.047

@aginensky The point is that, absent the special properties that are expected of a tensor, a multidimensional array is really just a multidimensional array. – silvascientist – 2017-07-04T05:02:02.397

@AlexR. I agree that tensors in TensorFlow or similar frameworks are multidimensional arrays. They don't possess the transformation invariance of tensors, at least not directly. At the same time, indirectly they must "support" the invariance in a weak, broad sense: AI must be able to recognize a letter in a picture regardless of the angle and orientation of the frame. However, I would say that this property is only maintained by the whole system, not by the "tensor" that is passed between the vertices of the TensorFlow graph, which is just an array – Aksakal – 2017-09-26T16:10:25.453

@silvascientist, I would argue that tensors are made to have these characteristics of invariance because they were used in physics. So, yes, tensors as we define them do have the invariance even outside a physical context, but it is by design that came from physics applications – Aksakal – 2017-09-26T16:12:05.320