Why the sudden fascination with tensors?



I've noticed lately that a lot of people are developing tensor equivalents of many methods (tensor factorization, tensor kernels, tensors for topic modeling, etc.). I'm wondering, why is the world suddenly fascinated with tensors? Are there recent papers or standard results that are particularly surprising and that brought this about? Is it computationally a lot cheaper than previously suspected?

I'm not being glib, I sincerely am interested, and if there are any pointers to papers about this, I'd love to read them.


Posted 2016-02-23T09:38:31.690

Reputation: 698

21It seems like the only retaining feature that "big data tensors" share with the usual mathematical definition is that they are multidimensional arrays. So I'd say that big data tensors are a marketable way of saying "multidimensional array," because I highly doubt that machine learning people will care about either the symmetries or transformation laws that the usual tensors of mathematics and physics enjoy, especially their usefulness in forming coordinate free equations. – Alex R. – 2016-02-23T19:00:21.853

2@AlexR. without invariance to transformations there are no tensors – Aksakal – 2016-02-23T21:43:50.013

@Aksakal I can't tell whether or not you agree with Alex R. Do you agree that, as Alex R. suggests, the word "tensor" is often misused and that "multidimensional array" would usually be a more appropriate term (in machine learning papers)? – littleO – 2016-02-24T08:12:41.230

1Putting on my mathematical hat I can say that there is no intrinsic symmetry to a mathematical tensor. Further, they are another way to say 'multidimensional array'. One could vote for using the word tensor over using the phrase multidimensional array simply on grounds of simplicity. In particular if V is a n - dimensional vector space, one can identify $V \otimes V$ with n by n matrices. – aginensky – 2016-02-24T14:47:28.017

1@aginensky, I'm not a mathematician, but in physics tensors are different from array, they do have certain constraints that arrays don't have. Some tensors can be represented as arrays, and the operations are similar, but there are underlying symmetries in tensors. For instance, in mechanics of tensions your tensor should be invariant to the change in the coordinate system. Without these constraints there's no point in using tensors in physics. – Aksakal – 2016-02-24T18:02:42.343

2@Aksakal I'm certainly somewhat familiar with the use of tensors in physics. My point would be that the symmetries in physics tensors come from symmetry of the physics, not something essential in the defn of tensor. – aginensky – 2016-02-24T19:54:22.033

@aginensky Saying that $V$ is a "vector space" already assumes transformation properties that Alex and Aksakal are talking about. Think of a typical ML 3D array -- e.g. a set of 1000 of 600x400 video frames. In what sense is that a "tensor"? Sure, if $V$, $W$, and $U$ are 1000-, 600-, and 400-dimensional vector spaces then an element of $V\otimes W \otimes U$ in a particular coordinate system can be represented with the same amount of numbers. But does it make sense to talk about vertical/horizontal pixels as vector spaces? Maybe it does, but it's not obvious. – amoeba – 2016-02-24T19:55:14.590

1@ amoeba- I'll make one more comment, feel free to reply and have the last word. The defn of a vector space makes no mention of symmetries. Like many mathematical objects , it has symmetries and one can study them. However they are not a part of the definition. For that matter a basis is not part of the definition of a vector space. So for example one can distinguish between a linear transformation and a matrix. The latter being a realization of a linear transformation wrt a specific basis. Btw, it's not always clear that the 'natural' basis is the correct one. For eg, consider pca. – aginensky – 2016-02-24T20:26:14.137

@amoeba, I haven't read the papers on tensors and video frames. However, if we're looking at two subsequent frames of the same objects shot on camera, I could argue that although the frame contents are certainly different, they represent the same object, hence there's got to be some invariance conditions on file contents. Though whether they are tensor relationships I'm not sure. – Aksakal – 2016-02-24T21:20:20.670

3@aginensky If a tensor were nothing more than a multidimensional array, then why do the definitions of tensors found in math textbooks sound so complicated? From Wikipedia: "The numbers in the multidimensional array are known as the scalar components of the tensor... Just as the components of a vector change when we change the basis of the vector space, the components of a tensor also change under such a transformation. Each tensor comes equipped with a transformation law that details how the components of the tensor respond to a change of basis." In math, a tensor is not just an array. – littleO – 2016-02-25T00:55:47.470

@amoeba, I updated my answer with an example to show what diffs a tensor – Aksakal – 2016-02-25T03:30:39.437


Just some general thoughts on this discussion: I think that, as with vectors and matrices, the actual application often becomes a much-simplified instantiation of much richer theory. I am reading this paper in more depth: http://epubs.siam.org/doi/abs/10.1137/07070111X?journalCode=siread and one thing that is really impressing me is that the "representational" tools for matrices (eigenvalue and singular value decompositions) have interesting generalizations in higher orders. I'm sure there are many more beautiful properties as well, beyond just a nice container for more indices. :)

– whyyes – 2016-02-25T15:40:48.257

(FYI: The meaning of tensors in the neural network community)

– Franck Dernoncourt – 2016-09-04T13:15:57.613

@aginensky "the symmetries in physics tensors come from symmetries in the physics, not something essential in the defn of tensor" - this is completely false wrt the transformation properties that tensors enjoy wrt a basis. That is a key ingredient in the mathematical definition of a tensor, independent of any physical application. Just as a matrix represents a linear map, a multi-dimensional array can represent a tensor, but it is not the tensor itself. – silvascientist – 2017-07-03T21:28:31.263

@silvascientist - please read "The unreasonable effectiveness of mathematics in physics". If after that you are still in the Potter Stewart school of definition of tensors, I'm okay with that. Allow me to suggest that I am not unfamiliar with the mathematical properties of mathematical tensors. – aginensky – 2017-07-04T02:12:08.157

@aginensky "the Potter Stewart school of definition of tensors" - what, that a tensor is defined to be a thing which transforms according to the rules of tensors? Hardly. There are several very precise ways to define tensors, all of them giving rise to equivalent notions, but probably the simplest definition that I would go with is that a tensor is simply a multilinear scalar function of several arguments in the vector space and the dual space. Given a basis, we can represent the tensor by a multidimensional array, which can express the action of the tensor by contraction with the vector(s). – silvascientist – 2017-07-04T05:01:16.047

@aginensky The point is that, absent the special properties that are expected of a tensor, a multidimensional array is really just a multidimensional array. – silvascientist – 2017-07-04T05:02:02.397

@AlexR. I agree that tensors in TensorFlow or similar frameworks are multidimensional arrays. they don't possess the transformation invariance of tensors, at least directly. at the same time, indirectly they must "support" the invariance in the weak broad sense: AI must be able to recognize the letter on the picture regardless of the angle and orientation of the frame. however, i would say that this property is only maintained by the whole thing, not by the "tensor" that is passed between the vertices of the TensorFlow, which is just an array – Aksakal – 2017-09-26T16:10:25.453

@silvascientist, I would argue that the tensors are made to have these characteristics of invariance because they were used in physics. So, yes, the tensors as we define them do have the invariance even outside the physical context, but it is by design that came from physics applications – Aksakal – 2017-09-26T16:12:05.320



Tensors often offer more natural representations of data, e.g., consider video, which consists of obviously correlated images over time. You can turn this into a matrix, but it's just not natural or intuitive (what does a factorization of some matrix-representation of video mean?).

Tensors are trending for several reasons:

  • our understanding of multilinear algebra is improving rapidly, specifically in various types of factorizations, which in turn helps us to identify new potential applications (e.g., multiway component analysis)
  • software tools are emerging (e.g., Tensorlab) and are being welcomed
  • Big Data applications can often be solved using tensors, for example recommender systems, and Big Data itself is hot
  • increases in computational power, as some tensor operations can be hefty (this is also one of the major reasons why deep learning is so popular now)

Marc Claesen


Reputation: 13 997

9On the computational power part: I think the most important is that linear algebra can be very fast on GPUs, and lately they have gotten bigger and faster memories, that is the biggest limitation when processing large data. – Davidmh – 2016-02-23T12:17:38.673


Marc Claesen's answer is a good one. David Dunson, Distinguished Professor of Statistics at Duke, has been one of the key exponents of tensor-based approaches to modeling as in this presentation, Bayesian Tensor Regression. https://icerm.brown.edu/materials/Slides/sp-f12-w1/Nonparametric_Bayes_tensor_factorizations_for_big_data_]_David_B._Dunson,_Duke_University.pdf

– DJohnson – 2016-02-23T14:36:01.907

As mentioned by David, tensor algorithms often lend themselves well to parallelism, which hardware (such as GPU accelerators) is getting increasingly better at. – Thomas Russell – 2016-02-23T15:15:19.860

1I assumed that the better memory/CPU capabilities were playing a part, but the very recent burst of attention was interesting; I think it must be because of a lot of recent surprising successes with recommender systems, and perhaps also kernels for SVMs, etc. Thanks for the links! great places to start learning about this stuff... – whyyes – 2016-02-24T07:20:54.183

4If you store a video as a multidimensional array, I don't see how this multidimensional array would have any of the invariance properties a tensor is supposed to have. It doesn't seem like the word "tensor" is appropriate in this example. – littleO – 2016-02-24T08:45:40.530

@Shaktal, I don't see how tensors are especially suited for parallel processing. Tensor operations are similar to multidimensional array ops, and those are not always easy to parallelize – Aksakal – 2016-02-24T18:00:00.343


I think your question should be matched with an answer that is as free flowing and open minded as the question itself. So, here are my two analogies.

First, unless you're a pure mathematician, you were probably taught univariate probability and statistics first. For instance, your first OLS example was most likely on a model like this: $$y_i=a+bx_i+e_i$$ Most likely, you derived the estimates by actually minimizing the sum of squares: $$TTS=\sum_i(y_i-\bar a-\bar b x_i)^2$$ Then you write the FOCs for the parameters and get the solution: $$\frac{\partial TTS}{\partial \bar a}=0,\quad \frac{\partial TTS}{\partial \bar b}=0$$

Then later you're told that there's an easier way of doing this with vector (matrix) notation: $$y=Xb+e$$

and the TTS becomes: $$TTS=(y-X\bar b)'(y-X\bar b)$$

The FOCs are: $$2X'(y-X\bar b)=0$$

And the solution is $$\bar b=(X'X)^{-1}X'y$$

If you're good at linear algebra, you'll stick to the second approach once you've learned it, because it's actually easier than writing down all the sums in the first approach, especially once you get into multivariate statistics.
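To make the comparison concrete, here is a minimal NumPy sketch that fits the same line both ways. The toy data, the seed and the variable names are assumptions made up for illustration; they are not part of the original derivation.

```python
import numpy as np

# Hypothetical toy data: y = 1 + 2*x plus small noise (illustrative values only).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 1.0 + 2.0 * x + 0.01 * rng.standard_normal(x.size)

# Scalar approach: closed-form solution of the two first-order conditions,
# written out with explicit sums.
b_scalar = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a_scalar = y.mean() - b_scalar * x.mean()

# Matrix approach: solve the normal equations X'X b = X'y.
X = np.column_stack([np.ones_like(x), x])
b_matrix = np.linalg.solve(X.T @ X, X.T @ y)

print("sums:  ", a_scalar, b_scalar)
print("matrix:", b_matrix)
```

Both routes give the same intercept and slope up to floating-point error; the matrix version just does not change when more regressors are added as extra columns of `X`.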

Hence my analogy is that moving to tensors from matrices is similar to moving from vectors to matrices: if you know tensors some things will look easier this way.

Second, where do the tensors come from? I'm not sure about the whole history of this thing, but I learned them in theoretical mechanics. Certainly, we had a course on tensors, but I didn't understand what was the deal with all these fancy ways to swap indices in that math course. It all started to make sense in the context of studying tension forces.

So, in physics they also start with a simple example of pressure defined as force per unit area, hence: $$F=p\cdot dS$$ This means you can calculate the force vector $F$ by multiplying the pressure $p$ (scalar) by the unit of area $dS$ (normal vector). That is when we have only one infinite plane surface. In this case there's just one perpendicular force. A large balloon would be good example.

However, if you're studying tension inside materials, you are dealing with all possible directions and surfaces. In this case you have forces on any given surface pulling or pushing in all directions, not only perpendicular ones. Some surfaces are torn apart by tangential forces "sideways" etc. So, your equation becomes: $$F=P\cdot dS$$ The force is still a vector $F$ and the surface area is still represented by its normal vector $dS$, but $P$ is a tensor now, not a scalar.
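A small numerical sketch may help here. The stress values below are invented for illustration, and the rotation check at the end assumes the usual transformation rule for a second-order tensor under an orthogonal change of basis.

```python
import numpy as np

# Hypothetical symmetric stress tensor in a fixed Cartesian basis (made-up numbers).
P = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 3.0]])

dS = np.array([1.0, 0.0, 0.0])    # surface element with its normal along x

F_scalar = 2.0 * dS               # scalar pressure: force stays parallel to the normal
F_tensor = P @ dS                 # tensor: force gains a tangential (shear) component

# Rotate the coordinate system by 30 degrees about the z axis.
t = np.pi / 6
R = np.array([[np.cos(t), -np.sin(t), 0.0],
              [np.sin(t),  np.cos(t), 0.0],
              [0.0,        0.0,       1.0]])

P_rot = R @ P @ R.T               # components of the same tensor in the rotated basis
F_rot = P_rot @ (R @ dS)          # force computed entirely in the rotated basis

print(F_scalar, F_tensor)
print(np.allclose(F_rot, R @ F_tensor))
```

The last line checks that the force computed in rotated coordinates is the rotated version of the original force, which is the invariance the array of numbers has to respect in order to describe a physical stress.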

Ok, a scalar and a vector are also tensors :)

Another place where tensors show up naturally is covariance or correlation matrices. Just think of this: how to transform one correlation matrix $C_0$ to another one $C_1$? You realize we can't just do it this way: $$C_\theta(i,j)=C_0(i,j)+ \theta(C_1(i,j)-C_0(i,j)),$$ where $\theta\in[0,1]$ because we need to keep all $C_\theta$ positive semi-definite.

So, we'd have to find the path $\delta C_\theta$ such that $C_1=C_0+\int_\theta\delta C_\theta$, where $\delta C_\theta$ is a small disturbance to a matrix. There are many different paths, and we could search for the shortest ones. That's how we get into Riemannian geometry, manifolds, and... tensors.

UPDATE: what's tensor, anyway?

@amoeba and others got into a lively discussion of the meaning of tensor and whether it's the same as an array. So, I thought an example is in order.

Say, we go to a bazaar to buy groceries, and there are two merchant dudes, $d_1$ and $d_2$. We noticed that if we pay $x_1$ dollars to $d_1$ and $x_2$ dollars to $d_2$ then $d_1$ sells us $y_1=2x_1-x_2$ pounds of apples, and $d_2$ sells us $y_2=-0.5x_1+2x_2$ oranges. For instance, if we pay both 1 dollar, i.e. $x_1=x_2=1$, then we must get 1 pound of apples and 1.5 of oranges.

We can express this relation in the form of a matrix $P$:

$$P=\begin{pmatrix}2 & -1\\ -0.5 & 2\end{pmatrix}$$

Then the merchants produce this much apples and oranges if we pay them $x$ dollars: $$y=Px$$

This works exactly like a matrix by vector multiplication.

Now, let's say instead of buying the goods from these merchants separately, we declare that there are two spending bundles we utilize. We either pay both 0.71 dollars, or we pay $d_1$ 0.71 dollars and demand 0.71 dollars from $d_2$ back. Like in the initial case, we go to a bazaar and spend $z_1$ on the bundle one and $z_2$ on the bundle 2.

So, let's look at an example where we spend just $z_1=1.42$ on bundle 1. In this case, the first merchant gets $x_1=1$ dollars, and the second merchant gets the same $x_2=1$. Hence, we must get the same amounts of produce as in the example above, mustn't we?

Maybe, maybe not. You noticed that the $P$ matrix is not diagonal. This indicates that for some reason how much one merchant charges for his produce depends also on how much we paid the other merchant. They must get an idea of how much we pay them, maybe through rumors? In this case, if we start buying in bundles they'll know for sure how much we pay each of them, because we declare our bundles to the bazaar. In this case, how do we know that the $P$ matrix should stay the same?

Maybe with full information of our payments on the market the pricing formulae would change too! This will change our matrix $P$, and there's no way to say how exactly.

This is where we enter tensors. Essentially, with tensors we say that the calculations do not change when we start trading in bundles instead of directly with each merchant. That's the constraint, that will impose transformation rules on $P$, which we'll call a tensor.

Particularly we may notice that we have an orthonormal basis $\bar d_1,\bar d_2$, where $\bar d_i$ means a payment of 1 dollar to merchant $i$ and nothing to the other. We may also notice that the bundles also form an orthonormal basis $\bar d_1',\bar d_2'$, which is also a simple rotation of the first basis by 45 degrees counterclockwise. It's also a PC decomposition of the first basis. Hence, we are saying that switching to the bundles is simply a change of coordinates, and it should not change the calculations. Note that this is an outside constraint that we imposed on the model. It didn't come from pure math properties of matrices.

Now, our shopping can be expressed as a vector $x=x_1 \bar d_1+x_2\bar d_2$. The vectors are tensors too, btw. The tensor is interesting: it can be represented as $$P=\sum_{ij}p_{ij}\bar d_i\bar d_j$$, and the groceries as $y=y_1 \bar d_1+y_2 \bar d_2$. With groceries $y_i$ means pound of produce from the merchant $i$, not the dollars paid.

Now, when we changed the coordinates to bundles the tensor equation stays the same: $$y=Pz$$

That's nice, but the payment vectors are now in the different basis: $$z=z_1 \bar d_1'+z_2\bar d_2'$$, while we may keep the produce vectors in the old basis $y=y_1 \bar d_1+y_2 \bar d_2$. The tensor changes too:$$P=\sum_{ij}p_{ij}'\bar d_i'\bar d_j'$$. It's easy to derive how the tensor must be transformed, it's going to be $PA$, where the rotation matrix is defined as $\bar d'=A\bar d$. In our case it's the coefficient of the bundle.

We can work out the formulae for tensor transformation, and they'll yield the same result as in the examples with $x_1=x_2=1$ and $z_1=1.42,z_2=0$.
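The bazaar example can be checked numerically. This is only a sketch under the constraint stated above, namely that the groceries depend on the dollars each merchant ends up receiving and not on how we label our spending. The name `B` for the matrix whose columns are the two bundles is my own; it happens to be symmetric here, so the distinction between it and its transpose does not matter.

```python
import numpy as np

# Pricing matrix from the example: y = P x, with x in dollars per merchant.
P = np.array([[2.0, -1.0],
              [-0.5, 2.0]])

s = 1.0 / np.sqrt(2.0)            # about 0.71
# Columns of B are the bundles expressed in merchant coordinates:
# bundle 1 = (0.71, 0.71), bundle 2 = (0.71, -0.71).
B = np.array([[s,  s],
              [s, -s]])

z = np.array([np.sqrt(2.0), 0.0]) # spend sqrt(2), about 1.41, on bundle 1 only
x = B @ z                         # dollars each merchant actually receives

P_bundle = P @ B                  # components of P when payments are given in bundles

y_from_x = P @ x                  # groceries computed in merchant coordinates
y_from_z = P_bundle @ z           # groceries computed in bundle coordinates

print(x, y_from_x, y_from_z)
```

Spending about 1.41 on bundle 1 alone puts 1 dollar in each merchant's hands, and both computations return 1 pound of apples and 1.5 of oranges. The entries of the matrix change with the basis, the result does not.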



Reputation: 29 992

1I got confused around here: So, let's look at an example where we spend just z1=1.42 on bundle 1. In this case, the first merchant gets x1=1 dollars, and the second merchant gets the same x2=1. Earlier you say that first bundle is that we pay both 0.71 dollars. So spending 1.42 on the first bundle should get 0.71 each and not 1, no? – amoeba – 2016-02-25T10:32:00.153

@ameba, the idea's that a bundle 1 is $\bar d_1/ \sqrt 2+\bar d_2/ \sqrt 2$, so with $\sqrt 2$ bundle 1 you get $\bar d_1+\bar d_2$, i.e. 1\$ each – Aksakal – 2016-02-25T14:44:42.643

1@Aksakal, I know this discussion is quite old, but I don't get that either (although I was really trying to). Where does that idea that a bundle 1 is $\bar d_1/ \sqrt 2+\bar d_2/ \sqrt 2$ come from? Could you elaborate? How is that when you pay 1.42 for the bundle both merchants get 1? – Matek – 2016-09-14T08:16:43.147

@Aksakal This is great, thanks! I think you have a typo on the very last line, where you say x1 = x2 = 1 (correct) and z1 = 0.71, z2 = 0. Presuming I understood everything correctly, z1 should be 1.42 (or 1.41, which is slightly closer to 2^0.5). – Mike Williamson – 2017-08-03T02:14:36.813


This is not an answer to your question, but an extended comment on the issue that has been raised here in comments by different people, namely: are machine learning "tensors" the same thing as tensors in mathematics?

Now, according to Cichocki 2014, Era of Big Data Processing: A New Approach via Tensor Networks and Tensor Decompositions, and Cichocki et al. 2014, Tensor Decompositions for Signal Processing Applications,

A higher-order tensor can be interpreted as a multiway array, [...]

A tensor can be thought of as a multi-index numerical array, [...]

Tensors (i.e., multi-way arrays) [...]

So called tensors in machine learning

So in machine learning / data processing a tensor appears to be simply defined as a multidimensional numerical array. An example of such a 3D tensor would be $1000$ video frames of $640\times 480$ size. A usual $n\times p$ data matrix is an example of a 2D tensor according to this definition.

This is not how tensors are defined in mathematics and physics!

A tensor can be defined as a multidimensional array obeying certain transformation laws under the change of coordinates (see Wikipedia or the first sentence in MathWorld article). A better but equivalent definition (see Wikipedia) says that a tensor on vector space $V$ is an element of $V\otimes\ldots\otimes V^*$. Note that this means that, when represented as multidimensional arrays, tensors are of size $p\times p$ or $p\times p\times p$ etc., where $p$ is the dimensionality of $V$.

All tensors well-known in physics are like that: inertia tensor in mechanics is $3\times 3$, electromagnetic tensor in special relativity is $4\times 4$, Riemann curvature tensor in general relativity is $4\times 4\times 4\times 4$. Curvature and electromagnetic tensors are actually tensor fields, which are sections of tensor bundles (see e.g. here but it gets technical), but all of that is defined over a vector space $V$.

Of course one can construct a tensor product $V\otimes W$ of an $p$-dimensional $V$ and $q$-dimensional $W$ but its elements are usually not called "tensors", as stated e.g. here on Wikipedia:

In principle, one could define a "tensor" simply to be an element of any tensor product. However, the mathematics literature usually reserves the term tensor for an element of a tensor product of a single vector space $V$ and its dual, as above.

One example of a real tensor in statistics would be a covariance matrix. It is $p\times p$ and transforms in a particular way when the coordinate system in the $p$-dimensional feature space $V$ is changed. It is a tensor. But a $n\times p$ data matrix $X$ is not.
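Here is a short NumPy sketch of that difference. The random data, the seed and the matrix `A` are assumptions chosen purely for illustration; the point is only to compare the covariance recomputed after a linear change of feature coordinates with two candidate rules for transforming the old covariance.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 3

# Hypothetical n x p data matrix with correlated features.
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))

# Hypothetical linear change of coordinates in feature space: each row x becomes A x.
A = rng.standard_normal((p, p))
X_new = X @ A.T

C = np.cov(X, rowvar=False)           # p x p covariance in the old coordinates
C_new = np.cov(X_new, rowvar=False)   # covariance recomputed in the new coordinates

C_law = A @ C @ A.T                   # transformation law for the covariance
C_naive = A @ C                       # "naive" rule: treat C like a stack of vectors

print(np.allclose(C_new, C_law))      # the law reproduces the recomputed covariance
print(np.allclose(C_new, C_naive))    # the naive rule does not
```

Note that the data matrix `X` itself only transforms along its feature axis; nothing in this sketch acts on the $n$ rows.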

But can we at least think of $X$ as an element of tensor product $W\otimes V$, where $W$ is $n$-dimensional and $V$ is $p$-dimensional? For concreteness, let rows in $X$ correspond to people (subjects) and columns to some measurements (features). A change of coordinates in $V$ corresponds to linear transformation of features, and this is done in statistics all the time (think of PCA). But a change of coordinates in $W$ does not seem to correspond to anything meaningful (and I urge anybody who has a counter-example to let me know in the comments). So it does not seem that there is anything gained by considering $X$ as an element of $W\otimes V$.

And indeed, the common notation is to write $X\in\mathbb R^{n\times p}$, where $\mathbb R^{n\times p}$ is a set of all $n\times p$ matrices (which, by the way, are defined as rectangular arrays of numbers, without any assumed transformation properties).

My conclusion is: (a) machine learning tensors are not math/physics tensors, and (b) it is mostly not useful to see them as elements of tensor products either.

Instead, they are multidimensional generalizations of matrices. Unfortunately, there is no established mathematical term for that, so it seems that this new meaning of "tensor" is now here to stay.



Reputation: 47 806

17I am a pure mathematician, and this is a very good answer. In particular, the example of a covariance matrix is an excellent way to understand the "transformation properties" or "symmetries" that seemed to cause confusion above. If you change coordinates on your $p$-dimensional feature space, the covariance matrix transforms in a particular and possibly surprising way; if you did the more naive transformation on your covariances you would end up with incorrect results. – Tom Church – 2016-02-25T04:08:07.697

10Thanks, @Tom, I appreciate that you registered on CrossValidated to leave this comment. It has been a long time since I was studying differential geometry so I am happy if somebody confirms what I wrote. It is a pity that there is no established term in mathematics for "multidimensional matrices"; it seems that "tensor" is going to stick in machine learning community as a term for that. How do you think one should rather call it though? The best thing that comes to my mind is $n$-matrices (e.g. $3$-matrix to refer to a video object), somewhat analogously to $n$-categories. – amoeba – 2016-02-25T10:25:02.063

4@amoeba, in programming the multidimensional matrices are usually called arrays, but some languages such as MATLAB would call them matrices. For instance, in FORTRAN the arrays can have more than 2 dimensions. In languages like C/C++/Java the arrays are one dimensional, but you can have arrays of arrays, making them work like multidimensional arrays too. MATLAB supports 3 or more dimensional arrays in the syntax. – Aksakal – 2016-02-25T15:17:53.090

@amoeba I studied Riemannian geometry in school, and will second this answer as accurate and enjoyable. – Matthew Drury – 2016-02-25T16:06:03.563

@amoeba, on people by features, I see how this could work. Let's say the features $v_j$ are skin, hair and eye colors: $v_1,v_2,v_3$, and people are $w_i$, using your notation. Let's say we are modeling incidents of cancers in people $i$, i.e. $y_i$ is whether a person $i$ got skin cancer or not. So, we can form a basis $e_i$, where $e_i=1$ if it's person $i$, and zero otherwise. So, we can write the tensor equation now: $y=Px$, where $P$ is a tensor. Now, you see where I'm going: we can turn the basis, because it's likely that people with fair skins have more skin cancer. – Aksakal – 2016-02-25T16:22:55.167

If you're looking for a mathematical name for a '$d_1\times\dots\times d_n$-matrix', they are often defined to be multilinear maps $k^{d_1}\times\dots\times k^{d_n}\to k$; i.e., elements of the dual space $(k^{d_1}\otimes\dots\otimes k^{d_n})^*$. I'm not sure if that perspective is meaningful when applied to Big Data, though. – John Gowers – 2016-02-27T13:09:12.513

In the two dimensional case, we might consider a matrix of people and measurements as a particular bilinear map in $(W\otimes V)^*$. Now it certainly makes sense to add and scalar-multiply elements of $(W\otimes V)^*$, as they are all $k$-valued functions. – John Gowers – 2016-02-27T13:12:04.267

1@Matthew I would contend this answer is not accurate. As prominent counterexamples consider (a) (statistics) interactions of categorical variables in regression, which are tensor products of spaces of different dimensions; (b) (math) tensor products of algebras used in group representation theory; (c) (physics) tensor products of spaces in quantum mechanics used to account for spin structures; (d) (math) the second fundamental form of Riemannian geometry. All of these rely on the kinds of tensors explicitly disallowed in this answer. – whuber – 2016-07-05T13:22:04.740

2@whuber I think there is some misunderstanding here. I am certainly aware that tensor product is an important mathematical operation and that one often encounters tensor products between vectors spaces of different dimensionality. I did not mean to say otherwise. What I meant here is the word "tensor" (as a noun!) is usually used more restrictively than an element of any arbitrary $V\otimes W$, but instead usually refers to an element of $V\otimes\ldots\otimes V^*$. I gave links to relevant definitions. I am not sure how any of your examples are counterexamples to that. – amoeba – 2016-07-05T13:32:16.603

1Thank you for that clarification, amoeba. I was reacting primarily to your assertion "I am not aware of any useful such tensor," in which you asserted there do not exist the kinds of examples I have provided. That these do exist and are important suggests we might consider taking a more generous view of the use of "tensor" in machine learning and other applications. – whuber – 2016-07-05T13:48:13.973

@whuber I see where the misunderstanding came from... I will clarify. In fact, this terminological point about what a "tensor (n.)" refers to, as opposed to "tensor (adj.), as in tensor product", is only of secondary importance in my answer. What is more important, in my mind, is the point I am making in (currently) the 3rd paragraph from the bottom (about whether a $n\times p$ data matrix is an element of a tensor product). I would appreciate your comments on or objections to it. – amoeba – 2016-07-05T13:51:33.323

3That is very interesting. I hope you will emphasize that point. But please take some care not to confuse a set with a vector space it determines, because the distinction is important in statistics. In particular (to pick up one of your examples), although a linear combination of people is meaningless, a linear combination of real-valued functions on a set of people is both meaningful and important. It's the key to solving linear regression, for instance. – whuber – 2016-07-05T13:56:32.273

1'My conclusion is: machine learning "tensors" are not actually tensors in any meaningful or useful way. Instead, they are multidimensional generalizations of matrices. Unfortunately, there is no established mathematical term for that.' Well, now there is an established mathematical term for that - tensors. They may not be the tensors you knew and loved. But then again, when I was a kid, vectors had arrows on them; then I got to grown-up math classes, and their vectors did not have arrows on them. – Mark L. Stone – 2016-09-03T12:41:33.030


Per T. Kolda, B. Bader, "Tensor Decompositions and Applications," SIAM Review 2009, http://epubs.siam.org/doi/pdf/10.1137/07070111X: "A tensor is a multidimensional array. More formally, an N-way or Nth-order tensor is an element of the tensor product of N vector spaces, each of which has its own coordinate system. This notion of tensors is not to be confused with tensors in physics and engineering (such as stress tensors), which are generally referred to as tensor fields in mathematics."

– Mark L. Stone – 2016-09-03T20:31:57.890

@MarkL.Stone I have now edited my answer to change the wording in a couple of places, and I fully agree with the quote from Kolda & Bader that you found (thanks for that). I also agree that the new meaning of the word "tensor" is likely here to stay; I added this explicitly in the end. – amoeba – 2016-09-03T23:59:01.087

@whuber I have finally got around editing my answer to change the wording here and there, taking into account our conversation two months ago. Cheers. – amoeba – 2016-09-04T00:00:44.603

1If you have a linear transformation on features, you can do column operations on your array of data, so you have gained something from thinking of your data as $W \otimes V$ -- you're just used to expressing that thought in different terms. If $T$ is the linear transformation on $V$, then the column operations you do on the matrix is the same thing as applying the transformation $I_W \otimes T$. – Hurkyl – 2016-09-04T13:48:38.360

Fair enough, @Hurkyl, this is a good point. – amoeba – 2016-09-04T13:51:32.147


As someone who studies and builds neural networks and has repeatedly asked this question, I've come to the conclusion that we borrow useful aspects of tensor notation simply because they make derivation a lot easier and keep our gradients in their native shapes. The tensor chain rule is one of the most elegant derivation tools I have ever seen. Furthermore, tensor notation encourages computationally efficient simplifications that are simply nightmarish to find when using common extended versions of vector calculus.

In Vector/Matrix calculus for instance there are 4 types of matrix products (Hadamard, Kronecker, Ordinary, and Elementwise) but in tensor calculus there is only one type of multiplication yet it covers all matrix multiplications and more. If you want to be generous, interpret tensor to mean multi-dimensional array that we intend to use tensor based calculus to find derivatives for, not that the objects we are manipulating are tensors.
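One way to see this in practice is NumPy's `einsum`, which implements Einstein summation notation directly. The arrays below are arbitrary illustrative values; the sketch only shows that several familiar products are the same operation with different index patterns.

```python
import numpy as np

# Arbitrary small arrays for illustration.
A = np.arange(6.0).reshape(2, 3)
B = np.arange(6.0, 12.0).reshape(3, 2)
C = np.arange(6.0, 12.0).reshape(2, 3)

# Ordinary matrix product: the repeated index j is summed over.
ordinary = np.einsum("ij,jk->ik", A, B)

# Hadamard (elementwise) product: indices are shared but none is summed.
hadamard = np.einsum("ij,ij->ij", A, C)

# Outer (tensor) product: no shared indices, all pairwise products are kept.
outer = np.einsum("ij,kl->ikjl", A, C)

# The Kronecker product is the same set of numbers, reshaped into a matrix.
kronecker = outer.reshape(A.shape[0] * C.shape[0], A.shape[1] * C.shape[1])

print(np.allclose(ordinary, A @ B))
print(np.allclose(hadamard, A * C))
print(np.allclose(kronecker, np.kron(A, C)))
```

In each case the index string states which axes are paired and which are summed, and that is the only thing that distinguishes one product from another.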

In all honesty, we probably call our multi-dimensional arrays tensors because most machine learning experts don't care that much about adhering to the definitions of high-level math or physics. The reality is that we are just borrowing the well-developed Einstein summation convention and the calculus typically used when describing tensors, and we don't want to say "Einstein summation convention based calculus" over and over again. Maybe one day we will develop a new set of notations and conventions that steal only what they need from tensor calculus, specifically for analyzing neural networks, but in a young field that takes time.

James Ryland

Posted 2016-02-23T09:38:31.690

Reputation: 51

Please register &/or merge your accounts (you can find information on how to do this in the My Account section of our [help]), then you will be able to edit & comment on your own answers. – gung – 2017-07-04T01:17:53.797


Now I actually agree with most of the content of the other answers. But I'm going to play Devil's advocate on one point. Again, it will be free flowing, so apologies...

Google announced a program called TensorFlow for deep learning. This made me wonder what was 'tensor' about deep learning, as I couldn't make the connection to the definitions I'd seen.

Deep learning models are all about transformation of elements from one space to another. E.g. if we consider two layers of some network you might write co-ordinate $i$ of a transformed variable $y$ as a nonlinear function of the previous layer, using the fancy summation notation:

$y_i = \sigma(\beta_i^j x_j)$

Now the idea is to chain together a bunch of such transformations in order to arrive at a useful representation of the original co-ordinates. So, for example, after the last transformation of an image a simple logistic regression will produce excellent classification accuracy; whereas on the raw image it would definitely not.
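The layer equation and the chaining idea can be sketched in NumPy as follows. The layer sizes, the random weights and the choice of a logistic sigmoid for $\sigma$ are assumptions made purely for illustration.

```python
import numpy as np

def sigma(z):
    # Logistic sigmoid, applied elementwise (one possible choice of nonlinearity).
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=5)            # input coordinates x_j
beta1 = rng.normal(size=(4, 5))   # weights beta_i^j of the first layer
beta2 = rng.normal(size=(3, 4))   # weights of the second layer

# y_i = sigma(beta_i^j x_j): the repeated index j is summed over.
y = sigma(np.einsum("ij,j->i", beta1, x))

# Chaining a second transformation of the same form.
z = sigma(np.einsum("ij,j->i", beta2, y))
```

Note that the number of coordinates changes from 5 to 4 to 3 along the chain.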

Now, what seems to have been lost from sight are the invariance properties sought in a proper tensor, particularly since the dimensions of the transformed variables may differ from layer to layer. [E.g. some of the stuff I've seen on tensors makes no sense for non-square Jacobians - I may be lacking some methods]

What has been retained is the notion of transformations of variables, and that certain representations of a vector may be more useful than others for particular tasks. An analogy is whether it makes more sense to tackle a problem in Cartesian or polar co-ordinates.

EDIT in response to @Aksakal:

The vector can't be perfectly preserved because of the changes in the numbers of coordinates. However, in some sense at least the useful information may be preserved under transformation. For example with PCA we may drop a co-ordinate, so we can't invert the transformation but the dimensionality reduction may be useful nonetheless. If all the successive transformations were invertible, you could map back from the penultimate layer to input space. As it is, I've only seen probabilistic models which enable that (RBMs) by sampling.
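The PCA remark can be made concrete with a small sketch. The synthetic data and the choice to keep `k = 2` of 3 components are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.2])  # synthetic data
Xc = X - X.mean(axis=0)

# Principal directions are the right singular vectors of the centred data.
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2
scores = Xc @ Vt[:k].T   # rotate to the PC basis and drop one co-ordinate
X_back = scores @ Vt[:k] # best rank-k reconstruction in the original space

# The dropped co-ordinate cannot be recovered, so the error is nonzero;
# it equals the sum of squared singular values of the discarded component.
err = np.linalg.norm(Xc - X_back) ** 2
print(np.isclose(err, s[k:] @ s[k:]))
```

The map from `Xc` to `scores` has no inverse, yet most of the variation in the data survives it.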


Posted 2016-02-23T09:38:31.690

Reputation: 2 657

1In the context of neural networks I had always assumed tensors were acting just as multidimensional arrays. Can you elaborate on how the invariance properties are aiding classification/representation? – whyyes – 2016-02-25T16:02:16.523

Maybe I wasn't clear above, but it seems to me - if the interpretation is correct - the goal of invariant properties has been dropped. What seems to have been kept is the idea of variable transformations. – conjectures – 2016-02-25T20:02:52.630

@conjectures, if you have a vector $\bar r$ in cartesian coordinates, then convert it to polar coordinates, the vector stays the same, i.e. it still points from the same point in the same direction. Are you saying that in machine learning the coordinate transformation changes the initial vector? – Aksakal – 2016-02-25T20:18:59.657

but isn't that a property of the transformation more than the tensor? At least with linear and element-wise type transformations, which seem more popular in neural nets, they are equally present with vectors and matrices; what are the added benefits of the tensors? – whyyes – 2016-02-26T09:47:06.907

1@conjectures, PCA is just a rotation and projection. It's like rotating N-dimensional space to PC basis, then projecting to sub-space. Tensors are used in similar situations in physics, e.g. when looking at forces on the surfaces inside bodies etc. – Aksakal – 2016-02-26T12:32:00.280

@Aksakal, not quite sure whether that's a point of agreement or not - but interesting nonetheless. – conjectures – 2016-02-26T13:59:49.413

@conjectures, I'm just saying the dimensionality reduction with PCA doesn't preclude application of tensors. I'm not saying that ML folks are doing it right, so it's just a general observation – Aksakal – 2016-02-26T14:04:08.967


Here is a lightly edited (for context) excerpt from Non-Negative Tensor Factorization with Applications to Statistics and Computer Vision, A. Shashua and T. Hazan which gets to the heart of why at least some people are fascinated with tensors.

Any n-dimensional problem can be represented in two dimensional form by concatenating dimensions. Thus for example, the problem of finding a non-negative low rank decomposition of a set of images is a 3-NTF (Non-negative Tensor Factorization), with the images forming the slices of a 3D cube, but can also be represented as an NMF (Non-negative Matrix Factorization) problem by vectorizing the images (images forming columns of a matrix).

There are two reasons why a matrix representation of a collection of images would not be appropriate:

  1. Spatial redundancy (pixels, not necessarily neighboring, having similar values) is lost in the vectorization thus we would expect a less efficient factorization, and
  2. An NMF decomposition is not unique therefore even if there exists a generative model (of local parts) the NMF would not necessarily move in that direction, which has been verified empirically by Chu, M., Diele, F., Plemmons, R., & Ragni, S. "Optimality, computation and interpretation of nonnegative matrix factorizations" SIAM Journal on Matrix Analysis, 2004. For example, invariant parts on the image set would tend to form ghosts in all the factors and contaminate the sparsity effect. An NTF is almost always unique thus we would expect the NTF scheme to move towards the generative model, and specifically not be influenced by invariant parts.
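The two representations described in the excerpt can be sketched in NumPy. The image sizes and the random non-negative "images" are made up for illustration, and no factorization is actually carried out here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_images, height, width = 10, 8, 6
images = rng.random(size=(n_images, height, width))  # non-negative toy "images"

# 3-way representation: the images are the slices of a 3D cube.
cube = np.transpose(images, (1, 2, 0))               # shape (height, width, n_images)

# Matrix representation: each image is vectorized into one column.
matrix = images.reshape(n_images, height * width).T  # shape (height*width, n_images)

# Both hold the same numbers; the matrix simply no longer makes the
# row/column layout within each image explicit.
first_image_again = matrix[:, 0].reshape(height, width)
```

A 3-NTF would operate on `cube`, whereas an NMF would operate on `matrix`.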

Mark L. Stone

Posted 2016-02-23T09:38:31.690

Reputation: 8 454


[EDIT] Just discovered the book by Peter McCullagh, Tensor Methods in Statistics.

Tensors display interesting properties for identifying unknown mixtures in a signal (or an image), especially around the notion of the Canonical Polyadic (CP) tensor decomposition; see for instance Tensors: a Brief Introduction, P. Comon, 2014. The field is known under the name "blind source separation (BSS)":

Tensor decompositions are at the core of many Blind Source Separation (BSS) algorithms, either explicitly or implicitly. In particular, the Canonical Polyadic (CP) tensor decomposition plays a central role in identification of underdetermined mixtures. Despite some similarities, CP and Singular Value Decomposition (SVD) are quite different. More generally, tensors and matrices enjoy different properties, as pointed out in this brief introduction.
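For readers unfamiliar with the CP model, here is a minimal sketch of what it states for a third-order tensor: the tensor is a sum of `R` rank-one terms. The sizes and the factor matrices `A`, `B`, `C` are arbitrary assumptions, and the code only builds a tensor from given factors; it does not perform the decomposition itself.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 4, 5, 6, 3
A = rng.normal(size=(I, R))
B = rng.normal(size=(J, R))
C = rng.normal(size=(K, R))

# CP model: T[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]
T = np.einsum("ir,jr,kr->ijk", A, B, C)

# Equivalent form: an explicit sum of R rank-one (outer-product) terms.
T_sum = sum(
    np.multiply.outer(np.multiply.outer(A[:, r], B[:, r]), C[:, r])
    for r in range(R)
)
```

Recovering `A`, `B` and `C` from `T` alone is the decomposition problem that the uniqueness results concern.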

Some uniqueness results have been derived for third-order tensors recently: On the uniqueness of the canonical polyadic decomposition of third-order tensors (part 1, part 2), I. Domanov et al., 2013.

Tensor decompositions are nowadays often connected to sparse decompositions, for instance by imposing structure on the decomposition factors (orthogonality, Vandermonde, Hankel), and low rank, to cope with non-uniqueness.

With an increasing need for incomplete data analysis and determination of complex measurements from sensor arrays, tensors are increasingly used for matrix completion, latent variable analysis and source separation.

Additional note: apparently, the Canonical Polyadic decomposition is also equivalent to Waring decomposition of a homogeneous polynomial as a sum of powers of linear forms, with applications in system identification (block structured, parallel Wiener-Hammerstein or nonlinear state-space models).
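To spell out that equivalence in the symmetric case (a sketch in notation chosen here, not taken from the cited references): a homogeneous polynomial $p$ of degree $d$ in $x \in \mathbb{R}^n$ corresponds to a symmetric order-$d$ tensor $T$ through

$p(x) = \sum_{i_1, \dots, i_d} T_{i_1 \cdots i_d} \, x_{i_1} \cdots x_{i_d}$

If $T$ admits a symmetric CP decomposition with weights $\lambda_r$ and vectors $a_r \in \mathbb{R}^n$,

$T = \sum_{r=1}^{R} \lambda_r \, a_r \otimes \cdots \otimes a_r$ ($d$ factors),

then substituting into the first expression gives the Waring form, a sum of $d$-th powers of linear forms:

$p(x) = \sum_{r=1}^{R} \lambda_r \, (a_r^\top x)^d$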

Laurent Duval

Posted 2016-02-23T09:38:31.690

Reputation: 1 349


May I respectfully recommend my book: Kroonenberg, P.M. Applied Multiway Data Analysis and Smilde et al. Multiway Analysis. Applications in the Chemical Sciences (both Wiley). Of interest may also be my article: Kroonenberg, P.M. (2014). History of multiway component analysis and three-way correspondence analysis. In Blasius, J. and Greenacre, M.J. (Eds.). Visualization and verbalization of data (pp. 77–94). New York: Chapman & Hall/CRC. ISBN 9781466589803.

These references talk about multiway data rather than tensors, but refer to the same research area.

P.M. Kroonenberg

Posted 2016-02-23T09:38:31.690

Reputation: 11


It is true that people in Machine Learning do not view tensors with the same care as mathematicians and physicists. Here is a paper that may clarify this discrepancy: Comon P., "Tensors: a brief introduction" IEEE Sig. Proc. Magazine, 31, May 2014


Posted 2016-02-23T09:38:31.690

Reputation: 1

5Is the distinction between a tensor in mathematics/physics and a tensor in machine learning really one of "care"? It seems that machine learning folks use "tensor" as a generic term for arrays of numbers (scalar, vector, matrix and arrays with 3 or more axes, e.g. in TensorFlow), while "tensor" in a math/physics context has a different meaning. Suggesting that the question is about "care" is, I think, to mischaracterize the usage as "incorrect" in the machine learning capacity, when in fact the machine learning context has no intention of precisely replicating the math/physics usage. – Sycorax – 2018-01-16T16:49:25.520