Is unsupervised disentanglement really impossible?


In Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations, Locatello et al. claim to prove that unsupervised disentanglement is impossible.

Their entire claim is founded on a theorem (proven in the appendix), which states, in my own words:

Theorem: for any distribution $p(z)$ in which the variables $z_i$ are mutually independent, there exist infinitely many transformations $\hat z = f(z)$ from $\Omega_z \rightarrow \Omega_z$ with distribution $q(\hat z)$ such that every $\hat z_i$ is entangled with (correlated to) all of the original variables $z_j$, yet the distributions are equal ($q(\hat z) = p(z)$).
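To make the theorem concrete, here is a small numerical sketch (my own construction, not taken from the paper) for a 2D standard Gaussian prior: a 45° rotation leaves the distribution $N(0, I)$ unchanged, yet each rotated coordinate $\hat z_i$ is correlated with both original factors $z_1$ and $z_2$.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal((100_000, 2))  # independent factors, p(z) = N(0, I)

theta = np.pi / 4  # 45-degree rotation
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
z_hat = z @ R.T  # hat z = f(z)

# The distribution is unchanged: hat z is also (empirically) N(0, I) ...
print(np.cov(z_hat.T))  # ≈ identity matrix

# ... but each hat z_i is entangled with every original factor z_j
print(np.corrcoef(z_hat[:, 0], z[:, 0])[0, 1])  # ≈  cos(theta) ≈  0.71
print(np.corrcoef(z_hat[:, 0], z[:, 1])[0, 1])  # ≈ -sin(theta) ≈ -0.71
```

Since a learner only ever sees the distribution, it has no way to prefer $z$ over $\hat z$ (or over any of the infinitely many other rotations).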

Here is the exact wording from the paper:

[Image: statement of the theorem as it appears in the paper]

(I provide both because my misunderstanding may stem from my reading of the theorem.)

From here the authors make the straightforward jump to the claim that for any disentangled latent space learned without supervision, there exist infinitely many entangled latent spaces with exactly the same distribution.

I do not understand why this means the representation is no longer disentangled. Just because an entangled representation exists does not mean the disentangled one is any less valid. We can still conduct inference on the variables independently, because they still satisfy $p(z) = \prod_i p(z_i)$, so where does the impossibility come in?

mshlis

Posted 2019-08-12T00:38:56.740

Reputation: 1 845

This may be better posted on the Mathematics Stack Exchange; however, it looks like Neil is on the right track – hisairnessag3 – 2019-08-12T11:13:58.463

Answers


The impossibility refers to learning the disentangled representation from the observed distribution, or even to knowing whether you have a disentangled representation in the first place.

Basically, an unsupervised learning agent tasked with learning a disentangled transformation of some features $\mathbf{z}$ needs to infer a set of features from the data which are not entangled, but the supplied data will always have many equally valid entangled solutions - valid from the point of view of describing the distribution of $\mathbf{z}$ accurately.

An analogy would be "I have observed the value 50, and know it is the sum of 3 numbers. What are those numbers?". Whilst the correct answer exists, and it is possible to guess it, it cannot be inferred from the supplied information.
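The analogy can be made concrete with a toy enumeration (my own illustration): every decomposition below "explains" the observation equally well, so the observation alone cannot identify the true one.

```python
# Observation: the value 50, known to be the sum of 3 non-negative integers.
# Enumerate every decomposition consistent with that observation.
observed = 50
candidates = [(a, b, observed - a - b)
              for a in range(observed + 1)
              for b in range(observed + 1 - a)]

# All candidates are equally valid explanations of the data ...
assert all(sum(c) == observed for c in candidates)

# ... and there are far too many of them to pick the "true" one.
print(len(candidates))  # 1326 distinct decompositions
```

No amount of additional samples of the value 50 narrows this down; only extra information (supervision, or assumptions about the generative process) can.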

The part of the proof you quote shows that the multiple equivalent entangled feature sets exist, and theoretically cannot be separated from a "true" disentangled feature set on the basis of knowing the distribution. Once you accept this, it is indeed just a short hop in logic to say that the disentangled features are not learnable - there is no way for a learning system to differentiate between the entangled and disentangled features that explain the distribution, and a large (infinite) set of valid entangled features are guaranteed to exist, which will confound attempts to find a perfect solution.

It is worth noting that the impossibility refers to learning perfect solutions, and that the proof does not rule out useful or practical approximate solutions, or solutions that augment unsupervised learning by applying some additional rules or a semi-supervised approach.

Neil Slater

Posted 2019-08-12T00:38:56.740

Reputation: 14 632

I understand what you say about the model's inability to differentiate the disentangled representation from the multitude of entangled ones, but why should that matter if the distribution satisfies $\prod_i p(z_i)$? It will still have all the properties of independent variables even if infinitely many other interpretations exist – mshlis – 2019-08-12T11:26:07.770

@mshlis: What do you mean by "matter"? Yes it should be possible to use even heavily entangled features for other tasks, such as inference. But the stated goal is to have disentangled features. One thing you cannot easily use entangled features for is explanatory models (because your explanation becomes more complex). Another thing is control or intervention, assuming you can influence at least one of the disentangled features that you extract. – Neil Slater – 2019-08-12T14:03:59.830

my point about "matter" is: even if an entangled representation exists, a disentangled representation also exists. So I'm not talking about learning the interpretation of such a disentanglement, but rather that it is provably interchangeable – mshlis – 2019-08-12T14:44:57.990

@mshlis: It is interchangeable for some purposes, not others. This isn't your main question though. Your question is "Is unsupervised disentanglement really impossible?" The answer is yes. If you want to ask "What is disentanglement useful for?" or "Do I need to care about whether my features are entangled?" that would be a new and different question IMO (and for the latter question you need to give your goals with the data) – Neil Slater – 2019-08-12T15:36:44.040

If I take the embeddings of two inputs, $x_1 \rightarrow v_1, x_2 \rightarrow v_2$, and I can make a new vector $v_3$ that is partially $v_1$ and partially $v_2$, with $v_3 \in \Omega_v$, wouldn't that mean it's disentangled? – mshlis – 2019-08-12T15:39:54.927

given my above comment, wouldn't it be fair to say they didn't prove disentanglement was impossible, just impossible to interpret? – mshlis – 2019-08-12T15:40:48.643

@mshlis: They haven't shown that disentanglement is impossible. They have shown that there is no way to create a learning process based purely on the data that guarantees that you will find the disentangled features. – Neil Slater – 2019-08-12T15:44:11.787


@mshlis: I am not sure I can do the maths well enough to explain in the way you need. I am just accepting the proof and trying to explain what it means intuitively . . . – Neil Slater – 2019-08-12T17:20:49.027

Let us continue this discussion in chat. – mshlis – 2019-08-12T19:42:04.017