## Is (nearly) all data separable?

Suppose I have some data set with two classes. I could draw a decision boundary around each data point belonging to one of these classes, and hence, separate the data, like so:

Where the red lines are the decision boundaries around the data points belonging to the star class.

Obviously this model overfits really badly, but nevertheless, have I not shown that this data set is separable?

I ask because in an exercise book, a question asks "Is the above data set separable? If it is, is it linearly separable or non-linearly separable?"

I would say "Yes it is separable, but non-linearly separable."

No answers are provided, so I'm not sure, but I think my logic seems reasonable.

The only exception I see is when two data points belong to different classes but have identical features. For instance, if one of the stars in the figure above were perfectly overlapping one of the circles. I suppose this is quite rare in practice. Hence, I ask: is nearly all data separable?
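That exception can be checked mechanically: a data set admits *some* perfect decision boundary exactly when no feature vector appears with two different labels. A minimal sketch (the toy data and the `is_separable` helper are hypothetical, just for illustration):

```python
import numpy as np

# Toy data: the last two points share identical features but have
# different labels, so no classifier can separate them perfectly.
X = np.array([[0.0, 0.0], [1.0, 1.0], [1.0, 1.0]])
y = np.array([0, 0, 1])

def is_separable(X, y):
    """True iff no feature vector occurs with more than one distinct label."""
    seen = {}
    for features, label in zip(map(tuple, X), y):
        if seen.setdefault(features, label) != label:
            return False
    return True

print(is_separable(X, y))  # False: the overlapping pair blocks separation
```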

I lack citations right now, so comment rather than answer, but to me "separable" always means "linearly separable". I'd guess the exercise book has in mind that "nonlinearly separable" means something fuzzy like "cleanly" or "reasonably" separable, so that the example you give is not separable. – Ben Reiniger – 2020-01-02T03:05:59.257

1Can you provide the name and authors of that exercise book to see the broader context? – Sammy – 2020-01-04T13:53:12.063

Unfortunately I can't, the exercises are proprietary to my university and aren't published in a way you can access them. – Data – 2020-01-09T01:21:10.000

# TL;DR

Yes, with overfitting all data becomes (non-linearly) separable (as long as the points don't precisely overlap).

# Explanation

The problem with your argument is that you are drawing circles on a 2D plane as decision boundaries, which are very difficult to learn. However, I think your argument can be made stronger with a decision tree.

```
(0.2, 3.1)? --> yes -> star
            \-> no  -> (1.2, 4.5)? --> yes -> circle
                                    \-> no  -> (x1, x2)? --> yes ...
                                                          \-> no  ...
```


Decision trees are well-accepted models, but note that they are non-linear models. With that, it is easy to argue that all data becomes separable.
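As a sketch of this argument (assuming scikit-learn is available; the data here is made up), an unconstrained decision tree will keep splitting until every leaf is pure, so it memorises even completely random labels:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 2))    # 2D points drawn uniformly at random
y = rng.integers(0, 2, size=100)  # labels assigned independently of X

# With no depth limit, the tree grows until every leaf is pure,
# so it "separates" the training set exactly.
tree = DecisionTreeClassifier().fit(X, y)
print(tree.score(X, y))  # 1.0 -- perfect separation of the training data
```

This works for any training set without exactly-overlapping points of different classes, which is the overfitting point made below.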

However, the issue lies with overfitting: models like this become unstable on previously unseen data points. So just because the training data is separable, it doesn't mean that the models generated from it are of any use.

But then the word "separable" is completely useless, isn't it? – Ben Reiniger – 2020-01-04T18:18:22.467

@BenReiniger yes, that is why we prefer to look for data that is linearly separable. Or separable by a decision-tree up to 3 ply, or something like that. Once you constrain the separation conditions, then it becomes useful. – Bruno Lubascher – 2020-01-04T19:40:49.420

Having consulted my professor, the person who wrote the question from the exercise book featured in the OP, here is their perspective:

Groups of data points can always be separated. The exception is when two points are at the same location.

However, the thing to consider is whether or not your decision boundary can separate unseen data, generated by the same underlying distribution from which the training data came.

In the example shown in the question, the data is generated from a uniform random distribution. If we generate unseen data from the same distribution, you could draw a decision boundary anywhere, and your classifier would never perform significantly better than making random guesses when classifying this unseen data, e.g. using the outcome of a coin flip for classification.
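A quick simulation illustrates this (assuming scikit-learn; the sample sizes are arbitrary): an unconstrained decision tree separates random training labels perfectly, yet scores near chance on fresh data from the same distribution:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X_train = rng.uniform(size=(200, 2))
y_train = rng.integers(0, 2, size=200)    # labels independent of features
X_test = rng.uniform(size=(2000, 2))      # unseen data, same distribution
y_test = rng.integers(0, 2, size=2000)

tree = DecisionTreeClassifier().fit(X_train, y_train)
print(tree.score(X_train, y_train))  # 1.0: training data fully "separated"
print(tree.score(X_test, y_test))    # ~0.5: no better than a coin flip
```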

So the classes from the example in the question are not separable.

I don't particularly like that this definition depends on an unknown underlying distribution; it seems like the "fuzzy" definition in my comment on OP. I would prefer Bruno's version, where an adjective is required (and often assumed "linear"). I would also suggest that "separable" somehow indicates a perfect generative process, as opposed to one with noise (see hard vs soft SVMs). I guess this term is probably doomed to inconsistent definition. – Ben Reiniger – 2020-01-09T00:00:21.383

Here is my stab at the answer: separation basically means that the types of cases are separated, but cases of the same type are not.

In your case I presume that the stars in your graph are of the same type, so they shouldn't be separated from one another, but connected. In this case the data is not separable.

If, on the other hand, you had eleven types of cases and each star in your graph were of a separate type, your solution would be correct. In that case the data are separable, but not linearly separable.

I like the answer @BrunoGL gave. Nevertheless, the decision tree singles out every "star" case individually. The resulting overfitting is basically the same as treating each star as a separate type and then putting them together in one class after classification (as the "non-circle" class).

Hi and welcome to the site! With regards to your post: I am afraid I do not see how this answers the question "Is the above data set separable? If it is, is it linearly separable or non-linearly separable?". – Sammy – 2020-01-04T13:56:22.327

Hi @Sammy. Thanks for the comment. I edited to answer the question. – Nino Rode – 2020-01-04T17:25:00.593

But surely it's still trivial to draw a windy blob to contain precisely the stars...even a decision tree could be coerced into giving connected decision boundaries. – Ben Reiniger – 2020-01-04T18:17:23.770

Yes, but you define the stars as something they are not (as non-circles), and not as something they are, not by their common traits. Basically you separate each star individually off from the circles. This is the conceptual essence of overfitting. Technically it is possible, but it doesn't contribute to understanding or to further predictions. – Nino Rode – 2020-01-05T03:54:23.490