Multidimensional scaling producing different results for different seeds

3

I took the data from here and wanted to play around with multidimensional scaling with this data. The data looks like this:

In particular, I want to plot the cities in a 2D space, and see how much it matches their real locations in a geographic map from just the information about how far they are from each other, without any explicit latitude and longitude information. This is my code:

import pandas as pd
import numpy as np
from sklearn import manifold
import matplotlib.pyplot as plt

mds = manifold.MDS(n_components=2, dissimilarity="precomputed", random_state=6)
results = mds.fit(data.values)

cities = data.columns
coords = results.embedding_

fig = plt.figure(figsize=(12,10))

plt.scatter(coords[:, 0], coords[:, 1])

for label, x, y in zip(cities, coords[:, 0], coords[:, 1]):
plt.annotate(
label,
xy = (x, y),
xytext = (-20, 20),
textcoords = 'offset points'
)
plt.show()


Most of the cities seem to be around the correct general location relative to each other, except a few infractions - Dublin is too far away from London, Istanbul is in the wrong location, etc. However, if I give a different random_state value, it produces a different "map". For example, random_state=1 produces the following map, where many of the cities do not seem to be around the correct general location relative to other cities:

What I don't understand is, dimensionality reduction methods are not supposed to have randomness associated with them, and thus should not give different results for different seeds. But it does here; so what does it mean?

The documentation of the sklearn.manifold.MDS function states that random_state is "the generator used to initialize the centers". So, in particular, I guess what I'm asking is, whatever initialization of the centres we choose, shouldn't all of them lead to one unique result?

I get a much more "accurate" map (to my eyes at least) by giving the following hyperparameter values:

mds = manifold.MDS(n_components=2, dissimilarity="euclidean", n_init=100, max_iter=1000, random_state=1)


1Distances are Euclidean and you should use this choice as a hyperparameter for dissimilarity. – Regi Mathew – 2020-01-02T05:24:39.450

0

I think the answer to your problem is to understand that euclidian distance matrices are too much to uncover the information. Basically there is a lot of underlying constraints to verify (n*(n-1)/2 constraints for 2n variables).

One way to solve that kind of problem in human acceptable time is to consider 'physical' approaches. Basically you put tiny springs and other forces between your data points and solve a physical system problem. Thats why you usually see graphs 'giggle' when plotting them. As you can guess, these kind of approaches depend heavily on the initialisation of points in space. That's why there is a 'random' component in usual technics.

0

Good evening,

What you have above can be called a rotation around the axis. What you need to note is the coordinates after the dimensionality reduction may not necessarily carry a meaning. What MDS does is it reshapes the data while maintaining distances (Euclidean distance in your case) between the observations.

To put it into context, take a look at the first two graphs and find Madrid and Dublin. You will see that their location on graph two is the opposite of graph one. But in the data you will see that the actual distance between the two points is the same.

Think about it this way: you are have two objects in a room: a chair and a table and they are 1 meter apart from each other. If you are standing close to the chair, to you the table is further away. On the other hand, if you are standing close to the table, to you the chair is further away. But the distance between them is always 1 meter.

So the coordinates system is a representation of a position of how you look at the data points in lower dimension while it preserves the Euclidean distance between the data points from the original higher dimensional dataset.

You can also try to scale it with PCA or t-SNE if you are interested to see how different dimensionality reduction techniques perform for this data.

Hope this helps!