I took the data from here and wanted to play around with multidimensional scaling (MDS) on it. The data is a table of pairwise distances between European cities.

In particular, I want to plot the cities in a 2D space and see how well the result matches their real locations on a geographic map, using only the information about how far they are from each other, without any explicit latitude and longitude information. This is my code:

```
import pandas as pd
import numpy as np
from sklearn import manifold
import matplotlib.pyplot as plt

# Matrix of pairwise city distances, indexed by city name
data = pd.read_csv("european_city_distances.csv", index_col='Cities')

# The input is already a distance matrix, hence dissimilarity="precomputed"
mds = manifold.MDS(n_components=2, dissimilarity="precomputed", random_state=6)
results = mds.fit(data.values)

cities = data.columns
coords = results.embedding_

fig = plt.figure(figsize=(12, 10))
plt.subplots_adjust(bottom=0.1)
plt.scatter(coords[:, 0], coords[:, 1])

# Label each point with its city name
for label, x, y in zip(cities, coords[:, 0], coords[:, 1]):
    plt.annotate(
        label,
        xy=(x, y),
        xytext=(-20, 20),
        textcoords='offset points'
    )
plt.show()
```

Most of the cities seem to be in roughly the correct location relative to each other, with a few exceptions: Dublin is too far away from London, Istanbul is in the wrong location, and so on. However, **if I give a different `random_state` value, it produces a different "map"**. For example, `random_state=1` produces a map where many of the cities do not seem to be in the correct general location relative to the other cities.

What I don't understand is this: as far as I know, dimensionality reduction methods are not supposed to have randomness associated with them, and thus should not give different results for different seeds. But this one does, so what does that mean?
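One thing worth checking before comparing two "maps" by eye: an MDS solution is only defined up to rotation, reflection, and translation, because those operations leave all pairwise distances unchanged. The sketch below is a minimal illustration on made-up points (the `layout`, `other`, and `unrelated` arrays are hypothetical stand-ins, not the city data). It uses `scipy.spatial.procrustes` to measure how different two layouts are once those operations and overall scale are factored out.

```python
import numpy as np
from scipy.spatial import procrustes

rng = np.random.default_rng(0)

# Hypothetical 2D layout of 10 points (stand-ins for cities).
layout = rng.normal(size=(10, 2))

# The same configuration, rotated by 130 degrees, mirrored, and shifted.
theta = np.deg2rad(130)
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
mirror = np.array([[-1.0, 0.0],
                   [0.0, 1.0]])
other = layout @ rotation @ mirror + np.array([5.0, -3.0])

# A genuinely different configuration, for contrast.
unrelated = rng.normal(size=(10, 2))

# procrustes standardizes both inputs and finds the best orthogonal
# alignment; the third return value is the remaining squared mismatch.
_, _, disparity_same = procrustes(layout, other)
_, _, disparity_diff = procrustes(layout, unrelated)

print("same shape, different orientation:", disparity_same)
print("different shape:", disparity_diff)
```

If two embeddings from different seeds give a disparity close to zero, they are the same map drawn in a different orientation. A clearly non-zero value means the runs really ended in different configurations.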

The documentation of `sklearn.manifold.MDS` states that `random_state` is "the generator used to initialize the centers". So, in particular, I guess what I'm asking is: whatever initialization of the centres we choose, shouldn't all of them lead to one unique result?
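For context on why the seed matters: scikit-learn's `MDS` minimizes stress iteratively with the SMACOF algorithm, starting from a random configuration, so different starts can settle in different local minima. Classical (Torgerson) MDS, by contrast, is a closed-form eigendecomposition with no random initialization. Below is a minimal NumPy sketch of the classical variant. The helper name `classical_mds` is hypothetical and not part of scikit-learn, and the input is a synthetic distance matrix rather than the city data.

```python
import numpy as np

def classical_mds(distances, n_components=2):
    """Classical (Torgerson) MDS on a square matrix of pairwise distances."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip tiny negative eigenvalues caused by round-off or non-Euclidean input.
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)

# Synthetic example: distances between 8 random points in the plane.
rng = np.random.default_rng(0)
points = rng.normal(size=(8, 2))
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords = classical_mds(dist)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

# No seed is involved: repeated calls on the same input give the same output.
print(np.allclose(recovered, dist))
```

The recovered coordinates reproduce the input distances, but they can still differ from the original points by a rotation, reflection, or shift. For real road or great-circle distances the match would only be approximate, since those are not exactly planar Euclidean distances.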

I get a much more "accurate" map (to my eyes at least) by giving the following hyperparameter values:

```
mds = manifold.MDS(n_components=2, dissimilarity="euclidean", n_init=100, max_iter=1000, random_state=1)
```
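A likely reason the larger `n_init` helps: according to the scikit-learn documentation, `n_init` is the number of times SMACOF is run with different initializations, and the run with the smallest final stress is kept. The sketch below makes that visible by running a single initialization per seed and recording `stress_` for each one. It uses a synthetic distance matrix (an assumption, not the city data) and passes it with `dissimilarity="precomputed"` as in the first snippet. Recent scikit-learn releases may emit deprecation warnings for some of these parameter names.

```python
import numpy as np
from sklearn import manifold

# Synthetic stand-in for the city distance matrix: 15 random planar points.
rng = np.random.default_rng(0)
points = rng.normal(size=(15, 2))
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

stresses = {}
for seed in range(5):
    # n_init=1 means one random start per fit, so each seed is one attempt.
    mds = manifold.MDS(
        n_components=2,
        dissimilarity="precomputed",
        n_init=1,
        max_iter=300,
        random_state=seed,
    )
    mds.fit(dist)
    stresses[seed] = mds.stress_

# Lower stress means the embedding reproduces the distances more closely.
best_seed = min(stresses, key=stresses.get)
for seed, stress in stresses.items():
    print(f"random_state={seed}: stress={stress:.4f}")
print("lowest stress at random_state =", best_seed)
```

Comparing `stress_` across seeds is a more objective check than judging the plots by eye. Setting a large `n_init` effectively automates this loop.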

Distances are Euclidean and you should use this choice as a hyperparameter for dissimilarity. – Regi Mathew – 2020-01-02T05:24:39.450