Visualization of multidimensional data using SOM and MDS

A. Naud

W. Duch

Dates: 1995-1996 + 1997-

Result(s):

SOM makes it possible to visualize multidimensional datasets in a way that preserves data topology. One difference between the plots obtained by SOM and by MDS is that SOM magnifies the clusters of points: if clusters exist in the data manifold, the intra-cluster distances are enlarged with respect to the inter-cluster distances.

Nonmetric MDS minimizes Kruskal's stress function S, whereas Sammon's mapping minimizes a function E that differs from S in that each squared difference of distances is divided by the corresponding distance in the original (input) space. Kruskal's nonmetric MDS is hence designed to preserve larger distances more accurately than Sammon's mapping does (the latter gives more weight to smaller distances).
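
The difference in weighting between the two loss functions can be illustrated numerically. The following Python sketch is not the code used in this project. It rests on two assumptions: the input distances are used directly as disparities in S (true nonmetric MDS would first fit disparities by monotone regression), and E follows Sammon's original definition, in which each squared difference is divided by the corresponding input-space distance. All function names are hypothetical.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distances between all pairs i < j, as a flat vector."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(points), k=1)
    return dist[i, j]


def kruskal_stress(d_in, d_out):
    """Kruskal-style stress S.

    Assumption: the input distances d_in stand in for the disparities,
    so this is the metric simplification, not full nonmetric MDS.
    """
    return np.sqrt(((d_in - d_out) ** 2).sum() / (d_out ** 2).sum())


def sammon_stress(d_in, d_out):
    """Sammon's E: each squared error is divided by the input distance."""
    return ((d_in - d_out) ** 2 / d_in).sum() / d_in.sum()


# Two input distances, one short and one long.
d_in = np.array([1.0, 10.0])

# The same absolute error (0.5) placed on the short or on the long distance.
d_out_a = np.array([1.5, 10.0])   # error on the short distance
d_out_b = np.array([1.0, 10.5])   # error on the long distance

sammon_a, sammon_b = sammon_stress(d_in, d_out_a), sammon_stress(d_in, d_out_b)
kruskal_a, kruskal_b = kruskal_stress(d_in, d_out_a), kruskal_stress(d_in, d_out_b)

print("Sammon  E ratio (short / long):", sammon_a / sammon_b)
print("Kruskal S ratio (short / long):", kruskal_a / kruskal_b)
```

In this toy case E penalizes the error on the short distance about ten times more than the same error on the long distance, while S treats the two cases almost alike. This is one way to see why Sammon's mapping favors small distances relative to Kruskal's criterion.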

Problems:

1) There exist many different methods for data visualization (or dimensionality reduction); they differ mainly in the criterion (explicit or not) that they optimize.

- What is the most appropriate criterion for data visualization?

- Should the criterion depend on the dataset? If so, is it possible to find measures on the dataset that would help decide which method is best suited to visualizing a given dataset?

2) An important practical difference between SOM and MDS (from the viewpoint of data visualization) is that SOM can visualize new points (that were not used during learning), whereas MDS and Sammon's mapping cannot. (The neural network version of Sammon's mapping proposed by Mao and Jain in 1996 removes this difference.)
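
The reason SOM can place unseen points is that the trained map keeps a codebook vector for every grid unit, so a new point is simply assigned to the unit whose codebook vector is nearest. The Python sketch below illustrates this with a deliberately minimal online SOM; it is not the implementation used in this project. The grid size, the learning-rate and neighborhood schedules, the synthetic two-cluster data and the function names train_som and project are all assumptions chosen for illustration.

```python
import numpy as np


def train_som(data, rows=6, cols=6, n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Train a small rectangular SOM with a Gaussian neighborhood."""
    rng = np.random.default_rng(seed)
    # Grid coordinates of each unit, shape (rows * cols, 2).
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)],
                    dtype=float)
    # Initialize codebook vectors from randomly chosen training samples.
    weights = data[rng.integers(0, len(data), size=rows * cols)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                      # learning rate decays to 0
        sigma = sigma0 * (1.0 - frac) + 0.5 * frac   # radius shrinks to 0.5
        x = data[rng.integers(0, len(data))]
        # Best-matching unit: codebook vector closest to the sample.
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
        # Units close to the BMU on the grid are pulled towards the sample.
        grid_d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)
    return grid, weights


def project(points, grid, weights):
    """Map each point to the grid position of its best-matching unit."""
    d2 = ((points[:, None, :] - weights[None, :, :]) ** 2).sum(axis=-1)
    return grid[np.argmin(d2, axis=1)]


# Synthetic 5-dimensional data with two clusters (illustrative only).
rng = np.random.default_rng(1)
train = np.vstack([rng.normal(0.0, 0.3, (100, 5)),
                   rng.normal(3.0, 0.3, (100, 5))])
grid, weights = train_som(train)

# Points that were NOT used during learning can still be placed on the map.
new_points = np.vstack([rng.normal(0.0, 0.3, (5, 5)),
                        rng.normal(3.0, 0.3, (5, 5))])
positions = project(new_points, grid, weights)
print(positions)
```

Classical MDS and Sammon's mapping optimize the output coordinates of the training points directly and keep no such codebook, so adding a point normally means running the optimization again.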

Working log: