An undirected graph represents a database: the nodes of the graph represent tables, and the edges represent the joiner columns. There are 100 databases (that is, 100 undirected graphs). We have to build a clustering model that can cluster these graphs into four groups based on the attributes of the nodes (the list of column names), the attributes of the edges (the joiner column), and the overall structure of each database graph.
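One way to make this setup concrete is to store each database as an attributed graph. Table names are the nodes, each carrying its set of column names, and unordered table pairs are the edges, each carrying the joiner column. The sketch below is a minimal stand-alone Python structure. `SchemaGraph` is a hypothetical helper, not part of any library. Its `features()` method condenses the structure into a small fixed-length vector (table count, join count, edge density, maximum and mean degree) that vector-based clustering methods could use. The column-name sets are kept on the nodes for set-based comparisons but are not part of that vector.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaGraph:
    """One database as an undirected attributed graph (hypothetical helper).

    Nodes: table name -> frozenset of that table's column names.
    Edges: frozenset({table_a, table_b}) -> joiner column.
    """

    tables: dict[str, frozenset[str]] = field(default_factory=dict)
    joins: dict[frozenset[str], str] = field(default_factory=dict)

    def add_table(self, name: str, columns: list[str]) -> None:
        self.tables[name] = frozenset(columns)

    def add_join(self, a: str, b: str, column: str) -> None:
        # A frozenset key makes the edge undirected: (a, b) and (b, a) coincide.
        self.joins[frozenset((a, b))] = column

    def degree(self, table: str) -> int:
        # Number of joins that touch this table.
        return sum(1 for edge in self.joins if table in edge)

    def features(self) -> list[float]:
        """Structural summary: [tables, joins, density, max degree, mean degree]."""
        n, m = len(self.tables), len(self.joins)
        possible_edges = n * (n - 1) / 2
        degrees = [self.degree(t) for t in self.tables]
        return [
            float(n),
            float(m),
            m / possible_edges if possible_edges else 0.0,
            float(max(degrees, default=0)),
            sum(degrees) / n if n else 0.0,
        ]


# Toy database with made-up table and column names.
g = SchemaGraph()
g.add_table("customers", ["customer_id", "name"])
g.add_table("orders", ["order_id", "customer_id", "total"])
g.add_table("items", ["item_id", "order_id", "sku"])
g.add_join("customers", "orders", "customer_id")
g.add_join("orders", "items", "order_id")
print(g.features())  # [3.0, 2.0, 0.6666666666666666, 2.0, 1.3333333333333333]
```

Keying edges by a `frozenset` of the two table names is what makes the graph undirected without storing each join twice.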

How can we build the clustering models?

What is the goal of the clustering? Do you have a distance function that suits your goal? I suggest using DBSCAN (https://en.wikipedia.org/wiki/DBSCAN). It is a density-based clustering algorithm that grows clusters by connecting neighbouring points. Since tables related by keys form connected components, DBSCAN should be suitable.

– DaL – 2016-07-11T13:02:39.060

Goal is to be able to group similar databases (graphs). Assume that there are four different types of databases and that we have to group them logically so as to identify the pattern of each group. – user3142384 – 2016-07-12T03:16:20.127
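DBSCAN only needs pairwise distances, so once a distance between two databases is defined, the 100 graphs can be clustered through a precomputed 100-by-100 distance matrix. The following is a minimal pure-Python DBSCAN over such a matrix, written as an illustration. The names `dist`, `eps` and `min_samples` are assumptions, and the toy points are made up. In practice, scikit-learn's `DBSCAN(metric="precomputed")` does the same job on a distance matrix.

```python
def dbscan(dist, eps, min_samples):
    """DBSCAN on a precomputed symmetric distance matrix (illustrative sketch).

    Returns one label per item; -1 marks noise. min_samples counts the point
    itself, following the usual DBSCAN convention.
    """
    n = len(dist)
    labels = [None] * n  # None = not yet visited
    cluster = -1

    def neighbours(i):
        # All items within eps of item i, including i itself.
        return [j for j in range(n) if dist[i][j] <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is also a core point: keep expanding
    return labels


# Toy example: points on a line, two tight groups and one outlier.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 20.0]
D = [[abs(a - b) for b in points] for a in points]
print(dbscan(D, eps=0.5, min_samples=2))  # [0, 0, 0, 1, 1, 1, -1]
```

Note that DBSCAN does not take the number of clusters as input: the count falls out of `eps` and `min_samples`, and some databases may be labelled as noise (-1). Getting exactly four groups therefore means tuning `eps`, or switching to a method that takes the number of clusters directly, such as k-medoids on the same matrix.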

What is your definition of similarity among databases? – DaL – 2016-07-12T06:10:52.780

Based on: the number of tables (nodes), their joining columns (edges) and the overall structure. – user3142384 – 2016-07-12T06:26:25.380
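These three criteria can be combined into a single dissimilarity score. This is a sketch under stated assumptions, not a standard measure. It averages four terms, each scaled to [0, 1]:

- the relative difference in table counts;
- the Jaccard distance between the sets of joiner columns;
- the Jaccard distance between the sets of column names (the node attributes from the question);
- the total variation distance between the two degree distributions, used as a rough stand-in for "overall structure".

The equal weights, the `(tables, joins)` input format and the function names are hypothetical choices. Measures such as graph edit distance capture structure more faithfully, at a much higher computational cost.

```python
from collections import Counter


def jaccard_distance(a: set, b: set) -> float:
    """1 - |A & B| / |A | B|; two empty sets count as identical."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def degree_distribution(tables, joins):
    """Fraction of tables with each degree (assumes joins only name known tables)."""
    deg = Counter({t: 0 for t in tables})
    for a, b, _column in joins:
        deg[a] += 1
        deg[b] += 1
    n = len(tables)
    return {d: c / n for d, c in Counter(deg.values()).items()} if n else {}


def schema_distance(g1, g2, weights=(0.25, 0.25, 0.25, 0.25)) -> float:
    """Weighted mix of four dissimilarities between two databases.

    Each database is a hypothetical (tables, joins) pair:
      tables: dict table_name -> set of column names
      joins:  list of (table_a, table_b, joiner_column)
    """
    (t1, j1), (t2, j2) = g1, g2
    size = abs(len(t1) - len(t2)) / max(len(t1), len(t2), 1)
    join_cols = jaccard_distance({col for _a, _b, col in j1},
                                 {col for _a, _b, col in j2})
    columns = jaccard_distance(set().union(*t1.values()), set().union(*t2.values()))
    p, q = degree_distribution(t1, j1), degree_distribution(t2, j2)
    # Total variation distance between the two degree distributions.
    structure = 0.5 * sum(abs(p.get(d, 0.0) - q.get(d, 0.0)) for d in p.keys() | q.keys())
    parts = (size, join_cols, columns, structure)
    return sum(w * x for w, x in zip(weights, parts))


# Toy databases with made-up names.
shop = ({"customers": {"customer_id", "name"}, "orders": {"order_id", "customer_id"}},
        [("customers", "orders", "customer_id")])
stock = ({"products": {"sku"}, "stock": {"sku", "qty"}, "sites": {"site_id"}},
         [("products", "stock", "sku")])
print(schema_distance(shop, shop), schema_distance(shop, stock))
```

Filling `D[i][j] = schema_distance(dbs[i], dbs[j])` for all 100 databases (with `dbs` a hypothetical list of `(tables, joins)` pairs) gives a precomputed matrix that DBSCAN, or any other clustering method that accepts precomputed distances, can use. Changing `weights` shifts how much table counts, joiner columns, column names and structure each contribute.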