Scalable way to calculate betweenness centrality for a graph in Spark


I have a use-case to calculate the betweenness centrality of nodes. I have tried GraphX with spark-betweenness, but it is a very long-running job. Has anyone successfully calculated betweenness centrality of a large network with around 10 million vertices and 100 million edges?


Posted 2020-02-07T08:50:31.243

Reputation: 141

JGraphT can calculate BetweennessCentrality. However, I can't speak to whether or not it meets your performance needs. – dpdearing – 2020-03-18T21:55:06.260



Sorry, I do not think you can compute the exact betweenness centrality of nodes in a graph this size, as its complexity is $O(n\cdot m)$, where $n$ is the number of nodes and $m$ the number of edges.

The good news is that you may approximate it, in a way that can benefit from parallel computation. Computing betweenness centrality relies on counting the number of shortest paths from every node to every other node. Instead, you may randomly select a sample of source nodes (often called pivots), compute the shortest-path counts from each of them to all other nodes, and use those counts to approximate the betweenness. The more nodes you select, the better the approximation, but empirically it is rather good even with a small sample set.
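As a sketch of this pivot-sampling idea, here is a minimal single-machine Python version for unweighted graphs, based on Brandes' single-source dependency accumulation. The function name, graph representation, and parameters are illustrative, not taken from the question's Spark setup; each pivot's contribution is independent of the others, which is exactly the part you would distribute across a cluster.

```python
import random
from collections import deque, defaultdict

def approx_betweenness(adj, sample_size, seed=0):
    """Approximate betweenness by running Brandes' single-source
    accumulation from a random sample of pivot nodes.
    adj: dict mapping each node to a list of neighbors (unweighted)."""
    nodes = list(adj)
    rng = random.Random(seed)
    pivots = rng.sample(nodes, min(sample_size, len(nodes)))
    bc = defaultdict(float)
    for s in pivots:
        # BFS from s, recording shortest-path counts (sigma),
        # distances, predecessors, and the visit order.
        sigma = defaultdict(int); sigma[s] = 1
        dist = {s: 0}
        preds = defaultdict(list)
        order = []
        q = deque([s])
        while q:
            v = q.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies in reverse BFS order.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += (sigma[v] / sigma[w]) * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Rescale the sampled sum to estimate the all-sources total.
    scale = len(nodes) / len(pivots)
    return {v: bc[v] * scale for v in nodes}
```

When `sample_size` equals the number of nodes, this reduces to the exact (unnormalized) betweenness, with each undirected pair counted from both endpoints; with a small sample, the rescaled sums are unbiased estimates whose error shrinks as the sample grows.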

Matthieu Latapy

Posted 2020-02-07T08:50:31.243

Reputation: 131