Let's say that I have a directed graph, and for each node I want to have some measure of whether it is more of an "upstream" node (that lies at the beginning of many paths), or a "downstream" node (to which many paths converge).

Say, in a sequence A->B->C I want A to have the highest value and C the lowest, while B would be "in between". In a perfect cycle A->B->C->A I want all nodes to have the same "in between" value. The metric should tolerate a mix of cycles and acyclic elements.

Ideally, it would also give more extreme values to nodes from longer acyclic paths: higher values to nodes with larger estuaries (the 'mother of all' node), and lower values to nodes with larger watersheds (the 'all roads lead to Rome' node that integrates all flows).

Ideally, it should also generalize to weighted graphs, in a way that does not depend on the scaling factor for the weights: if you multiply all weights in the graph by 2, the metric for each node shouldn't change, as the topology clearly didn't change with this scaling.

For an unweighted graph I can come up with a metric like this: find all ancestors of a node, find all its descendants, then divide the number of descendants by the sum of descendants and ancestors. This value would be 1 for any point of origin, 0 for dead ends, and 0.5 for cycles. So it's mostly OK. What I don't like about this metric is that 1) it does not care about the length of the path, 2) it will be computationally slow, 3) I don't know how to generalize it to weighted graphs in a scale-invariant way.
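To make the ratio concrete, here is a minimal sketch in plain Python. It reads the description as "everything reachable from the node" versus "everything that can reach the node", which is an assumption on my part. The function names `reachable` and `upstream_score` and the edge-list input format are hypothetical, and returning 0.5 for a node with no neighbours at all is an arbitrary choice.

```python
from collections import deque


def reachable(adj, start):
    """Return the set of nodes reachable from `start` by following `adj`."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def upstream_score(edges):
    """Score each node as descendants / (descendants + ancestors)."""
    adj, radj, nodes = {}, {}, set()
    for u, v in edges:
        adj.setdefault(u, []).append(v)   # forward graph
        radj.setdefault(v, []).append(u)  # reversed graph
        nodes.update((u, v))

    scores = {}
    for n in nodes:
        # Exclude the node itself so that cycles do not count it twice.
        descendants = len(reachable(adj, n) - {n})
        ancestors = len(reachable(radj, n) - {n})
        total = descendants + ancestors
        # Assumption: a node with no ancestors and no descendants gets 0.5.
        scores[n] = descendants / total if total else 0.5
    return scores


chain = upstream_score([("A", "B"), ("B", "C")])
cycle = upstream_score([("A", "B"), ("B", "C"), ("C", "A")])
```

On the chain this gives 1 for A, 0.5 for B and 0 for C, and on the cycle every node gets 0.5. It runs two traversals per node, which is where the cost in point 2 comes from.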

So I was wondering whether there's a known metric with approximately these properties that has been described and studied before. It feels like a logical thing to calculate, one that many people would use when analyzing social networks, for example; so it feels like it should have a name and published algorithms. Thank you!

Edit: I think it's fair to say that the PageRank metric has many of the properties I described (with the values reversed): sinks are high, sinks with larger watersheds are higher, nodes of origin are low, cycles tend to have "in-between" values, and the algorithm clearly supports weighted graphs. The part it does not care about is whether a node of origin has a large estuary or no estuary at all. Now I'm wondering whether I actually need two metrics: PageRank for watersheds, and a different one for estuaries. Like a weighted share of nodes visited by random walks initiated in the node of interest, or something like that. Or are there simpler metrics?
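As a rough check of the PageRank properties listed above, here is a minimal power-iteration sketch in plain Python. It is not a reference implementation: the function name `pagerank`, the `(source, target, weight)` edge format, the damping factor of 0.85 and the fixed iteration count are all assumptions, and rank from nodes without outgoing edges is simply spread uniformly. Each edge weight is divided by the total outgoing weight of its source, so multiplying every weight by a constant leaves the result unchanged.

```python
def pagerank(weighted_edges, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration over (source, target, weight) edges."""
    nodes = sorted({n for u, v, _ in weighted_edges for n in (u, v)})
    count = len(nodes)

    # Total outgoing weight per node, used to normalise edge weights.
    out_weight = {n: 0.0 for n in nodes}
    for u, _, w in weighted_edges:
        out_weight[u] += w

    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Rank held by sinks is redistributed uniformly (an assumption).
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0)
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {n: base for n in nodes}
        for u, v, w in weighted_edges:
            # Only the ratio w / out_weight[u] matters, hence scale invariance.
            new_rank[v] += damping * rank[u] * w / out_weight[u]
        rank = new_rank
    return rank


chain_rank = pagerank([("A", "B", 1.0), ("B", "C", 1.0)])
cycle_rank = pagerank([("A", "B", 1.0), ("B", "C", 1.0), ("C", "A", 1.0)])
```

On the chain the sink C ends up highest and the origin A lowest, and on the cycle all three nodes share the same value. Running the same function on the graph with every edge reversed would rank origins instead of sinks, though whether that captures the "estuary" size well is an open question here.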

How about the mean distance? – Emre – 2017-08-24T19:37:09.780

Do you mean the mean distance to all other elements of the graph that can be reached from this point? It qualifies in some ways: it is maximal for an origin, 0 for a dead end, in between for acyclic in-between nodes, and may work for weighted graphs. However, I'm not sure how to make it work for cycles (wouldn't it go to infinity?), and it's also unpleasantly asymmetrical: I can calculate the mean distance FROM the point or TO the point, but how do I incorporate both? – ampanmdagaba – 2017-08-24T19:50:47.427
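For reference, a small sketch of the two mean distances discussed in this comment, assuming "distance" means shortest-path hop count on an unweighted graph. Under that assumption the value stays finite on cycles, because a breadth-first search visits each node once. The names `mean_distance` and `mean_out_and_in` are hypothetical, and how to combine the two numbers into one score is left open.

```python
from collections import deque


def mean_distance(adj, start):
    """Mean shortest-path hop count from `start` to every node it can reach."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in dist:  # first visit is the shortest path in BFS
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    others = [d for n, d in dist.items() if n != start]
    # Assumption: a node that reaches nothing gets 0.
    return sum(others) / len(others) if others else 0.0


def mean_out_and_in(edges):
    """Return {node: (mean distance FROM node, mean distance TO node)}."""
    adj, radj, nodes = {}, {}, set()
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        radj.setdefault(v, []).append(u)  # reversed edges give distances TO a node
        nodes.update((u, v))
    return {n: (mean_distance(adj, n), mean_distance(radj, n)) for n in nodes}


chain_dist = mean_out_and_in([("A", "B"), ("B", "C")])
cycle_dist = mean_out_and_in([("A", "B"), ("B", "C"), ("C", "A")])
```

On the chain A->B->C this gives (1.5, 0) for A, (1, 1) for B and (0, 1.5) for C, while every node of the three-node cycle gets (1.5, 1.5).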

If I understand your network flow analogy, you could modify PageRank to sum the $L^p$ norm of the inbound links' PageRanks for $0 \leq p \lt 1$ (instead of summing the PageRanks without the norm). This would prevent nodes with large PageRank from swamping the contribution of smaller ones. You would probably lose some of the nice computational properties, but that's something you can think about separately. – Emre – 2017-08-26T17:33:56.400