
## Problem

I have the following directed tripartite graph $G(E\cup V\cup P, A)$, with a many-to-one symmetric relationship between the subsets V and E (for $e\in E$, $v\in V$: $[e, v]\in A \iff [v, e]\in A$) and a many-to-many relationship between the subsets P and V. Every edge $[x, y]\in A$ has a weight $w_{xy}$ that determines the fraction of the score propagated from node x to node y:
$$\forall_{x\in G}\sum_{y\in G}w_{xy}=1\\\
[x, y] \notin A \iff w_{xy}=0
$$
I came up with the following naive equations, where $SP_t(p)$ is the score of "p" at iteration "t" (likewise for SV and SE) and $SP_0(p)$ is the starting score of "p":
$$
SP_{t+1}(p)=SP_0(p)+\sum_{v\rightarrow p}SV_t(v)w_{vp},\\\
SV_{t+1}(v)=SV_0(v)+\sum_{p\rightarrow v}SP_t(p)w_{pv}+\sum_{e\rightarrow v}SE_t(e)w_{ev},\\\
SE_{t+1}(e)=SE_0(e)+\sum_{v\rightarrow e}SV_t(v)w_{ve}.\\\
(1)
$$
**I want to compute the scores of each node in a way that I can rank those nodes within their subset. If a node's neighbors are high ranked, then this node is also high ranked**.

**1)** How can I prove that the iteration converges, i.e. that $|SP_{t+1}(p)-SP_t(p)|\leqslant\epsilon$ (and likewise for SV and SE) is eventually reached for a given threshold $\epsilon\in \mathbb{R}^+$, in a viable time (that is, with at most linear time complexity)?
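
To make the question concrete, here is a minimal sketch of iterating system (1) with the stopping test above. The toy graph, its weights, the uniform starting scores, and the names `W`, `step`, and `iterate` are all assumptions made up for illustration; only the update rule comes from system (1).

```python
# Hypothetical toy graph: P = {p1, p2}, V = {v1, v2}, E = {e1}.
# W[x][y] is the weight w_xy of edge x -> y; every row sums to 1.
W = {
    "p1": {"v1": 0.5, "v2": 0.5},
    "p2": {"v2": 1.0},
    "v1": {"p1": 0.5, "e1": 0.5},
    "v2": {"p1": 0.25, "p2": 0.25, "e1": 0.5},
    "e1": {"v1": 0.5, "v2": 0.5},
}


def step(scores, start):
    """One synchronous update of system (1): S_{t+1}(y) = S_0(y) + sum_x S_t(x) * w_xy."""
    new = dict(start)
    for x, out_edges in W.items():
        for y, w in out_edges.items():
            new[y] += scores[x] * w
    return new


def iterate(start, eps=1e-9, max_iter=50):
    """Iterate until the largest absolute change is <= eps, or give up after max_iter steps."""
    scores = dict(start)
    for t in range(1, max_iter + 1):
        new = step(scores, start)
        change = max(abs(new[n] - scores[n]) for n in new)
        scores = new
        if change <= eps:
            return scores, t, True
    return scores, max_iter, False


start = {n: 1.0 for n in W}  # assumed uniform starting scores
scores, iterations, converged = iterate(start)
print(converged, iterations, sum(scores.values()))
```

On this toy graph the stopping test is never met: because every row of weights sums to 1, each update adds the whole starting total to the total score, so the undamped sum keeps growing. That is an observation about this sketch, not a proof about the general case.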

### Generalization

I think what I am trying to do here is solve a kind of random walk over G. The scores are random variables, since a node's score depends on the other nodes pointing to it. This problem can be modeled as a Markov chain: I need to find the fraction of time the random walker spends at each node in G (that is, the normalized score).

**2)** Let "s" be the vector of scores of all nodes in $G$ and $W^{|G|\times |G|}$ the transition matrix of the Markov chain. I want to find an "s" such that $sW=s$. This is equivalent to finding the limit of system (1) when $\epsilon =0$. **Am I right?** If so, how can I prove that "s" exists and can be found in a viable time?
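
For reference, system (1) can be written in this matrix notation. The ordering of the nodes as (P, V, E) is an assumption, and $W^{pv}$ denotes the block of weights from P to V (likewise $W^{vp}$, $W^{ve}$, $W^{ev}$):

$$
s_t=\begin{pmatrix}SP_t & SV_t & SE_t\end{pmatrix},\qquad
W=\begin{pmatrix}0 & W^{pv} & 0\\ W^{vp} & 0 & W^{ve}\\ 0 & W^{ev} & 0\end{pmatrix},\qquad
s_{t+1}=s_0+s_tW.
$$

The zero blocks reflect that there are no edges inside a subset and none directly between P and E.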

### Related algorithms

While searching, I found the algorithms PageRank and Generalized Co-HITS, which solve similar problems. PageRank was designed for unipartite graphs and Generalized Co-HITS for bipartite graphs.

The PageRank (PR) of a page "p" is given by $PR_{t+1}(p)=(1-a)\frac 1n+a\sum_{u\rightarrow p}PR_t(u)\frac 1{d_u^+}$, where "p" and "u" are nodes, "n" is the number of nodes, "a" is the "damping factor", and $d_u^+$ is the outdegree of "u".

**3)** I see that I could use it but I am not sure if it will compute an accurate score to rank nodes within their subsets, because PR will rank all nodes within the superset P + V + E. **Am I right?**
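
A minimal sketch of what I mean, assuming a made-up toy graph in which every node has at least one outgoing edge (so there are no dangling nodes to handle). It runs the PR formula above over the whole node set and then sorts only the members of one subset. The graph and the names `OUT`, `pagerank`, and `rank_within` are hypothetical.

```python
# Hypothetical toy graph given as out-neighbor lists; plain PageRank ignores the weights w_xy.
OUT = {
    "p1": ["v1", "v2"],
    "p2": ["v2"],
    "v1": ["p1", "e1"],
    "v2": ["p1", "p2", "e1"],
    "e1": ["v1", "v2"],
}


def pagerank(out_links, a=0.85, eps=1e-12, max_iter=1000):
    """Iterate PR_{t+1}(p) = (1 - a)/n + a * sum_{u -> p} PR_t(u) / outdegree(u)."""
    n = len(out_links)
    pr = {u: 1.0 / n for u in out_links}
    for _ in range(max_iter):
        new = {u: (1.0 - a) / n for u in out_links}
        for u, targets in out_links.items():
            share = a * pr[u] / len(targets)
            for p in targets:
                new[p] += share
        change = max(abs(new[u] - pr[u]) for u in out_links)
        pr = new
        if change <= eps:
            break
    return pr


def rank_within(scores, prefix):
    """Sort the nodes whose name starts with `prefix` by decreasing score."""
    members = [u for u in scores if u.startswith(prefix)]
    return sorted(members, key=lambda u: scores[u], reverse=True)


pr = pagerank(OUT)
ranking_P = rank_within(pr, "p")
ranking_V = rank_within(pr, "v")
print(ranking_P, ranking_V)
```

The scores are still computed over the whole set P + V + E. Restricting the sort to one subset only changes which nodes are compared, not how the scores are obtained.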

Generalized Co-HITS looks like PR. Consider just the two subsets P and V, which make a bipartite graph:
$$
SP_{t+1}(p)=(1-a)SP_0(p)+a\sum_{v\in V}SV_t(v)w_{vp},\\\
SV_{t+1}(v)=(1-b)SV_0(v)+b\sum_{p\in P}SP_t(p)w_{pv}.\\\
(2)
$$
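
System (2) can be sketched as follows. This is only an illustration under assumptions: the bipartite toy graph, its weights, the starting scores, the values a = b = 0.8, and the names `W_PV`, `W_VP`, `propagate`, and `co_hits` are all made up; only the two update equations come from system (2).

```python
# Hypothetical bipartite toy graph; every row of weights sums to 1.
W_PV = {"p1": {"v1": 0.5, "v2": 0.5}, "p2": {"v2": 1.0}}  # w_pv, from P to V
W_VP = {"v1": {"p1": 1.0}, "v2": {"p1": 0.5, "p2": 0.5}}  # w_vp, from V to P


def propagate(scores, weights, targets):
    """Return sum_x scores[x] * w_xy for every target node y."""
    received = {y: 0.0 for y in targets}
    for x, row in weights.items():
        for y, w in row.items():
            received[y] += scores[x] * w
    return received


def co_hits(sp0, sv0, a=0.8, b=0.8, eps=1e-12, max_iter=1000):
    """Synchronous iteration of system (2); returns (SP, SV, number of iterations)."""
    sp, sv = dict(sp0), dict(sv0)
    for t in range(1, max_iter + 1):
        from_v = propagate(sv, W_VP, sp0)  # sum_v SV_t(v) * w_vp
        from_p = propagate(sp, W_PV, sv0)  # sum_p SP_t(p) * w_pv
        new_sp = {p: (1 - a) * sp0[p] + a * from_v[p] for p in sp0}
        new_sv = {v: (1 - b) * sv0[v] + b * from_p[v] for v in sv0}
        change = max(
            max(abs(new_sp[p] - sp[p]) for p in sp0),
            max(abs(new_sv[v] - sv[v]) for v in sv0),
        )
        sp, sv = new_sp, new_sv
        if change <= eps:
            return sp, sv, t
    return sp, sv, max_iter


sp0 = {"p1": 0.5, "p2": 0.5}
sv0 = {"v1": 0.5, "v2": 0.5}
sp, sv, iterations = co_hits(sp0, sv0)
print(iterations, sp, sv)
```

With these assumed values the change between iterations drops below the threshold well before `max_iter`, and the total score stays at its starting value of 2.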
I tried to extend it to a tripartite graph based on system (1). I substituted SV in SP and SE:
$$
SP_{t+1}(i)=SP_0(i)a+(1-a)b\sum_{j\in V}W_{ji}^{vp}SV_0(j)+(1-a)(1-b)\left[\sum_{m\in P}\left(\sum_{j\in V}W_{mj}^{pv}W_{ji}^{vp}\right)SP_t(m)+\sum_{n\in E}\left(\sum_{j\in V}W_{nj}^{ev}W_{ji}^{vp}\right)SE_t(n)\right],\\\
SV_{t+1}(j)=SV_0(j)b+(1-b)\left(\sum_{m\in P}W_{mj}^{pv}SP_t(m)+\sum_{n\in E}W_{nj}^{ev}SE_t(n)\right),\\\
SE_{t+1}(k)=SE_0(k)c+(1-c)b\sum_{j\in V}W_{jk}^{ve}SV_0(j)+(1-c)(1-b)\left[\sum_{m\in P}\left(\sum_{j\in V}W_{mj}^{pv}W_{jk}^{ve}\right)SP_t(m)+SE_t(k)\sum_{j\in V}W_{kj}^{ev}W_{jk}^{ve}\right].\\\
(3)
$$
and got a strange-looking system. Here I swapped the positions of "a", "b", and "c" (all of which act like damping factors), so they now multiply the starting score, and $W^{pv}$ is the weight matrix from P to V. I also changed the end of the third equation:
$$
\sum_{n\in E}\left(\sum_{j\in V}W_{nj}^{ev}W_{jk}^{ve}\right)SE_t(n) \equiv SE_t(k)\sum_{j\in V}W_{kj}^{ev}W_{jk}^{ve}
$$
because of the many-to-one symmetric relationship between V and E: there is no path $e_1\rightarrow v_1\rightarrow e_2$ (although longer paths such as $e_1\rightarrow v_1\rightarrow p_1\rightarrow v_2\rightarrow e_2$ can exist). I am not sure whether this change keeps all the properties of system (2) valid.
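
The simplification can be checked numerically on a small example. The sketch below assumes a made-up graph in which every v is linked to exactly one e (the many-to-one relationship), and computes the two-step matrix $\sum_{j\in V}W_{nj}^{ev}W_{jk}^{ve}$. The weights and the names `W_EV`, `W_VE`, and `two_step` are hypothetical.

```python
# Hypothetical weights: e1 <-> {v1, v2} and e2 <-> {v3}; each v is linked to exactly one e.
W_EV = {"e1": {"v1": 0.5, "v2": 0.5}, "e2": {"v3": 1.0}}
# Only the V -> E part of each v's outgoing weight is listed (the rest would go to P).
W_VE = {"v1": {"e1": 0.5}, "v2": {"e1": 0.5}, "v3": {"e2": 0.25}}


def two_step(w_ev, w_ve):
    """Return M[n][k] = sum_j w_ev[n][j] * w_ve[j][k], the weight of paths n -> j -> k."""
    m = {n: {k: 0.0 for k in w_ev} for n in w_ev}
    for n, row in w_ev.items():
        for j, w_nj in row.items():
            for k, w_jk in w_ve[j].items():
                m[n][k] += w_nj * w_jk
    return m


M = two_step(W_EV, W_VE)
off_diagonal = [M[n][k] for n in M for k in M[n] if n != k]
print(M, off_diagonal)
```

All off-diagonal entries are zero here, so only the $SE_t(k)$ term survives in the sum over $n\in E$, which is what the rewritten end of the third equation expresses.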

**4)** Is system (3) correct, and are the properties of system (2) still valid in system (3)?

**5)** What is the purpose of the "damping factor"? I think it is just there to weight the contribution coming from propagation against the contribution of the starting score in the final score, so a = 0.15 means that 15% of the final score comes from the starting score and 85% comes from propagation. On reflection, it is the union of two disjoint events, event 1 with probability 0.15 and event 2 with probability 0.85, where event 1 is the random walker arriving at a node through a meta-edge and event 2 is the walker arriving from one of the node's neighbors. PageRank's authors call that meta-edge a jump or teleportation: a web surfer may start browsing from any webpage. **Am I right?** And why do other, similar algorithms still use damping factors, given that PageRank was modeled for the Web?
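
To make the "15% / 85%" reading concrete, here is a sketch of the fixed point of a single damped iteration. It assumes the matrix form $s_{t+1}=a\,s_0+(1-a)\,s_tW$ with $0<a\leqslant 1$ and a row-stochastic $W$, where "a" multiplies the starting score as in system (3); this form is an assumption, not a formula taken from the PageRank or Co-HITS papers:

$$
s^*=a\,s_0+(1-a)\,s^*W
\;\Longrightarrow\;
s^*=a\,s_0\left(I-(1-a)W\right)^{-1}=a\sum_{k\geqslant 0}(1-a)^k\,s_0W^k,
\qquad
\|s_{t+1}-s^*\|_1\leqslant(1-a)\,\|s_t-s^*\|_1.
$$

Under these assumptions the $k$-th term is the score that arrives after exactly $k$ propagation steps, the weights $a(1-a)^k$ sum to 1, and the distance to $s^*$ shrinks by a factor of at least $(1-a)$ per iteration.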

Should I split these questions into several posts?
Maybe I am missing something obvious, I am still learning. Thanks in advance.