Computation of a column-stochastic matrix with target row sums


I want to generate an $N\times N$ matrix $A$ whose row sums match a target vector of length $N$ and whose columns each sum to 1. In addition, a prescribed set of entries is fixed to zero. For example, starting from the pattern $$ \left[\begin{array}{rrr} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 0 \end{array}\right] $$ and the row-sum vector $[1.5, 0.25, 1]^{T}$, I want to end up with $$ \left[\begin{array}{rrr} 0 & a_{12} & a_{13} \\ a_{21} & 0 & 0 \\ a_{31} & a_{32} & 0 \end{array}\right] $$ subject to the following conditions:

$a_{12} + a_{13} = 1.5$

$a_{21} = 0.25$

$a_{31}+a_{32} = 1$

$a_{21}+a_{31} = 1$

$a_{12}+a_{32} = 1$

$a_{13} = 1$

This example is simple, but in general I have $2N$ equations in $N^{2}-Z$ unknowns, where $Z$ is the number of entries fixed to zero. The system could therefore be overdetermined or underdetermined, but I would like to be able to generate matrices like this in which all nonzero entries lie in $(0,1]$.
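To make the equation count concrete, here is a minimal sketch in plain Python (standard library only; the helper names `build_system` and `rank` are illustrative, not from any library). It assembles the $2N$ row- and column-sum equations over the $N^2 - Z$ free entries and compares the rank of the coefficient matrix with that of the augmented matrix: by the Rouché-Capelli theorem the equalities are consistent exactly when the two ranks agree. At most $2N-1$ of the equations are independent, because the row sums and the column sums both add up to the total of all entries. This only covers the equalities, not the requirement that entries lie in $(0,1]$.

```python
from fractions import Fraction


def build_system(mask, row_sums, col_sums):
    """Row- and column-sum equations over the free entries (mask[i][j] == 1).

    Returns (cells, coefficient rows, right-hand sides); unknowns follow `cells`.
    """
    n = len(mask)
    cells = [(i, j) for i in range(n) for j in range(n) if mask[i][j]]
    coeffs, rhs = [], []
    for i in range(n):  # row i:    sum_j a_ij = r_i
        coeffs.append([Fraction(int(ci == i)) for ci, _ in cells])
        rhs.append(Fraction(row_sums[i]))
    for j in range(n):  # column j: sum_i a_ij = c_j (1 for column-stochastic)
        coeffs.append([Fraction(int(cj == j)) for _, cj in cells])
        rhs.append(Fraction(col_sums[j]))
    return cells, coeffs, rhs


def rank(matrix):
    """Rank via exact Gaussian elimination over the rationals."""
    m = [list(row) for row in matrix]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((k for k in range(r, len(m)) if m[k][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for k in range(len(m)):
            if k != r and m[k][c] != 0:
                f = m[k][c] / m[r][c]
                m[k] = [a - f * b for a, b in zip(m[k], m[r])]
        r += 1
    return r


mask = [[0, 1, 1], [1, 0, 0], [1, 1, 0]]
cells, A, b = build_system(mask, ["1.5", "0.25", "1"], [1, 1, 1])
aug = [row + [bi] for row, bi in zip(A, b)]
print(len(b), len(cells))  # 6 5  (2N equations, N^2 - Z unknowns)
print(rank(A), rank(aug))  # 5 6  (ranks differ: no exact solution)
```

For the question's data the ranks differ, so the equalities alone are already inconsistent. Replacing $0.25$ by $0.5$ (a hypothetical change that makes the totals match) gives rank 5 for both matrices, and since there are 5 unknowns that solution is unique.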

Piyush Panigrahi

Posted 2015-03-04T14:34:51.013




This problem is combinatorial rather than statistical in nature, and it can be thought of as a max-flow problem on a bipartite network. Each column of your matrix corresponds to a "source node" in the left part $L$ of the bipartite graph $BP(L, R)$; each row of the matrix corresponds to a "destination node" in the right part $R$. The capacity of each source node is the corresponding column's sum; likewise, the capacity of each destination node is the corresponding row's sum. (In a standard single-source, single-sink formulation, these node capacities become edges from a super-source to each column node and from each row node to a super-sink.) The value $a_{ij}$ describes the flow of mass from the $j$th source to the $i$th destination along an existing edge $(j \to i)$ of $BP$. Your initial 0-1 matrix describes which edges are present: e.g., the matrix in your example prescribes the edge $(3, 2)$ to be absent from the network, along with all "self-loops" $(j \to j)$, i.e. the diagonal entries. If $F$ denotes the sum of all column sums (equivalently, the total capacity of all sources), your problem reduces to finding a max-flow from $L$ to $R$ in $BP$ and checking whether the value of that flow equals $F$.
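A minimal sketch of this construction in plain Python follows, using exact `Fraction` arithmetic. The names `max_flow` and `fit_matrix` are illustrative, and Edmonds-Karp is just one standard max-flow algorithm, not something the answer prescribes. Node capacities are modeled as edges from a super-source `s` to each column node and from each row node to a super-sink `t`. The answer does not fix a capacity for the column-to-row edges; here each one is assumed to get its column's sum, which is effectively unbounded.

```python
from collections import deque
from fractions import Fraction


def max_flow(cap, s, t):
    """Edmonds-Karp max-flow on a dict-of-dicts capacity graph.

    Assumes no pair of antiparallel edges (true for the network built below).
    Returns (flow value, dict-of-dicts of the flow on each original edge).
    """
    res = {u: dict(vs) for u, vs in cap.items()}  # residual capacities
    for u, vs in cap.items():
        for v in vs:
            res.setdefault(v, {}).setdefault(u, 0)  # reverse residual edges
    total = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)  # bottleneck capacity
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        total += push
    flow = {u: {v: c - res[u][v] for v, c in vs.items()} for u, vs in cap.items()}
    return total, flow


def fit_matrix(mask, row_sums, col_sums):
    """a_ij = flow from column node j to row node i, or None if infeasible."""
    n = len(mask)
    rows = [Fraction(r) for r in row_sums]
    cols = [Fraction(c) for c in col_sums]
    cap = {"s": {("col", j): cols[j] for j in range(n)}}  # source caps = column sums
    for j in range(n):
        # Edge column j -> row i only where the mask allows a nonzero entry. Capacity
        # c_j is effectively unbounded, since column j never receives more than c_j.
        cap[("col", j)] = {("row", i): cols[j] for i in range(n) if mask[i][j]}
    for i in range(n):
        cap[("row", i)] = {"t": rows[i]}  # destination caps = row sums
    total, flow = max_flow(cap, "s", "t")
    # Feasible iff the flow saturates every source and every destination.
    if not (total == sum(cols) == sum(rows)):
        return None
    return [[flow[("col", j)].get(("row", i), Fraction(0)) for j in range(n)]
            for i in range(n)]


mask = [[0, 1, 1], [1, 0, 0], [1, 1, 0]]
print(fit_matrix(mask, ["1.5", "0.25", "1"], [1, 1, 1]))  # None: flow value 2.75 < 3
# Hypothetical feasible row sums; the unique solution is a13 = 1, other free entries 1/2.
A = fit_matrix(mask, ["1.5", "0.5", "1"], [1, 1, 1])
```

With the question's row sums the maximum flow is $2.75 < 3$, so no such matrix exists. A max-flow solver returns *some* feasible matrix when one exists, and it may leave allowed entries at 0; Note 3 below is about avoiding that.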

Note 1 (Feasibility): Before searching for a flow, you may want to check your row sums. A necessary condition for a solution to exist is that the sum of all column sums (which is $N$ in your case of a column-stochastic matrix) equals the sum of all row sums, since both equal the sum of all entries of the target matrix. (For example, with $N = 3$ and the row sums from your example, $1.5 + 0.25 + 1 = 2.75 \neq 3$, so the problem is infeasible.)
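The check in Note 1 is a one-liner. A hedged sketch with the hypothetical helper `sums_consistent`, using `Fraction` so that decimal inputs such as 0.25 compare exactly:

```python
from fractions import Fraction


def sums_consistent(row_sums, col_sums):
    """Necessary (not sufficient) condition: both totals equal the sum of all entries."""
    return sum(Fraction(r) for r in row_sums) == sum(Fraction(c) for c in col_sums)


# Example from the question: row sums total 2.75, column sums total N = 3.
print(sums_consistent(["1.5", "0.25", "1"], [1, 1, 1]))  # False
```

Passing this check does not guarantee a solution: the zero pattern can still make the problem infeasible, which is what comparing the max-flow value with $F$ detects.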

Note 2 (Upper bound): Since $a_{ij}$ is the flow from the $j$th source, and every source has capacity 1 in your setting, no source can send more than 1 unit of mass to a single destination. Thus no $a_{ij}$ exceeds 1.
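Written out, the same bound follows directly from the column constraint, because all flows are nonnegative:

$$ 0 \;\le\; a_{ij} \;\le\; \sum_{k=1}^{N} a_{kj} \;=\; 1 \qquad \text{for all } i, j. $$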

Note 3 (Lower bound): To make $a_{ij}$ strictly positive for every existing edge $(j \to i)$, you can augment the max-flow problem with lower bounds on the edge flows. In the standard setting, the flow on each edge is bounded below by $0$ and above by $+\infty$. You may want to replace the lower bound $0$ with some positive number, small enough not to make the flow problem infeasible and large enough to meet your positivity requirements.
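One standard way to handle such lower bounds in this bipartite setting is substitution (a sketch, not necessarily what the answer has in mind): write $a_{ij} = \varepsilon + b_{ij}$ with $b_{ij} \ge 0$ on every allowed entry. This lowers each row target by $\varepsilon$ times the number of allowed entries in that row, lowers each column target likewise, and leaves an ordinary flow problem in the $b_{ij}$. The helper `shift_lower_bounds`, the value $\varepsilon = 1/8$, and the row sums $[1.5, 0.5, 1]^T$ below are illustrative assumptions.

```python
from fractions import Fraction


def shift_lower_bounds(mask, row_sums, col_sums, eps):
    """Reduce 'a_ij >= eps on every allowed entry' to a problem with b_ij >= 0.

    With a_ij = eps + b_ij, row i's target drops by eps * (allowed entries in
    row i) and column j's by eps * (allowed entries in column j). Returns the
    reduced (row targets, column targets), or None if any target would be
    negative, in which case eps is too large.
    """
    n = len(mask)
    eps = Fraction(eps)
    rows = [Fraction(row_sums[i]) - eps * sum(mask[i]) for i in range(n)]
    cols = [Fraction(col_sums[j]) - eps * sum(mask[i][j] for i in range(n))
            for j in range(n)]
    if min(rows + cols) < 0:
        return None
    return rows, cols


mask = [[0, 1, 1], [1, 0, 0], [1, 1, 0]]
rows, cols = shift_lower_bounds(mask, ["1.5", "0.5", "1"], [1, 1, 1], "1/8")
# rows: 5/4, 3/8, 3/4; cols: 3/4, 3/4, 7/8
print(shift_lower_bounds(mask, ["1.5", "0.5", "1"], [1, 1, 1], 1))  # None
```

The original problem with lower bound $\varepsilon$ is feasible exactly when the reduced one is, so the reduced targets can be fed to any max-flow routine, and adding $\varepsilon$ back to every allowed entry then gives a matrix whose allowed entries are all at least $\varepsilon$. For the illustrative data above, the reduced problem is feasible and adding $1/8$ back recovers $a_{13} = 1$ and $1/2$ for the other free entries.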


