I'm trying to use the Graphical Lasso algorithm (more specifically the R package glasso) to find an estimated graph representing the connections between a set of nodes, by estimating a precision matrix. I have a feature matrix containing the values of multiple features for each of the nodes, and the sample covariance matrix obtained from the product of this matrix and its transpose is used as the input for the glasso function, along with the l1 regularization coefficient $\lambda$.

However, when testing this with some simple examples I'm not getting the expected results. For example, when using the following feature matrix (5 nodes with 2 features each) as input:

$D=\begin{bmatrix} 1 & 10 \\ 2 & 10 \\ 3 & 10 \\ 4 & 10 \\ 5 & 10 \end{bmatrix},$

with $\lambda=0.01$, I get (approximately) the following inverse covariance matrix as output:

$P=\begin{bmatrix} 19.8 & -18 & -6.3 & 0 & 2.2 \\ -18 & 34.5 & -12.1 & -4.5 & 0 \\ -6.3 & -12.1 & 36.1 & -12.8 & -3.8 \\ 0 & -4.5 & -12.8 & 34.8 & -15.9 \\ 2.2 & 0 & -3.8 & -15.9 & 20.7 \end{bmatrix}.$

As far as I understand, a value of 0 in this precision matrix indicates that the two corresponding nodes are conditionally independent. As such, it makes sense for entries $p_{1,4}$ and $p_{2,5}$ (along with their symmetric counterparts) to be zero. However, I don't understand why entry $p_{1,5}$ has a greater magnitude than entry $p_{1,4}$, considering that the feature values of the first node are closer to those of the fourth node than to those of the fifth. Moreover, I would like to have more entries with a value of zero, i.e. a sparser matrix. Therefore, I tried increasing the value of $\lambda$ to 0.03, which results in the following precision matrix:

$P=\begin{bmatrix} 8.3 & -5.8 & -3.2 & -0.6 & 0 \\ -5.8 & 11.9 & -3.6 & -2.5 & -0.1 \\ -3.2 & -3.6 & 13.2 & -3.3 & -2.4 \\ -0.6 & -2.5 & -3.3 & 12.6 & -4.7 \\ 0 & -0.1 & -2.4 & -4.7 & 9.6 \end{bmatrix}.$

Now there are fewer entries with a value of zero (so the sparsity decreased rather than increased), and while every entry decreased in value, the larger entries decreased at a higher rate, leading to a more evenly distributed matrix. This is not consistent with the feature selection I'm used to seeing in standard lasso regularization; it looks more like some sort of l2-norm regularization.

I feel like there must be something fundamental that I'm missing completely here. Is this method not supposed to be applied in this way?

The code I'm using:

```
library(glasso)

# Feature matrix: 5 nodes (rows) with 2 features (columns) each
D = matrix(c(1, 10, 2, 10, 3, 10, 4, 10, 5, 10), nrow=5, ncol=2, byrow=TRUE)
m = dim(D)[2]              # number of features per node
D = D - mean(D)            # center using the grand mean
covar = (1/m) * D %*% t(D) # 5x5 sample covariance across nodes
D = D / sqrt(max(covar))   # rescale so the largest variance is 1
covar = (1/m) * D %*% t(D)
a = glasso(s=covar, rho=0.01)
```
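For completeness, this is how I'm reading the results off the glasso output: the `wi` component of the returned list is the estimated inverse covariance (precision) matrix, which is where the matrices above come from. A short sketch of the inspection I'm doing (the thresholded sparsity count is just my own check, not part of the package):

```
P = a$wi          # estimated precision (inverse covariance) matrix
round(P, 1)       # this is the matrix I quoted above

# Zero pattern, i.e. the conditional independencies I'm reading off;
# counting only off-diagonal entries
zero_offdiag = sum(P == 0) - sum(diag(P) == 0)
zero_offdiag
```

It is this `zero_offdiag` count that goes down rather than up when I increase `rho` from 0.01 to 0.03.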