Finding the energy function given the update rule of a single-layer non-linear neural network


Consider a network of $n$ neurons, each of which takes a $2 \times k$ input specified by the tuple $(\vec c_t, \vec \theta_t)$. The network produces the output $\vec{R}_t$, and the pairwise weights between the neurons, $\mathbf{W_t}$, follow the update rule below:

\begin{align}
g(x) &= \lfloor x \rfloor^2, \quad \forall x \in \mathbb{R},\\
\vec r_t &= g\left(\begin{bmatrix}\vec c_t & b\end{bmatrix}^\intercal \begin{bmatrix}\mathbf{f}(\vec\theta_t) \\ \mathbf{1}\end{bmatrix}\right),\\
\vec R_t &= \vec r_t^\intercal [\mathrm{diag}(\sigma^2\mathbf{1} + \mathbf{W_t} \vec r_t/n)]^{-1},\\
\Delta \mathbf{W_{t+1}} &= \mathbf{W_{t+1}} - \mathbf{W_{t}} = \alpha (\vec R_t^\intercal \vec R_t - \mathbf{C}) + \beta (\mathbf{W_{t}} - \mathbf{C}).
\end{align}

where $\vec c \in \mathbb{R}^k$, $\vec \theta \in \mathbb{R}^k$, $\vec r \in \mathbb{R}^n$, and $\alpha, \beta, b \in [0,1]$. The nonlinear activation $\mathbf{f}$ takes values in $[c_0, 1]^{nk}$. The convex map $g$ truncates all negative elements of a vector to 0 and then squares each element (so $\lfloor x \rfloor$ here denotes the rectification $\max(0, x)$, not the floor function), and $\mathrm{diag}$ converts its input vector to a diagonal matrix. At $t=0$, the network is initialized from the inputs $(\vec c_0, \vec \theta_0)$ with $\mathbf{W_0}=\mathbf{1}$ and $\mathbf{C}:=\vec R_0^\intercal \vec R_0$.
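To make the shapes concrete, here is a minimal NumPy sketch of Eqs. 1-4 for a single time step. Several things in it are assumptions and not part of the setup above: the activation $\mathbf{f}(\vec\theta_t)$ is not specified, so a random $k \times n$ matrix `F` with entries in $[c_0, 1]$ stands in for it; the function names `g`, `forward` and `update` and all numerical values are placeholders; and vectors are stored as 1-D arrays, so $\vec R_t^\intercal \vec R_t$ becomes an outer product.

```python
import numpy as np

def g(x):
    """Eq. 1: clip negative entries to 0, then square elementwise."""
    return np.maximum(x, 0.0) ** 2

def forward(c, F, b, W, sigma):
    """Eqs. 2-3: return the drive r and the normalized output R (length n)."""
    n = W.shape[0]
    r = g(c @ F + b)                   # [c b]^T [F; 1] written out
    R = r / (sigma**2 + W @ r / n)     # multiplying by diag(...)^{-1} is elementwise division
    return r, R

def update(W, R, C, alpha, beta):
    """Eq. 4: return Delta W for the current weights and output."""
    return alpha * (np.outer(R, R) - C) + beta * (W - C)

# Placeholder sizes and parameters (assumptions, chosen only for illustration).
rng = np.random.default_rng(0)
k, n = 3, 4
c0, b, sigma, alpha, beta = 0.1, 0.5, 1.0, 0.1, 0.1
c = rng.uniform(0.0, 1.0, k)
F = rng.uniform(c0, 1.0, (k, n))       # stand-in for f(theta_t), entries in [c0, 1]

W0 = np.ones((n, n))                   # W_0 = 1 (all-ones matrix)
_, R0 = forward(c, F, b, W0, sigma)
C = np.outer(R0, R0)                   # C := R_0^T R_0
dW = update(W0, R0, C, alpha, beta)    # first update, using the same input as at t = 0
```

If the input at the first update is the same one used to define $\mathbf{C}$, the $\alpha$ term vanishes by construction and the first step reduces to $\beta(\mathbf{1} - \mathbf{C})$.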

  • Premise question on existence of stable state(s): Which values of $b, \alpha, \beta, \sigma$ and which inputs $(\vec c_t, \vec\theta_t)$ lead to stable states? That is, when does Eq. 4 produce no further update as $t\to+\infty$?
  • Main question: Suppose the update rule in Eq. 4 is a form of "backprop" that tries to "drive" some network state $E$ toward an "optimal" state $E^*$, where $E$ is computed by a cost function $E(R, \dots)$ from the network outputs $R$ and the network state $W, C, b, \sigma, \alpha, \beta$ [1]. Can we write $E(R, \dots)$ explicitly in terms of the output $R$ (and the initial conditions)?

My attempts

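The first question requires solving $\Delta\mathbf{W} = 0$, i.e. $\alpha (\vec R_t^\intercal \vec R_t - \mathbf{C}) + \beta (\mathbf{W_{t}} - \mathbf{C}) = 0$. The obvious sufficient case is $\vec R_t^\intercal \vec R_t = \mathbf{C}$ together with $\mathbf{W_{t}} = \mathbf{C}$, although the two terms could also cancel each other. Plugging in Eq. 3 doesn't seem to simplify much. I tried taking the SVD of $\vec R_t^\intercal \vec R_t$ and $\mathbf{W_{t}}$ etc., but that didn't lead to much progress. I am not sure how to solve for the equilibrium states of a dynamical system specified by matrices like this.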
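Short of an analytic solution, one way to probe the premise question is to iterate Eq. 4 numerically and watch $\|\Delta\mathbf{W}\|_F$. The sketch below does this under simplifying assumptions that are not in the setup above: the input is held constant over time, a random matrix `F` stands in for the unspecified $\mathbf{f}(\vec\theta)$, and the names `output`, `step` and `delta_norms` and all default values are placeholders.

```python
import numpy as np

def output(W, c, F, b, sigma):
    """Eqs. 1-3 for a fixed input: normalized output R (length n)."""
    n = W.shape[0]
    r = np.maximum(c @ F + b, 0.0) ** 2
    return r / (sigma**2 + W @ r / n)

def step(W, c, F, b, sigma, C, alpha, beta):
    """One application of Eq. 4: return the new W and the increment Delta W."""
    R = output(W, c, F, b, sigma)
    dW = alpha * (np.outer(R, R) - C) + beta * (W - C)
    return W + dW, dW

def delta_norms(alpha, beta, steps=50, k=3, n=4, c0=0.1, b=0.5, sigma=1.0, seed=0):
    """Iterate Eq. 4 with a constant input; return Frobenius norms of Delta W and C."""
    rng = np.random.default_rng(seed)
    c = rng.uniform(0.0, 1.0, k)
    F = rng.uniform(c0, 1.0, (k, n))   # stand-in for f(theta), an assumption
    W = np.ones((n, n))                # W_0 = 1
    R0 = output(W, c, F, b, sigma)
    C = np.outer(R0, R0)               # C := R_0^T R_0
    norms = []
    for _ in range(steps):
        W, dW = step(W, c, F, b, sigma, C, alpha, beta)
        norms.append(np.linalg.norm(dW))
    return np.array(norms), C

if __name__ == "__main__":
    for beta in (0.0, 0.1):
        norms, _ = delta_norms(alpha=0.1, beta=beta)
        print(f"beta={beta}: first |dW|={norms[0]:.3e}, last |dW|={norms[-1]:.3e}")
```

Two things follow directly from Eq. 4 in this constant-input setting. With $\beta = 0$, $\mathbf{W_0}$ is itself a fixed point, because $\mathbf{C}$ is defined from the same output. And rearranging Eq. 4 gives $\mathbf{W_{t+1}} - \mathbf{C} = (1+\beta)(\mathbf{W_t} - \mathbf{C}) + \alpha(\vec R_t^\intercal \vec R_t - \mathbf{C})$, so for $\beta > 0$ the linear part alone expands the distance to $\mathbf{C}$, and any stabilization would have to come from the $\alpha$ term. Time-varying inputs may behave differently.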

For the second question, suppose Eq. 4 describes traveling along the steepest gradient in the state space $E(R, \dots)$. Then the update should correspond to maximizing the differential $\left|\frac{\partial E_t}{\partial\mathbf{W_t}}\right|$, i.e. $\mathrm{argmax}_{\mathcal{E}}\left|\frac{\partial E_t}{\partial\mathbf{W_t}}\right|=\alpha (\vec R_t^\intercal \vec R_t - \mathbf{C}) + \beta (\mathbf{W_{t}} - \mathbf{C})$.
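One necessary condition can be checked numerically before searching for $E$. If $\Delta\mathbf{W}$ were proportional to $\partial E/\partial\mathbf{W}$ for a smooth scalar $E$, with all $n^2$ entries of $\mathbf{W}$ treated as independent coordinates under the ordinary Euclidean metric, then the Jacobian $\partial\,\mathrm{vec}(\Delta\mathbf{W})/\partial\,\mathrm{vec}(\mathbf{W})$ would have to be symmetric, since it would be a multiple of the Hessian of $E$. The sketch below estimates that Jacobian by central finite differences for a fixed drive $\vec r$. The names `delta_W` and `jacobian_asymmetry`, the random `r`, and the parameter values are all placeholders, and holding the input constant is an assumption.

```python
import numpy as np

def delta_W(W, r, C, sigma, alpha, beta):
    """Right-hand side of Eq. 4 for a fixed drive r (input held constant)."""
    n = W.shape[0]
    R = r / (sigma**2 + W @ r / n)
    return alpha * (np.outer(R, R) - C) + beta * (W - C)

def jacobian_asymmetry(W, r, C, sigma, alpha, beta, eps=1e-6):
    """Frobenius norm of J - J^T, with J = d vec(Delta W) / d vec(W)
    estimated column by column with central finite differences."""
    n = W.shape[0]
    J = np.zeros((n * n, n * n))
    for idx in range(n * n):
        E = np.zeros(n * n)
        E[idx] = eps                   # perturb one entry of W
        E = E.reshape(n, n)
        plus = delta_W(W + E, r, C, sigma, alpha, beta)
        minus = delta_W(W - E, r, C, sigma, alpha, beta)
        J[:, idx] = ((plus - minus) / (2 * eps)).ravel()
    return np.linalg.norm(J - J.T)

rng = np.random.default_rng(0)
n, sigma = 3, 1.0
r = rng.uniform(0.5, 2.0, n)           # hypothetical positive drive, an assumption
W = np.ones((n, n))                    # evaluate at W_0 = 1
R0 = r / (sigma**2 + W @ r / n)
C = np.outer(R0, R0)

asym_full = jacobian_asymmetry(W, r, C, sigma, alpha=1.0, beta=0.1)
asym_linear = jacobian_asymmetry(W, r, C, sigma, alpha=0.0, beta=0.1)
print("alpha term included:", asym_full)
print("beta term only:     ", asym_linear)
```

The $\beta$ term alone has Jacobian $\beta\,\mathbf{I}$, which is symmetric, so any asymmetry comes from the $\alpha$ term. A clearly nonzero `asym_full` would suggest that Eq. 4 is not a plain Euclidean gradient step in $\mathbf{W}$. That would not rule out an energy function under a different metric, or one defined only on symmetric $\mathbf{W}$ (the update preserves symmetry when $\mathbf{W_0}$ and $\mathbf{C}$ are symmetric).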

[1]: Inspired by


Posted 2020-08-01T22:07:21.563
