I did some computation of formal derivatives a while back which might be of interest in this context (though keep in mind that this is anything but bulletproof! It does work for the cases I bothered to check, though).

```
Clear[a]; Format[a[k_]] = Subscript[a, k]
```

Let us say we have an objective function which is formally a function of the vector `a[i]`:

```
Q = Sum[Log[Sum[a[r] Subscript[B, r][Subscript[x, i]], {r, 1, p}]/
Sum[a[r] , {r, 1, p}]], {i, 1, n}]
```

Let us define a couple of rules for formal differentiation as follows

```
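Clear[d];
(* chain rule for Log, and linearity over Sum *)
d[Log[x_], a[k_]] := 1/x d[x, a[k]]
d[Sum[x_, y__], a[k_]] := Sum[d[x, a[k]], y]
(* terms linear in a: same index gives b, a different index gives a Kronecker delta *)
d[ a[k_] b_., a[k_]] := b /; FreeQ[b, a]
d[ a[q_] b_., a[k_]] := b Subscript[δ, k, q] /; FreeQ[b, a]
(* product and sum rules *)
d[ c_ b_, a[k_]] := d[c, a[k]] b + d[b, a[k]] c
d[ b_ + c_, a[k_]] := d[c, a[k]] + d[b, a[k]]
(* deltas and anything free of a are treated as constants *)
d[Subscript[δ, r_, q_], a[k_]] := 0
d[x_, a[k_]] := 0 /; FreeQ[x, a]
(* power and exponential rules *)
d[G_^n_, a[k_]] := n G^(n - 1) d[G , a[k]] /; ! FreeQ[G, a]
d[Exp[G_], a[q_]] := Exp[G] d[G , a[q]] /; ! FreeQ[G, a]
(* reset the attributes of Sum; note that this drops HoldAll *)
Unprotect[Sum]; Attributes[Sum] = {ReadProtected}; Protect[Sum];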
```

And a set of rules to deal with Kronecker deltas:

```
ds = {Sum[a_ + b_, {s_, 1, p_}] :> Sum[a, {s, 1, p}] + Sum[b, {s, 1, p}],
Sum[ y_ Subscript[δ, r_, s_], {s_, 1, p_}] :> (y /. s -> r),
Sum[ y_ Subscript[δ, s_, r_], {s_, 1, p_}] :> (y /. s -> r),
Sum[ Subscript[δ, s_, r_], {r_, 1, p_}] :> 1,
Sum[δ[i_, k_] δ[j_, k_] y_. , {k_, n_}] :> δ[i, j] (y /. k -> i),
Sum[a_ b_, {r_, 1, p_}] :> a Sum[b, {r, 1, p}] /; NumberQ[a],
Sum[a__, {r_, 1, p_}] :> Sum[Simplify[a], {r, 1, p}] }
```
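Written out, the two contraction rules in `ds` implement the usual Kronecker identities. This is only a by-hand sketch, and it assumes the free index actually lies inside the summation range, which the pattern rules themselves do not check:

$$
\sum_{s=1}^{p} y_s\,\delta_{rs} = y_r \quad (1 \le r \le p), \qquad \sum_{r=1}^{p} \delta_{sr} = 1 \quad (1 \le s \le p).
$$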

Then, for instance, the gradient of `Q` with respect to one of the `a[k]` reads

```
grad = d[Q, a[k]] /. ds // Simplify;
```
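For comparison, the same gradient can be worked out by hand. In this sketch $N_i$ and $D$ are shorthand introduced here (they do not appear in the code): $N_i = \sum_{r=1}^{p} a_r B_r(x_i)$ and $D = \sum_{r=1}^{p} a_r$, so that $Q = \sum_{i=1}^{n} (\log N_i - \log D)$ and

$$
\frac{\partial Q}{\partial a_k} = \sum_{i=1}^{n} \frac{B_k(x_i)}{N_i} - \frac{n}{D}.
$$

The output of `grad` should agree with this, up to how `Simplify` chooses to arrange the terms.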

Similarly, the tensor of second derivatives w.r.t. `a[k]` and `a[s]` is given by

```
hess = d[d[Q, a[k]], a[s]] /. ds // Simplify
```
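A hand calculation gives a reference to compare against. As a sketch, with the shorthand $N_i = \sum_{r=1}^{p} a_r B_r(x_i)$ and $D = \sum_{r=1}^{p} a_r$ (my notation, not part of the code), both $N_i$ and $D$ are linear in the $a_r$, so differentiating $\sum_i B_k(x_i)/N_i - n/D$ once more yields

$$
\frac{\partial^2 Q}{\partial a_k\,\partial a_s} = -\sum_{i=1}^{n} \frac{B_k(x_i)\,B_s(x_i)}{N_i^{2}} + \frac{n}{D^{2}}.
$$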

As a less trivial example let us consider the 4th order derivatives of `Q`

```
d[d[d[d[Q, a[k]], a[s]], a[m]], a[t]] /. ds // Simplify
```
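This one can also be checked by hand, since $\frac{d^4}{dx^4}\log x = -6/x^4$ and both arguments of the logarithms are linear in the $a_r$. A sketch, again with the shorthand $N_i = \sum_{r=1}^{p} a_r B_r(x_i)$ and $D = \sum_{r=1}^{p} a_r$ (introduced here, not in the code):

$$
\frac{\partial^4 Q}{\partial a_k\,\partial a_s\,\partial a_m\,\partial a_t} = -6\sum_{i=1}^{n} \frac{B_k(x_i)\,B_s(x_i)\,B_m(x_i)\,B_t(x_i)}{N_i^{4}} + \frac{6\,n}{D^{4}}.
$$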

For the problem at hand we check easily that

```
Q = Sum[r a[r] , {r, 1, p}];
grad = d[Q, a[k]] // Simplify;
grad //. ds
```

returns `k`, as it should.
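Spelled out, the two steps are the derivative rule followed by the delta contraction (assuming $1 \le k \le p$):

$$
\frac{\partial}{\partial a_k} \sum_{r=1}^{p} r\,a_r = \sum_{r=1}^{p} r\,\delta_{kr} = k.
$$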

**EDIT**

This process can be made a bit more general. Consider, say, this objective function

```
Q = 1/2 Sum[(Sum[a[r] Subscript[B, r, i][a[q]], {r, 1, p}] -
Subscript[y, i])^2, {i, 1, n}]
```

which depends non-linearly on `a[k]` via `B`.

All we need is to add a new rule for `d`

```
d[H_[a[q_]],
a[k_]] := (D[H[x] , x] /. x -> a[k] ) Subscript[δ, k, q]
```
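In mathematical notation this rule is the chain rule combined with a Kronecker delta. A sketch, assuming `H` is a differentiable function of the single component $a_q$:

$$
\frac{\partial H(a_q)}{\partial a_k} = H'(a_q)\,\delta_{kq} = H'(a_k)\,\delta_{kq},
$$

where the second equality holds because the delta vanishes unless $q = k$.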

Now we readily have

```
grad = d[Q, a[k]] // Simplify;
hess = d[d[Q, a[k]], a[s]];
grad //. ds
```

```
hess /. ds // Simplify
```

As another example, let us look at a parametrized entropy distance,

```
Q = -Sum[(Sum[a[r] Subscript[B, r, i], {r, 1, p}]/
Subscript[y, i]) Log[(Sum[a[r] Subscript[B, r, i], {r, 1, p}]/
Subscript[y, i])], {i, 1, n}]
```

We can compute its Hessian by applying the sum rules twice:

```
Map[# /. ds &, d[d[Q, a[k]], a[s]] /. ds]
```
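As a reference for the expected result, here is the hand calculation. It is a sketch in which $u_i$ is shorthand introduced here, $u_i = \frac{1}{y_i}\sum_{r=1}^{p} a_r B_{r,i}$, and in which $B_{r,i}$ and $y_i$ are assumed not to depend on the $a_r$. Then $Q = -\sum_{i=1}^{n} u_i \log u_i$ and

$$
\frac{\partial Q}{\partial a_k} = -\sum_{i=1}^{n} \frac{B_{k,i}}{y_i}\,\bigl(\log u_i + 1\bigr), \qquad
\frac{\partial^2 Q}{\partial a_k\,\partial a_s} = -\sum_{i=1}^{n} \frac{B_{k,i}\,B_{s,i}}{y_i \sum_{r=1}^{p} a_r B_{r,i}}.
$$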

As a final example, consider a Poisson likelihood

```
Q = Sum[Log[Exp[-a[i]] a[i]^Subscript[y, i]/Subscript[y, i]!], {i, 1, n}]
```

so that

```
grad = d[Q, a[k]] /. ds // Simplify
```

and

```
hess = d[d[Q, a[k]], a[s]] /. ds // Simplify
```
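The expected answers are easy to write down by hand here. In this sketch the summation index is written $i$ to keep it distinct from the differentiation indices $k$ and $s$, and the $y_i$ are assumed not to depend on the $a_r$. Since $Q = \sum_{i=1}^{n} \bigl(-a_i + y_i \log a_i - \log y_i!\bigr)$,

$$
\frac{\partial Q}{\partial a_k} = \frac{y_k}{a_k} - 1, \qquad
\frac{\partial^2 Q}{\partial a_k\,\partial a_s} = -\,\delta_{ks}\,\frac{y_k}{a_k^{2}}.
$$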

Of course these algebraic rules are not bulletproof, but they illustrate nicely the way Mathematica handles new grammar.

@m_goldberg Sorry for the newbie question, but how exactly is `D[sum[n], x[2]]` the same thing as `D[Sum[n, y], x[2]]`? In particular, `Sum[n,y] = ny` since it sums n y times, but `sum[n]` = $\sum^{n}_{i=1} i\,x[i]$ (you can check it in a Mathematica notebook). – Pinocchio – 2015-08-07T19:48:53.750

Evaluating `D[sum[n], x[2]]` is essentially the same as evaluating `D[Sum[n, y], x[2]]`, where `y` is value-free, which is nothing `D` recognizes as depending on `x[2]`. – m_goldberg – 2012-12-15T16:10:56.940

This question is closely related to and might even be considered a duplicate of this one – m_goldberg – 2012-12-15T16:25:32.800

Extending Mathematica to allow sums of symbolic length really would require extending it to allow lists of symbolic length; only then could one properly worry about differentiating such objects. But how to make a robust, general design for such an extension is hardly obvious and raises all sorts of issues, e.g.: Should `Infinity` be an acceptable value for `n`? What would the effect be upon processing speed for specific-length objects? – murray – 2012-12-15T17:41:56.640

I've changed your title to something more specific. Please change it to something else if you think it's better – acl – 2012-12-15T19:39:57.960

For formal differentiation, what I've noticed is if it's in a tuxedo it's probably a guy, and if it's in a gown it's usually a gal. But this is only a rough guideline, and anyway times have changed. – Daniel Lichtblau – 2012-12-15T22:15:22.260

This question is not related to generating a given amount of variables to use later on. The calculation should be without the need to specify a specific amount. The only prior should be that the number of variables is limited. – Wizard – 2012-12-22T17:01:57.537