Assume all of the following:

- I have 4 known numbers, all within a 0-400 range, like this:

```
Variable1 Variable2 Variable3 Variable4
0-400 0-400 0-400 0-400
```

I know that there is a mathematical relationship between the numbers.

I would like to use a genetic algorithm (computer code) to estimate/approximate Variable2 and Variable3 based on Variable1 and Variable4.

Also, and importantly, assume that there are many input samples and that each sample differs slightly. This makes it possible for a genetic algorithm to optimize "a mathematical formula/algorithm" that estimates/approximates Variable2 and Variable3 in all cases.

(In other words, the genetic algorithm will be able to optimize the mathematical formula towards the already-known Variable2 and Variable3 across many input samples, each with a similar though slightly different mathematical formula.)
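The setup above implies a fitness function: each candidate formula is scored against the known Variable2 and Variable3 on every sample, and the GA tries to minimize that score. A minimal sketch, assuming mean squared error as the score (any error measure would do) and a few hypothetical samples built from made-up relationships (Variable2 = Variable1 + Variable4, Variable3 = (Variable1 + Variable4) / 2); the names `fitness`, `f2`, and `f3` are illustrative:

```python
from typing import Callable, Sequence, Tuple

# One sample: (Variable1, Variable2, Variable3, Variable4), each in 0-400.
Sample = Tuple[float, float, float, float]
# A candidate formula maps (Variable1, Variable4) to an estimate.
Formula = Callable[[float, float], float]


def fitness(f2: Formula, f3: Formula, samples: Sequence[Sample]) -> float:
    """Mean squared error of both candidate formulas over all samples.

    Lower is better; 0.0 means both formulas reproduce the known
    Variable2 and Variable3 exactly on every sample.
    """
    total = 0.0
    for v1, v2, v3, v4 in samples:
        total += (f2(v1, v4) - v2) ** 2 + (f3(v1, v4) - v3) ** 2
    return total / len(samples)


# Hypothetical samples from made-up relationships:
# Variable2 = Variable1 + Variable4, Variable3 = (Variable1 + Variable4) / 2.
samples = [(v1, v1 + v4, (v1 + v4) / 2, v4) for v1, v4 in [(10, 20), (100, 5), (150, 200)]]

print(fitness(lambda a, d: a + d, lambda a, d: (a + d) / 2, samples))  # 0.0
print(fitness(lambda a, d: a * d, lambda a, d: d, samples))  # a large positive error
```

Whatever representation is used for the formulas, the GA only needs this single number per candidate to rank them.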

How can I then write the following into a genetic algorithm:

```
Variable2=?
Variable3=?
```

Where `?` could be any mathematical function (`+/-/*/:/√/^2/cos/sin/tan/etc.`) involving Variable1 and Variable4.

In other words, I would like the genetic algorithm to build a generic mathematical formula.

How can I define Variable2 and Variable3 as the outcome of a mathematical formula so that estimation by a computer algorithm becomes possible?
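One common way to let `?` be any formula is to store it as an expression tree: leaves are the inputs (Variable1, Variable4) or constants, and inner nodes are operators. A minimal sketch; the nested-tuple representation, the operator names (`'sq'` for `^2`), and the "protected" division and square root (which return a safe value instead of raising) are illustrative assumptions. `tan` is left out because it blows up near ±π/2, but it could be added to `UNARY` the same way:

```python
import math
from typing import Union

# A formula is a nested tuple (an expression tree):
#   leaf: 'v1' (Variable1), 'v4' (Variable4) or a numeric constant
#   node: (operator_name, child) or (operator_name, left, right)
Tree = Union[str, float, tuple]


def pdiv(a: float, b: float) -> float:
    """Protected division: returns 1.0 instead of raising when b is ~0."""
    return a / b if abs(b) > 1e-9 else 1.0


def psqrt(a: float) -> float:
    """Protected square root: takes |a| so negative inputs do not raise."""
    return math.sqrt(abs(a))


BINARY = {'+': lambda a, b: a + b, '-': lambda a, b: a - b,
          '*': lambda a, b: a * b, '/': pdiv}
UNARY = {'sqrt': psqrt, 'sq': lambda a: a * a, 'sin': math.sin, 'cos': math.cos}


def evaluate(tree: Tree, v1: float, v4: float) -> float:
    """Recursively evaluate an expression tree for one sample."""
    if tree == 'v1':
        return v1
    if tree == 'v4':
        return v4
    if not isinstance(tree, tuple):
        return float(tree)  # numeric constant
    op, *children = tree
    args = [evaluate(child, v1, v4) for child in children]
    return BINARY[op](*args) if op in BINARY else UNARY[op](*args)
```

For example, `evaluate(('+', 'v1', ('*', 'v4', 2.0)), 3, 4)` computes Variable1 + Variable4 * 2 and returns `11.0`. Variable2 and Variable3 would each get their own tree.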

I am not sure how to approach this. The genetic algorithm software I use can handle as many variables as needed, and they can be in disparate ranges.

For example, I could easily write my algorithm like this:

```
Variable2=Variable1(op)Variable4
Variable3=Variable1(op)Variable4
```

Where Variable1 is the first variable for the genetic algorithm, with a range of `0-400`, Variable4 is the second variable for the genetic algorithm, with a range of `0-400`, and finally `(op)` is the third variable for the genetic algorithm, for example with a range of `1-4`, where `1` stands for `+`, `2` for `-`, `3` for `*`, `4` for `:`, etc.
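The single-operator encoding described above is essentially a lookup table from an integer gene to an operator. A minimal sketch of that decoding step; `OPS`, `decode`, and the choice to return 1.0 on division by zero are illustrative assumptions, not part of any particular GA package:

```python
import operator


def safe_div(a: float, b: float) -> float:
    """The ':' operator; returns 1.0 instead of raising when b == 0."""
    return a / b if b != 0 else 1.0


# Integer "(op)" gene (range 1-4) -> operator: 1 '+', 2 '-', 3 '*', 4 ':'.
OPS = {1: operator.add, 2: operator.sub, 3: operator.mul, 4: safe_div}


def decode(v1: float, op_gene: int, v4: float) -> float:
    """Compute Variable1 (op) Variable4 for one candidate op gene."""
    return OPS[op_gene](v1, v4)


for gene in (1, 2, 3, 4):
    print(gene, decode(12, gene, 4))
# 1 16
# 2 8
# 3 48
# 4 3.0
```

The sketch also makes the limitation visible: the GA can only choose which operator to apply, while the shape of the formula stays fixed.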

However, this algorithm is very limited and crude; it cannot grow into a genuinely complex estimation formula. Also, as soon as a second operator is introduced, for example:

```
Variable2=[Variable1 or Variable4](op)[Variable1 or Variable4](op)[Variable1 or Variable4]
Variable3=[Variable1 or Variable4](op)[Variable1 or Variable4](op)[Variable1 or Variable4]
```

The coding complexity for this would start to increase quickly, and there may be a need to use `(` and `)` to prioritize mathematical calculations, etc. The coding complexity for even more complex calculations becomes seemingly unmanageable.
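Representing a formula as a tree rather than a flat string of genes sidesteps the parenthesis problem: the nesting of the tree already fixes the order of evaluation, and parentheses only appear when the tree is printed. A minimal sketch that grows random trees of bounded depth and renders them as infix text; the primitive lists, the probabilities, and the 0-10 constant range are arbitrary illustrative choices:

```python
import random
from typing import Union

Tree = Union[str, float, tuple]

# Illustrative primitive sets; leaves are the two known inputs or a constant.
BINARY = ['+', '-', '*', '/']
UNARY = ['sqrt', 'sin', 'cos']
LEAVES = ['v1', 'v4']


def random_tree(rng: random.Random, depth: int) -> Tree:
    """Grow a random expression tree with at most `depth` operator levels."""
    if depth == 0 or rng.random() < 0.3:
        # Leaf: one of the inputs, or a random constant in 0-10.
        return rng.choice(LEAVES) if rng.random() < 0.7 else round(rng.uniform(0, 10), 2)
    if rng.random() < 0.2:
        return (rng.choice(UNARY), random_tree(rng, depth - 1))
    return (rng.choice(BINARY), random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def to_infix(tree: Tree) -> str:
    """Render a tree as ordinary infix text; parentheses are added automatically."""
    if not isinstance(tree, tuple):
        return str(tree)
    op, *children = tree
    if len(children) == 1:
        return f"{op}({to_infix(children[0])})"
    left, right = children
    return f"({to_infix(left)} {op} {to_infix(right)})"


print(to_infix(('*', ('+', 'v1', 'v4'), 'v1')))  # ((v1 + v4) * v1)
rng = random.Random(42)
for _ in range(3):
    print("Variable2 =", to_infix(random_tree(rng, depth=3)))
```

Adding a third or fourth operator is then just a deeper tree; no extra parsing or precedence code is needed.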

Is there a better and more straightforward way to let the genetic algorithm approximate/estimate Variable2 and Variable3 from Variable1 and Variable4, arriving at an overall optimized, generic mathematical formula/algorithm?
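The standard technique for this is genetic programming (GP): each individual is an expression tree, and crossover and mutation swap whole subtrees, so the size and shape of the formula evolve along with its operators. A compact, stdlib-only sketch, assuming hypothetical data generated from a made-up rule, a single output (Variable2; Variable3 would be a second, independent run), and arbitrary settings (population 100, 30 generations, tournament size 3, depth limit 6). Libraries such as DEAP (its `gp` module) implement the same idea far more completely:

```python
import functools
import math
import random

rng = random.Random(1)  # fixed seed so runs are reproducible
# Hypothetical data standing in for real samples: Variable2 is generated
# from the made-up rule Variable2 = 2 * Variable1 + Variable4.
SAMPLES = [(v1, v4, 2 * v1 + v4)
           for v1, v4 in ((rng.uniform(0, 400), rng.uniform(0, 400)) for _ in range(20))]
BINARY = {'+': lambda a, b: a + b,
          '-': lambda a, b: a - b,
          '*': lambda a, b: a * b,
          '/': lambda a, b: a / b if abs(b) > 1e-9 else 1.0}  # protected division
LEAVES = ['v1', 'v4', 1.0, 2.0]

def random_tree(depth):
    """Random tree (op, left, right) with at most `depth` operator levels."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(LEAVES)
    return (rng.choice(list(BINARY)), random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, v1, v4):
    if tree == 'v1':
        return v1
    if tree == 'v4':
        return v4
    if not isinstance(tree, tuple):
        return tree  # numeric constant
    op, left, right = tree
    return BINARY[op](evaluate(left, v1, v4), evaluate(right, v1, v4))

@functools.lru_cache(maxsize=None)  # trees are hashable tuples, so cache scores
def error(tree):
    """Mean squared error over all samples; inf for numerically broken formulas."""
    try:
        mse = sum((evaluate(tree, v1, v4) - y) ** 2 for v1, v4, y in SAMPLES) / len(SAMPLES)
    except OverflowError:
        return math.inf
    return mse if math.isfinite(mse) else math.inf

def subtrees(tree, path=()):
    """Yield (path, node) for every node; a path lists child indices (1 or 2)."""
    yield path, tree
    if isinstance(tree, tuple):
        for i in (1, 2):
            yield from subtrees(tree[i], path + (i,))

def replace(tree, path, new):
    """Copy of `tree` with the node at `path` replaced by `new`."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)

def depth(tree):
    return 1 + max(depth(tree[1]), depth(tree[2])) if isinstance(tree, tuple) else 0

def mutate(tree):
    """Subtree mutation: swap a random node for a fresh random subtree."""
    path, _ = rng.choice(list(subtrees(tree)))
    return replace(tree, path, random_tree(2))

def crossover(a, b):
    """Subtree crossover: graft a random subtree of `b` into a random spot in `a`."""
    path, _ = rng.choice(list(subtrees(a)))
    _, donor = rng.choice(list(subtrees(b)))
    return replace(a, path, donor)

def tournament(pop, k=3):
    return min(rng.sample(pop, k), key=error)

def evolve(pop_size=100, generations=30, max_depth=6):
    pop = [random_tree(3) for _ in range(pop_size)]
    history = []  # best error per generation
    for _ in range(generations):
        best = min(pop, key=error)
        history.append(error(best))
        children = [best]  # elitism: the best formula always survives
        while len(children) < pop_size:
            if rng.random() < 0.7:
                child = crossover(tournament(pop), tournament(pop))
            else:
                child = mutate(tournament(pop))
            children.append(child if depth(child) <= max_depth else best)
        pop = children
    best = min(pop, key=error)
    history.append(error(best))
    return best, history

best, history = evolve()
print(best, history[-1])
```

Because the best individual is copied unchanged into each new generation (elitism), the recorded best error never gets worse from one generation to the next. The depth limit keeps crossover from growing formulas without bound.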

Great answer! Thank you. – Roel Van de Paar – 2020-06-08T07:59:26.337

https://github.com/ShuhuaGao/geppy/blob/master/examples/sr/numerical_expression_inference-RNC.ipynb – Roel Van de Paar – 2020-06-08T07:59:43.750

To install: sudo -H pip3 install deap sympy graphviz && git clone --depth=1 https://github.com/ShuhuaGao – Roel Van de Paar – 2020-06-08T07:59:47.243

@RoelVandePaar, thanks for the bounty. – Mehdi – 2020-06-08T21:13:20.213