## Kaggle Titanic: what is GP?


Sometimes I see kernels that use GP programming, but without explanation: they contain seemingly random numbers and an unknown equation. Here is part of the GP code:

```r
gp <- function(data)
{
  p <- 0.200000*tanh(((((31.006277) * ((((((data$Embarked) + (data$Cabin))/2.0)) + (((((sin((tanh((data$Parch))))) * (3.0))) - (data$Pclass))))))) * 2.0)) +
    0.200000*tanh(((31.006277) * (((((data$Age) * (data$Cabin))) + (((((0.318310) - (data$Pclass))) + (pmin((2.0), (((data$Parch) * 2.0)))))))))) +
    0.200000*tanh(ifelse(rep(1.,NROW(data))*(ifelse(rep(1.,NROW(data))*(data$SibSp>0.),data$Cabin,sin((data$Parch)))>0.),(7.90205097198486328),(((((((data$Cabin) + (data$Fare))/2.0)) - (9.869604))) - (31.006277)))) +
    0.200000*tanh(((((((((((tanh(((((0.636620) < (data$Parch))*1.)))) * 2.0)) - (data$Pclass))) + (((data$Embarked) + (sin((data$Pclass))))))) * 2.0)) * 2.0)) +
```


The full code is here: https://www.kaggle.com/scirpus/my-first-gp-in-r

Can someone explain what GP means, and how these numbers and equations were arrived at?

My guess is that it is the same idea as a neural network. Is that right?

---

GP stands for genetic programming. It is an algorithm inspired by natural selection (survival of the fittest) that searches for a good solution to some task. At each generation the algorithm maintains a population of individuals, each of whose behaviour is encoded as a set of genes. The individuals that perform the task best in each generation are selected, mutated slightly, and carried into the next generation. Over successive generations this narrows the search onto some local optimum.

# Example

Let's look at a simple example: using a genetic algorithm to find the minimum of a function of two variables over the range [-100, 100].

This consists of four crucial steps: initialization, evaluation, selection and combination.

## Initialization

Each individual in the population is encoded by some genes. In our case the genes are the $$[x, y]$$ values. We will set the search range to [-100, 100] for this specific problem. Usually you will know what range is naturally possible based on your problem; for example, you should know the range of plausible soil densities in nature. We will create 1000 individuals in our population.

## Evaluation of the fitness

This step simply asks you to put the $$[x,y]$$ values into your function and get its result. Pretty standard stuff.

## Selection

There are many ways to select parents. I will always keep the alpha male: the best individual in the population is cloned directly into the next generation. Then I use tournament selection, repeating the following until the next generation's population is full: pick four parents at random, take the better individual from the first two and the better from the last two. These two winners become the parents of the next offspring.
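In isolation, one tournament round of the scheme above can be sketched as follows (the helper below is illustrative, not part of the final class):

```python
import random

def tournament(population, fitness):
    # Pick two individuals at random; the one with lower fitness wins.
    a, b = random.sample(population, 2)
    return a if fitness(a) < fitness(b) else b

# Two tournament winners become the parents of the next offspring.
parent1 = tournament(list(range(10)), abs)
parent2 = tournament(list(range(10)), abs)
```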

## Combination

From the two parents we build the child's genome using the binary representations of the parents' $$[x, y]$$ values. Each bit in the child's genome is selected from one of the two parent genes uniformly at random.
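This uniform recombination amounts to copying each bit from a randomly chosen parent; a minimal sketch:

```python
import random

def uniform_crossover(p1, p2):
    # For each bit position, copy the bit from a randomly chosen parent.
    return ''.join(random.choice(bits) for bits in zip(p1, p2))

child = uniform_crossover('0000011111', '1111100000')
```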

```python
import numpy as np

class Genetic(object):

    def __init__(self, f, pop_size=1000, n_variables=2):
        self.f = f
        self.minim = -100
        self.maxim = 100
        self.pop_size = pop_size
        self.n_variables = n_variables
        self.population = self.initializePopulation()
        self.evaluatePopulation()

    def initializePopulation(self):
        return [np.random.randint(self.minim, self.maxim, size=(self.n_variables))
                for i in range(self.pop_size)]

    def evaluatePopulation(self):
        return [self.f(i[0], i[1]) for i in self.population]

    def nextGen(self):
        results = self.evaluatePopulation()
        # Elitism: clone the best individual into the next generation.
        children = [self.population[np.argmin(results)]]

        while len(children) < self.pop_size:
            # Tournament selection: each parent is the better of two
            # randomly chosen individuals.
            randA, randB = np.random.randint(0, self.pop_size), \
                           np.random.randint(0, self.pop_size)
            if results[randA] < results[randB]: p1 = self.population[randA]
            else: p1 = self.population[randB]

            randA, randB = np.random.randint(0, self.pop_size), \
                           np.random.randint(0, self.pop_size)
            if results[randA] < results[randB]: p2 = self.population[randA]
            else: p2 = self.population[randB]

            # Decide each child gene's sign from the parents' signs.
            signs = []
            for i in zip(p1, p2):
                if i[0] < 0 and i[1] < 0: signs.append(-1)
                elif i[0] >= 0 and i[1] >= 0: signs.append(1)
                else: signs.append(np.random.choice([-1, 1]))

            # Convert absolute values to 10-bit binary strings
            p1 = [format(abs(i), '010b') for i in p1]
            p2 = [format(abs(i), '010b') for i in p2]

            # Recombination: take each bit from either parent
            # uniformly at random
            child = []
            for i, j in zip(p1, p2):
                for k, l in zip(i, j):
                    if k == l: child.append(k)
                    else: child.append(np.random.choice([k, l]))

            child = ''.join(child)
            g1 = child[0:len(child)//2]
            g2 = child[len(child)//2:len(child)]
            children.append(np.asarray([signs[0]*int(g1, 2),
                                        signs[1]*int(g2, 2)]))
        self.population = children

    def run(self):
        ix = 0
        while ix < 1000:
            ix += 1
            self.nextGen()
        return self.population[0]
```


To use this algorithm, define a function to minimize and run the following.

```python
f = lambda x, y: (x-7)**2 + y**2
gen = Genetic(f)
minim = gen.run()

print('Minimum found      X =', minim[0], ', Y =', minim[1])
```


---

'GP' stands for Genetic Programming, an area of artificial intelligence. A genetic program evolves a tree structure whose internal nodes are functions and whose leaves are variables and constants. After some number of generations of the evolutionary process, the best tree is the solution that best fits the problem.
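As a toy illustration of that tree representation (not Scirpus's actual code, and far simpler than a real GP library), an individual can be a nested tuple of functions over a terminal `x` and constants, scored against a target function; plain random search stands in for the full evolutionary loop here:

```python
import operator
import random

# A hypothetical, minimal tree representation: an individual is either the
# terminal 'x', a numeric constant, or a tuple (function, left, right).
FUNCS = [operator.add, operator.sub, operator.mul]

def random_tree(depth=3):
    # Grow a random expression tree, stopping at terminals.
    if depth == 0 or random.random() < 0.3:
        return 'x' if random.random() < 0.5 else random.uniform(-1.0, 1.0)
    return (random.choice(FUNCS), random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    # Recursively evaluate the tree at the point x.
    if tree == 'x':
        return x
    if isinstance(tree, float):
        return tree
    func, left, right = tree
    return func(evaluate(left, x), evaluate(right, x))

def fitness(tree):
    # Squared error against a target function over a grid of points.
    target = lambda x: x * x + 1.0
    xs = [i / 10.0 for i in range(-10, 11)]
    return sum((evaluate(tree, x) - target(x)) ** 2 for x in xs)

# Random search stands in for selection, crossover and mutation,
# just to show how a tree is generated and scored.
best = min((random_tree() for _ in range(200)), key=fitness)
```

A real GP run would instead mutate subtrees and swap subtrees between parents to produce each new generation.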

I suggest you read up on genetic programming. The founder of genetic programming is John Koza and his first book on genetic programming is entitled "Genetic Programming: On the Programming of Computers by Means of Natural Selection (Complex Adaptive Systems), 1992".

Scirpus wrote his own GP code in C++, used it to generate the tree structure, and pasted the result into his R program. You can do the same yourself with a popular library such as Distributed Evolutionary Algorithms in Python (DEAP).
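Stripped of the evolved detail, the pasted model has a simple shape: a sum of many terms, each `0.200000 * tanh(sub-expression)`, where the sub-expressions and constants (e.g. 31.006277) were discovered by the evolutionary search. A hypothetical Python sketch of that shape (the features and expressions here are illustrative, not the kernel's actual evolved ones):

```python
import math

def gp_predict(row):
    # Each term mirrors one line of the evolved model: a fixed 0.2 weight
    # times tanh of an evolved sub-expression. The expressions and constants
    # below are illustrative only, not the kernel's actual evolved ones.
    terms = [
        math.tanh(31.006277 * ((row["Embarked"] + row["Cabin"]) / 2.0 - row["Pclass"])),
        math.tanh(31.006277 * (row["Age"] * row["Cabin"] + 0.318310 - row["Pclass"])),
    ]
    return 0.2 * sum(terms)

row = {"Embarked": 1.0, "Cabin": 0.0, "Age": 0.5, "Pclass": 3.0}
score = gp_predict(row)
```

Because tanh saturates toward ±1, each term acts like a soft vote, and the summed score is presumably thresholded to produce the final survival prediction.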

Is it similar to a neural network, which finds unknown parameters? – slowmonk – 2019-02-11T09:13:23.167

Yes but the two algorithms find the parameters in entirely different ways. – Brian O'Donnell – 2019-02-16T17:52:03.593