I am working through the Titanic competition. This is my code so far:
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

train = pd.read_csv("https://raw.githubusercontent.com/oo92/Titanic-Kaggle/master/train.csv")
test = pd.read_csv("https://raw.githubusercontent.com/oo92/Titanic-Kaggle/master/test.csv")

train['Sex'].replace(['female', 'male'], [0, 1])
train['Embarked'].replace(['C', 'Q', 'S'], [1, 2, 3])

# Fill missing values in Age feature with each sex’s median value of Age
train['Age'].fillna(train.groupby('Sex')['Age'].transform("median"), inplace=True)

linReg = LinearRegression()

data = train[['Pclass', 'Sex', 'Parch', 'Fare', 'Age']]

# implement train_test_split
x_train, x_test, y_train, y_test = train_test_split(data, train['Survived'], test_size=0.2, random_state=0)

# Training the machine learning algorithm
linReg.fit(x_train, y_train)

# Checking the accuracy score of the model
accuracy = linReg.score(x_test, y_test)
print(accuracy*100, '%')
The line that builds data previously looked like this:
data = train[['Pclass', 'Parch', 'Fare', 'Age']]
which gave me an accuracy score of 19.5%. I realized that I hadn't included Sex, so I changed it to this:
data = train[['Pclass', 'Sex', 'Parch', 'Fare', 'Age']]
Then, I got the following error:
ValueError: could not convert string to float: 'female'
At this point I realized that the changes I made to
train['Age'] were not reflected in the training and testing of the model, which seems to be the reason why my model scored only
19.5%. How do I fix this problem?
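
To narrow it down, here is a minimal sketch of what I suspect is happening, using the Sex column since that is the one named in the error. It assumes that replace returns a new Series rather than modifying the column in place. The small toy DataFrame is made up for illustration and is not the real Titanic data.

```python
import pandas as pd

# Hypothetical stand-in for the training data (not the real Titanic file).
# object dtype so the column can hold either strings or integers.
toy = pd.DataFrame({"Sex": ["female", "male", "female"]}, dtype=object)

# Same pattern as in my code: the result of replace() is discarded
toy["Sex"].replace(["female", "male"], [0, 1])
unchanged = toy["Sex"].tolist()

# Assigning the result back to the column is what updates the DataFrame
toy["Sex"] = toy["Sex"].replace(["female", "male"], [0, 1])
encoded = toy["Sex"].tolist()

print(unchanged)  # still the original strings
print(encoded)    # now numeric codes
```

If this is right, the Sex column in my train DataFrame would still contain the strings 'female' and 'male' when it reaches linReg.fit, which would match the ValueError above.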