## How decision trees work in Python

1

1

I am new to the field of machine learning. I have just recently learnt Decision Trees and started solving Titanic Survival problem from Kaggle Competition. I understood the algorithm behind decision trees and how it actually works but I can't really match it with how python functions work this way. Say what exactly fit method does?

Well correct me if my concept is wrong. I think that looking at the train set i can build a model on "if a passenger survived or not" . Then I split the model into train set and test set and then the fit method (x_train, x_test) (this is what i think it works) actually tries to find the connection between x_train and y_train. I BUILT using my Train dataset and then tries to predict my test set. Say if i change my model, the fit method will try again to find the connection between my x_train and y_train using my new model. (Sorry for some terms as I am a self-learner. I don't really have the grasp of machine learning terms)

It will be of great help if you could suggest some simple data science problems which can be solved using Decision trees(besides Iris dataset)

The fit method is used only on X_train, y_train. It trains your model, and is the learning part. The predict method is where you try to apply what the model has learned on data the model has not seen before. This is the usual scikit-learn API, at least. – Adrian Keister – 2018-09-11T13:03:57.730

Can you explain if the fit method has any connection between training my model and my conditional approach? – Sadil Khan – 2018-09-11T13:05:42.627

Not sure what you mean by "conditional approach". What I've said generally applies to a gigantic array of models (not just decision trees) in scikit-learn. – Adrian Keister – 2018-09-11T13:07:06.103

Okay... But i can't just load a dataset and then try to split and fit... So i define a function f that checks if any passenger is Female then she survived, else died.. So my question is when i apply fit method will it consider the conditions mentioned in f – Sadil Khan – 2018-09-11T13:11:07.863

If your target variable is survived, the hope is certainly that your model will accurately predict if the passenger survived or not. You probably want your model to work on the entire Titanic dataset, and not just female passengers, right? You could restrict yourself to just the female passengers if you wanted. In any case, I would recommend that the training data and the test data be of the same kind. You're either training/testing on the full dataset, or you're training/testing on the female-only data, etc. – Adrian Keister – 2018-09-11T13:13:54.743