Which two accuracies do I compare to see whether the model is overfitting?
You should compare the training and test accuracies to detect over-fitting. A training accuracy that is substantially higher than the test accuracy indicates over-fitting.
Here, "accuracy" is used in a broad sense; it can be replaced with F1, AUC, error ("increase" becomes "decrease", "higher" becomes "lower"), etc.
I suggest the "Bias and Variance" and "Learning curves" chapters of Machine Learning Yearning by Andrew Ng. They present plots and interpretations for all the cases with a clear narration.
When I run 10-fold cross-validation, I get 10 accuracies that I can take the average/mean of. Should I call this mean the validation accuracy?
No. It is an estimate of the test accuracy.
The difference between validation and test sets (and their corresponding accuracies) is that the validation set is used to build or select a better model, meaning it affects the final model. However, since 10-fold CV always tests an already-built model on its 10% held-out fold, and it is not used here to select between models, that 10% held-out fold is a test set, not a validation set.
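As a concrete sketch of this, averaging the per-fold scores gives the estimate (the dataset and classifier here are illustrative choices, not from the question):

```python
from sklearn import datasets, model_selection, svm

iris = datasets.load_iris()
clf = svm.SVC(kernel='linear', C=1)

# Each of the 10 scores is a test accuracy on the 10% held-out fold
scores = model_selection.cross_val_score(clf, iris.data, iris.target, cv=10)
print(scores.mean())  # estimate of the test accuracy
```

Nothing about the averaging step selects between models, which is why the mean is a test-accuracy estimate rather than a validation accuracy.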
Afterward, I test the model on the 30% test data and get the test accuracy.
If you don't use the K-fold to select between multiple models, this part is not needed; run K-fold on 100% of the data to get the test accuracy. Otherwise, you should keep this test set, since the result of K-fold would then be a validation accuracy.
In this case, what will the training accuracy be?
From each of the 10 folds you can get a test accuracy on 10% of the data and a training accuracy on the other 90%. In Python, the method
cross_val_score calculates only the test accuracies. Here is how to calculate both:
from sklearn import datasets, model_selection, svm

iris = datasets.load_iris()
clf = svm.SVC(kernel='linear', C=1)
# 'train_score' and 'test_score' in the result hold one accuracy per fold
scores = model_selection.cross_validate(clf, iris.data, iris.target, cv=5, return_train_score=True)
Set return_estimator=True to also get the trained models.
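For instance, a self-contained sketch (reusing the iris example) of getting both accuracies and the fitted per-fold models:

```python
from sklearn import datasets, model_selection, svm

iris = datasets.load_iris()
clf = svm.SVC(kernel='linear', C=1)

res = model_selection.cross_validate(
    clf, iris.data, iris.target, cv=5,
    return_train_score=True, return_estimator=True
)
print(res['train_score'])     # 5 training accuracies, one per fold
print(res['test_score'])      # 5 test accuracies, one per fold
print(len(res['estimator']))  # 5 fitted SVC models, one per fold
```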
More on validation set
The validation set shows up in two general cases: (1) building a model, and (2) selecting between multiple models.
Two examples of building a model: we (a) stop training a neural network, or (b) stop pruning a decision tree, when the accuracy of the model on the validation set starts to decrease. Then we test the final model on a held-out set to get the test accuracy.
Two examples of selecting between multiple models:
a. We run K-fold CV on one neural network with 3 layers and one with 5 layers (obtaining K models for each), then select the network with the highest validation accuracy averaged over the K models; suppose it is the 5-layer network. Finally, we train the 5-layer network on an 80% train / 20% validation split of the combined K folds, and then test it on a held-out set to get the test accuracy.
b. We apply two already-built models, an SVM and a decision tree, to a validation set, then select the one with the highest validation accuracy. Finally, we test the selected model on a held-out set to get the test accuracy.
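The selection-by-K-fold pattern in example (a) can be sketched as follows; here the candidates are the SVM and decision tree from example (b) rather than two neural networks, and the split sizes are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# Hold out a final test set BEFORE any model selection
X_rest, X_test, y_rest, y_test = train_test_split(X, y, random_state=0)

candidates = {'svm': SVC(kernel='linear', C=1),
              'tree': DecisionTreeClassifier(random_state=0)}
# These K-fold accuracies are *validation* accuracies: they drive the selection
cv_means = {name: cross_val_score(model, X_rest, y_rest, cv=5).mean()
            for name, model in candidates.items()}
best_name = max(cv_means, key=cv_means.get)

# Retrain the selected model on all non-test data, then report test accuracy
best = candidates[best_name].fit(X_rest, y_rest)
print(best_name, best.score(X_test, y_test))
```

Because the K-fold scores were used to pick a model, only the untouched test set yields an unbiased test accuracy for the winner.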