Below is a simplified example of an h2o gradient boosting machine (GBM) model using R's built-in iris dataset. The model is trained to predict sepal length from the other four columns.
The example yields an R² of 0.93, computed on the same data the model was trained on, which seems unrealistically high. How can I assess whether this is a realistic result or simply the model overfitting?
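    library(datasets)
    library(h2o)

    # Initiate h2o (must happen before any data is sent to the cluster)
    h2o.init()

    # Get the iris dataset
    df <- iris

    # Convert to an h2o frame
    df.hex <- as.h2o(df)

    # Train GBM model: predict Sepal.Length (column 1) from columns 2-5
    gbm_model <- h2o.gbm(x = 2:5, y = 1, training_frame = df.hex,
                         ntrees = 100, max_depth = 4, learn_rate = 0.1)

    # Check accuracy (with no newdata, these are training-set metrics)
    perf_gbm <- h2o.performance(gbm_model)
    rsq_gbm <- h2o.r2(perf_gbm)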
    > rsq_gbm
    [1] 0.9312635
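
One way to probe this, offered as a minimal sketch rather than a definitive recipe: hold out part of the data, add k-fold cross-validation, and compare the training R² with the cross-validated and held-out R². The 80/20 split ratio, `nfolds = 5`, the seed, and the names `train`, `test`, and `gbm_cv` are my own assumptions for illustration; the hyperparameters are the same as above. A large gap between the training value and the other two would point to overfitting.

```r
library(h2o)

# Start (or connect to) a local H2O cluster before creating any H2O frames
h2o.init()

df.hex <- as.h2o(iris)

# Hold out roughly 20% of the rows (assumed ratio); splitFrame assigns rows
# probabilistically, so the split sizes are approximate
splits <- h2o.splitFrame(df.hex, ratios = 0.8, seed = 42)
train <- splits[[1]]
test  <- splits[[2]]

# Same hyperparameters as the original model, plus 5-fold cross-validation
# (nfolds = 5 and seed = 42 are assumptions, not tuned values)
gbm_cv <- h2o.gbm(x = 2:5, y = 1, training_frame = train,
                  ntrees = 100, max_depth = 4, learn_rate = 0.1,
                  nfolds = 5, seed = 42)

# R² on the training rows, across the cross-validation folds, and on the
# held-out rows the model never saw
r2_train <- h2o.r2(gbm_cv, train = TRUE)
r2_xval  <- h2o.r2(gbm_cv, xval = TRUE)
r2_test  <- h2o.r2(h2o.performance(gbm_cv, newdata = test))

print(c(train = r2_train, xval = r2_xval, test = r2_test))
```

With only 150 rows the held-out estimate will be noisy, so the cross-validated figure is probably the more stable of the two comparisons.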