200 Micro Average vs Macro average Performance in a Multiclass classification setting 2016-12-29T17:39:07.967

30 When is precision more important than recall? 2018-04-26T14:31:01.753

30 What is the difference between bootstrapping and cross-validation? 2018-05-28T13:16:29.283

15 How many features to sample using Random Forests 2017-10-10T10:50:22.720

15 Macro- or micro-average for imbalanced class problems 2018-08-13T09:57:37.077

14 How to define a custom performance metric in Keras? 2016-08-30T08:52:40.127

14 Train/Test Split after performing SMOTE 2016-12-09T00:19:45.343

12 Neural Networks - Loss and Accuracy correlation 2016-08-25T13:20:18.243

10 Why is the F-measure preferred for classification tasks? 2018-08-12T09:32:02.683

9 Difference between using RMSE and nDCG to evaluate Recommender Systems 2014-06-14T18:53:32.243

9 How do you evaluate ML model already deployed in production? 2016-12-06T00:00:03.877

8 Do I need validation data if my train and test accuracy/loss is consistent? 2020-06-16T01:18:56.477

7 Irregular Precision-Recall Curve 2017-11-21T18:44:09.630

7 Evaluating machine learning explainers? 2019-07-19T08:34:35.313

6 When do I have to use aucPR instead of auROC? (and vice versa) 2015-11-24T11:50:46.290

6 Chi-square as evaluation metrics for nonlinear machine learning regression models 2018-08-06T18:08:54.187

6 How to compare two unsupervised anomaly detection algorithms on the same data-set? 2019-03-20T09:52:00.290

6 Is the F1 Score sensitive to the threshold? 2019-05-16T06:03:30.657

5 Hyperparameter tuning in multiclass classification problem: which scoring metric? 2018-04-26T12:57:36.527

5 Does a precision score increasing with a higher number of folds mean the model will improve with more data? 2019-02-13T18:02:00.433

5 Evaluating the performance of a machine learned recommendation system 2019-12-06T22:27:22.837

5 Finding out why your model is doing better? 2020-07-09T17:53:23.807

4 Why are there currently no content-based evaluation metrics for information retrieval? 2015-11-16T15:14:37.750

4 Assessing significance / confidence of a crossvalidated performance measure 2016-01-28T13:13:52.680

4 How can conclusions be drawn from recommendation systems evaluation? 2016-09-22T15:33:14.370

4 In XGBoost, how to change eval function and keeping same objective? 2017-05-17T13:36:53.513

4 python xgboost DMatrix - get feature values or convert to np.array 2017-07-11T10:55:04.823

4 How can RL agents be monitored? 2017-12-06T08:58:46.353

4 Evaluating new features 2018-01-08T16:28:12.343

4 Class leaking on validation set 2018-04-11T13:32:31.317

4 Scikit-learn average_precision_score() vs. auc score of precision_recall_curve() 2018-08-19T13:57:01.003

4 Micro-F1 and Macro-F1 are equal in binary classification and I don't know why 2019-02-21T18:39:20.253

4 Smart data split (train/eval) for Object Detection 2019-06-25T11:44:43.277

4 Difference between learning_curve and validation_curve 2019-10-28T07:47:14.877

4 How is "relevance" defined in information retrieval outside the context of systems with user feedback? 2019-12-07T15:43:47.513

3 How to represent ROC curve when using Cross-Validation 2016-10-06T10:03:15.077

3 roc_auc score GridSearch 2016-12-01T19:41:30.380

3 Is Gini coefficient a good metric for measuring predictive model performance on highly imbalanced data 2017-06-15T20:15:12.750

3 Evaluating Logistic Regression Model in Tensorflow 2017-06-20T13:03:28.923

3 What is the efficiency difference between different cost functions in case of neural networks? 2017-08-25T11:45:29.727

3 Evaluation methods for multi-class classification 2018-05-05T08:39:13.257

3 Evaluating the result of topic modeling in a way that time matters 2018-07-26T04:05:36.897

3 Evaluation of regression models with different evaluations (MSE, variance, VAF etc.) 2018-07-30T08:19:15.210

3 Statistical test for machine learning 2018-09-18T09:06:17.933

3 In k-fold-cross-validation, why do we compute the mean of the metric of each fold 2019-06-14T08:42:47.667

3 Learning curve using micro F-score and macro F-score 2019-09-01T23:09:54.880

3 How best to show the best model over multiple labels? 2019-12-14T17:17:29.660

3 evaluation metrics for multiple values per session 2019-12-30T06:34:07.003

3 Confusion about the MSE ERROR 2020-01-25T20:37:52.603

3 Is it correct to define the F-measure as the harmonic mean of specificity and sensitivity in such a way? 2020-03-02T05:17:30.427

3 Appropriate objective function and evaluation metric when I DO care about outliers? 2020-07-06T16:14:48.887

3 Evaluation metric for Information retrieval system 2020-12-07T12:12:47.010

2 Correlation as an evaluation metric for regression 2016-01-23T08:04:13.130

2 Is there any PageRank-like method on weighted graph? 2016-08-04T02:26:25.090

2 XGBoost increase the error when changing evaluation function 2016-08-19T14:53:27.307

2 Using tensorflow to test a variable amount of correct labels 2016-09-24T13:16:15.190

2 how to evaluate top n recommendation system with movie lens dataset? 2016-10-02T13:13:15.027

2 Comparing Non-deterministic Binary Classifiers 2016-12-12T18:44:09.567

2 How to compare LDA and TF-IDF? 2017-06-14T07:05:19.683

2 Why exactly using a test set for model evaluation is a bad idea? 2017-09-25T21:43:15.577

2 How to evaluate sequence to sequence models? 2017-12-11T09:33:50.547

2 Splitting hold-out sample and training sample only once? 2017-12-19T15:06:57.890

2 Do I need to use Bayes to combine a sample's class probability with the performance of the overall model? 2017-12-22T18:46:45.913

2 Find threshold in rate to determine reason for lost customer 2018-02-07T09:19:44.687

2 Calculate average Intersection over Union 2018-05-07T09:08:23.440

2 Ranking ATM based on Utilization and Economic Data (Scoring/Rank Model) 2019-02-07T09:46:19.737

2 Obtain learning curve of Gradient Boosted Tree model in PySpark 2019-03-11T13:59:41.103

2 Is AUC a good metric for evaluating the performance of a multi-class classification? 2019-04-12T08:46:01.897

2 Can McNemar's test be applied to evaluate multiclass models? 2019-04-18T09:23:33.260

2 Conditional Entropy and Mutual Information - Clustering evaluation 2019-04-19T15:28:43.140

2 Metrics for Named Entity Recognition 2019-06-13T07:10:50.453

2 How can I get an algorithm to have an evaluation metric based on aggregate predictions? 2019-07-02T22:34:47.903

2 What is Continuous Ranked Probability Score (CRPS)? 2019-11-28T09:58:38.683

2 Evaluate clustering by using decision tree unsupervised learning 2019-11-29T12:00:56.620

2 Calculating Rank Ordering Error Metric for implicit recommendation 2019-12-12T14:46:34.620

2 XGBoost Feature Importance, Permutation Importance, and Model Evaluation Criteria 2019-12-30T04:33:22.100

2 FFR and FAR calculating for multiclass biometric face recognition system 2020-01-09T21:02:29.393

2 How to correctly calculate average F1 score, precision and recall of a Named Entity Recognition system? 2020-02-01T10:29:07.543

2 Offline evaluation of recommender systems 2020-03-12T16:09:33.953

2 Best common metric for comparing classic time series forecasting methods (ARIMA/Prophet) with ML approach? 2020-07-15T10:55:37.227

2 Can Micro-Average Roc Auc Score be larger than Class Roc Auc Scores 2021-02-09T21:33:59.703

1 Modelling on one Population and Evaluating on another Population 2014-08-02T00:07:09.267

1 How to evaluate the clustering result when cluster numbers are not equal to data set class 2015-11-26T03:45:19.333

1 Estimating precision & recall 2016-02-02T07:36:26.677

1 How to improve an existing (trained) classifier? 2016-05-02T19:28:18.620

1 How to evaluate clusters based on an attribute of the dataset? 2016-08-20T11:08:40.737

1 Can an algorithm tested only on artificial data be accepted in a high rank conference? 2016-10-03T20:53:14.283

1 How to get the inertia at the beginning when using sklearn.cluster.KMeans and MiniBatchKMeans 2016-10-18T02:48:46.287

1 How can I fix this "convex" problem? Is it just a matter of overfitting? 2016-12-09T07:10:44.187

1 How to test People similarity measure? 2016-12-11T23:59:42.707

1 How to compare performance of Cosine Similarity and Manhattan Distance? 2017-05-24T07:09:27.113

1 Can tuning individual precision and recall classification thresholds improve deep learning models? 2017-08-28T03:04:28.523

1 What does NIST information weights refer to? 2017-09-28T03:00:21.633

1 How to evaluate multi label image retrieval model 2017-12-01T10:37:27.800

1 Performance Metric for topic extraction when there is no ground truth 2018-02-10T06:00:41.010

1 What is the advantage of using Dunn index over other metrics for evaluating clustering algorithm? 2018-02-19T03:51:57.977

1 Is recall more important than precision for mass mailings? 2018-03-27T12:47:48.753

1 Size of folds in k-fold cross-validation 2018-04-23T20:20:07.720

1 Clustering evaluation metrics with subquadratic time complexity 2018-06-15T12:08:26.610