Tag: scikit-learn

200 What's the difference between fit and fit_transform in scikit-learn models? 2016-06-21T10:05:08.587

145 When to use One Hot Encoding vs LabelEncoder vs DictVectorizer? 2015-12-19T19:30:35.527

111 Train/Test/Validation Set Splitting in Sklearn 2016-11-15T14:55:04.130

101 SVM using scikit learn runs endlessly and never completes execution 2014-08-18T10:46:57.360

76 strings as features in decision tree/random forest 2015-02-25T01:07:14.717

50 Does scikit-learn have forward selection/stepwise regression algorithm? 2014-08-07T15:33:43.793

40 How to force weights to be non-negative in Linear regression 2017-04-11T03:02:54.080

38 train_test_split() error: Found input variables with inconsistent numbers of samples 2017-07-06T05:17:55.947

38 Understanding predict_proba from MultiOutputClassifier 2017-09-01T10:57:57.723

33 Calculating KL Divergence in Python 2015-12-08T10:37:44.050

32 StandardScaler before and after splitting data 2018-09-18T02:35:36.337

30 Why is xgboost so much faster than sklearn GradientBoostingClassifier? 2016-03-29T14:14:46.867

28 Difference between OrdinalEncoder and LabelEncoder 2018-10-07T18:55:40.833

27 Scikit-learn: Getting SGDClassifier to predict as well as a Logistic Regression 2015-08-04T08:11:30.990

25 How to get p-value and confidence interval in LogisticRegression with sklearn? 2016-11-28T17:10:45.847

25 How to deal with string labels in multi-class classification with keras? 2017-03-11T13:42:10.793

23 Can you explain the difference between SVC and LinearSVC in scikit-learn? 2015-09-02T14:49:33.520

23 Improve the speed of t-sne implementation in python for huge data 2016-02-06T14:19:10.243

23 Sentence similarity prediction 2017-10-22T07:36:15.920

23 What is the reason behind taking log transformation of few continuous variables? 2018-10-23T13:08:02.707

22 RandomForestClassifier OOB scoring method 2016-08-02T15:47:47.503

21 How can I check the correlation between features and target variable? 2018-10-03T18:43:27.863

18 Merging sparse and dense data in machine learning to improve the performance 2016-04-06T05:14:11.457

18 Pandas Dataframe to DMatrix 2016-07-15T13:48:09.557

18 How to calculate the fold number (k-fold) in cross validation? 2018-02-22T05:23:43.347

17 Algorithms for text clustering 2014-08-15T13:10:20.937

17 What is the difference between cross_validate and cross_val_score? 2018-03-01T06:13:32.277

16 Where in the workflow should we deal with missing data? 2014-05-27T21:07:48.973

16 How does SelectKBest work? 2016-03-18T10:34:45.107

16 What is the difference between a hashing vectorizer and a tfidf vectorizer 2017-08-14T16:42:07.040

15 Predict the best time of call 2016-09-21T08:08:19.270

15 What is the difference between CountVectorizer token counts and TfidfTransformer with use_idf set to False? 2017-12-11T22:51:57.513

15 When to use Standard Scaler and when Normalizer? 2019-02-20T16:38:05.920

14 How to train model to predict events 30 minutes prior, from multi-dimensional timeseries 2017-04-20T13:24:46.320

14 scikit-learn n_jobs parameter on CPU usage & memory 2018-07-13T10:06:59.987

13 Feature selection using feature importances in random forests with scikit-learn 2015-08-04T17:44:35.277

13 How to adjust the hyperparameters of MLP classifier to get more perfect performance 2018-07-26T12:24:09.630

12 Does scikit-learn use regularization by default? 2016-03-21T06:51:17.803

12 Mass convert categorical columns in Pandas (not one-hot encoding) 2016-09-18T16:45:15.647

12 Interpreting Decision Tree in context of feature importances 2017-02-02T00:29:32.877

12 Feature importance with high-cardinality categorical features for regression (numerical dependent variable) 2017-04-05T18:23:12.657

12 Using TF-IDF with other features in SKLearn 2017-09-04T11:30:19.893

12 Efficient dimensionality reduction for large dataset 2018-08-29T11:35:46.950

11 How much time do scikit classifiers take to classify? 2014-10-01T13:26:52.037

11 Clustering for mixed numeric and nominal discrete data 2015-11-02T04:12:53.367

11 How to do stepwise regression using sklearn? 2017-11-06T12:58:58.223

11 How can I fit categorical data types for random forest classification? 2018-01-04T13:03:28.490

11 What's the difference between Sklearn F1 score 'micro' and 'weighted' for a multi class classification problem? 2018-11-08T06:25:48.820

11 How to use Scikit-Learn Label Propagation on graph structured data? 2019-02-12T17:15:05.063

11 How to use SimpleImputer Class to replace missing values with mean values using Python? 2019-05-13T14:01:52.347

11 How to encode a class with 24,000 categories? 2019-09-03T00:52:15.997

10 Why does Gradient Boosting regression predict negative values when there are no negative y-values in my training set? 2014-06-24T19:43:24.643

10 Building a machine learning model to predict crop yields based on environmental data 2016-01-04T00:17:58.200

10 Parameters in GridSearchCV in scikit-learn 2016-08-13T17:58:19.430

10 Multiple Categorical values for a single feature how to convert them to binary using python 2016-10-31T12:14:04.133

10 Is max_depth in scikit the equivalent of pruning in decision trees? 2018-09-23T06:50:27.273

10 What is the most efficient method for hyperparameter optimization in scikit-learn? 2019-03-13T19:42:46.857

9 Is there a method that is opposite of dimensionality reduction? 2015-06-25T21:24:37.237

9 Feature selection for Support Vector Machines 2015-07-26T12:17:09.947

9 sklearn - overfitting problem 2015-08-11T22:21:42.453

9 How to use Cohen's Kappa as the evaluation metric in GridSearchCV in Scikit Learn? 2015-09-11T03:00:48.897

9 Export weights (formula) from Random Forest Regressor in Scikit-Learn 2016-01-08T11:57:50.097

9 Nested cross-validation and selecting the best regression model - is this the right SKLearn process? 2016-08-04T01:28:45.307

9 Can training label confidence be used to improve prediction accuracy? 2017-05-24T16:13:03.890

9 Imbalanced data causing mis-classification on multiclass dataset 2018-02-16T11:09:56.917

9 How do we standardize arrays with NaN? 2018-04-16T13:03:56.417

9 Can we remove features that have zero-correlation with the target/label? 2018-11-02T08:48:26.757

9 DeprecationWarning: The 'categorical_features' keyword is deprecated in version 0.20 2018-11-12T16:48:25.597

9 How max_features parameter works in DecisionTreeClassifier? 2018-11-19T13:39:07.983

9 Cross validation for highly imbalanced data with undersampling 2019-02-04T16:32:21.823

8 How to use TFIDF vectors with multinomial naive bayes? 2017-04-05T17:10:51.403

8 cosine_similarity returns matrix instead of single value 2018-01-15T13:22:44.650

8 Applying dimensionality reduction on OneHotEncoded array 2018-02-19T12:51:27.787

8 How to plot cost versus number of iterations in scikit learn? 2018-02-28T16:00:18.873

8 I got 100% accuracy on my test set, is there something wrong? 2018-07-19T08:16:21.663

8 How to handle preprocessing (StandardScaler, LabelEncoder) when using data generator to train? 2019-01-19T09:10:36.127

8 Is overfitting okay if test accuracy is high enough? 2020-05-23T04:54:25.113

7 Struggling to integrate sklearn and pandas in simple Kaggle task 2014-07-05T15:01:43.940

7 Is there a way of performing stratified cross validation using xgboost module in python? 2015-08-20T09:53:43.280

7 Image clustering by similarity measurement (CW-SSIM) 2016-01-10T19:44:59.887

7 What is the difference between "fully developed decision trees" and "shallow decision trees"? 2016-01-11T07:07:23.557

7 How to make sklearn pipeline using custom model? 2016-03-02T06:14:13.313

7 Interpreting the results of randomized PCA in scikit-learn 2016-03-05T19:07:07.393

7 KL-divergence returns infinity 2016-04-20T12:54:12.763

7 TF-IDF vectorizer doesn't work better than countvectorizer 2016-07-05T13:41:01.623

7 Naive Bayes should generate prediction given missing features (scikit learn) 2016-08-22T14:03:25.350

7 Reproducing randomForest Proximity Matrix from R package in Python 2017-03-16T09:31:07.317

7 Extracting individual emails from an email thread 2017-06-01T13:02:23.683

7 Custom metrics for unbalanced classes problem in RandomForest or SVM 2017-08-04T15:35:15.427

7 Combining Machine Learning classifier with NLTK Vader for Sentiment Analysis 2017-08-15T12:37:23.997

7 Irregular Precision-Recall Curve 2017-11-21T18:44:09.630

7 sklearn: SGDClassifier yields lower accuracy than LogisticRegression 2017-11-30T06:05:09.607

7 Feature agglomeration: Is it testing interactions? 2017-12-22T11:15:52.843

7 How to prevent/tell if Decision Tree is overfitting? 2018-01-18T10:02:49.040

7 How to plot learning curve and validation curve while using pipeline 2018-03-25T18:31:12.973

7 Migrating to Python from R: specific questions 2018-04-04T15:00:41.253

7 How to estimate the variance of regressors in scikit-learn? 2018-05-17T12:05:38.457

7 What is the difference between Multi-class One vs All and Multilabel Classification? 2018-08-24T10:13:14.273

7 How to train ML model with multiple variables? 2018-10-01T21:48:23.567