Tag: data

33 How does the validation_split parameter of Keras' fit function work? 2018-09-30T06:30:57.810

28 Why is it wrong to train and test a model on the same dataset? 2020-12-13T14:11:58.530

26 Is pandas now faster than data.table? 2017-10-25T02:43:49.793

19 How is a splitting point chosen for continuous variables in decision trees? 2017-11-03T21:45:09.203

15 Do modern R and/or Python libraries make SQL obsolete? 2017-02-24T19:33:34.840

15 How much data are sufficient to train my machine learning model? 2017-06-26T21:26:04.680

13 How to create US state heatmap 2016-01-04T18:35:57.540

12 Interpreting Decision Tree in context of feature importances 2017-02-02T00:29:32.877

12 Interactive labeling/annotating of time series data 2018-09-11T06:19:43.463

11 How to perform Logistic Regression with a large number of features? 2017-07-28T09:32:13.880

9 What are some of the best practices for sharing data and models with colleagues? 2017-03-17T18:45:16.867

8 Do I have to standardize my new polynomial features? 2015-11-25T11:11:25.923

8 What are the most suitable machine learning algorithms according to type of data? 2017-06-23T02:09:35.357

8 How to delete an entire row if values in a column are NaN 2018-04-13T01:28:07.543

8 Generate timeseries data 2019-05-26T01:58:15.280

7 How to preprocess different kinds of data (continuous, discrete, categorical) before Decision Tree learning 2015-08-07T10:43:50.747

7 IID violation in machine learning 2016-03-07T23:03:06.100

7 How to handle the CEO expectations from a company that's new to data science? 2016-07-21T06:41:18.743

7 How to generate training data for OCR 2016-11-28T15:29:51.543

7 Is it advisable to combine two datasets? 2018-09-30T16:43:48.810

7 Docker for data science 2019-08-17T12:17:08.537

7 Data anonymization in Python 2019-10-23T23:40:54.757

7 How important is advanced SQL for data science? 2020-04-30T10:31:33.930

6 Why aren't languages like C, C++ used for data analytics instead of R, Python? 2016-04-07T18:41:02.600

6 Why is a correlation matrix symmetric? 2018-05-07T17:31:01.273

6 XGBoost Huge Dataset ~1TB 2019-06-15T08:05:34.913

6 Un-learning a single training example from a trained model 2020-04-28T14:11:34.333

5 Finding aggregated information of data 2015-06-23T13:54:03.990

5 Tool to Generate 2D Data via Mouse Clicking 2015-10-27T17:16:08.807

5 General way to reduce features 2016-02-24T06:07:13.507

5 Merging large CSV files in pandas 2016-07-28T15:15:45.510

5 Skills that school doesn't teach you 2016-08-17T19:08:17.143

5 Missing Values in Data 2017-08-31T10:08:51.103

5 Can I scrape data from government websites if there is no mention about commercial usage? 2017-12-12T19:40:07.250

5 Small data set in machine learning 2017-12-30T14:59:26.090

5 Decision Tree used for Calculating Precision, Accuracy, and Recall, class breakdown question 2018-01-28T05:12:10.113

5 Purpose of weights in neural networks 2018-03-07T10:11:34.480

5 How to handle columns with categorical data and many unique values 2019-04-08T11:04:22.033

5 How to train ML algorithm with multiple values in target data? 2019-04-27T15:00:15.560

5 Oversampling/Undersampling: train set only, or both train and validation sets? 2019-10-17T08:21:06.343

5 Machine learning methods for panel (longitudinal) data 2020-01-10T23:22:07.497

5 One Hot Encoding for any kind of dataset 2020-07-10T01:58:51.420

4 What kind of research can be done with genomic data? 2015-06-22T02:13:40.850

4 python - Will this data mining approach work? Is it a good idea? 2015-07-01T15:52:15.467

4 Recommendations for storing time series data 2015-08-20T22:21:12.060

4 Deploying machine learning modules 2015-09-30T13:53:34.730

4 How to deal with analyzing optional survey data 2016-01-04T04:49:51.893

4 What are the methods to ensure that the population split for an A/B test is random? 2016-02-26T13:27:48.553

4 What is the most used format to save data with type information 2016-08-25T10:57:42.487

4 How to generate bulk graphics using R 2017-02-25T13:40:25.847

4 How to use the same MinMaxScaler used on the training data with new data? 2018-04-25T11:42:10.643

4 How to release datasets with fingerprinting 2018-05-04T14:54:27.380

4 What is the meaning of the term "pipeline" within data science? 2018-07-20T15:02:49.187

4 Understanding data normalisation 2018-08-21T06:48:32.513

4 How to know for sure if we can learn from a given data or not? 2018-09-03T08:27:09.253

4 Train classifier on balanced dataset and apply on imbalanced dataset? 2019-03-05T16:10:02.510

4 Aggregate NumPy array with condition as mask 2019-03-31T21:40:59.283

4 Import data from google drive to Kaggle Kernel 2019-06-01T00:26:32.460

4 Smart data split (train/eval) for Object Detection 2019-06-25T11:44:43.277

3 Spatial clustering based on response to inputs and building a reduced model 2015-07-05T01:07:42.470

3 Advice on making predictions given a collection of dimensions and corresponding probabilities 2015-08-11T19:28:29.793

3 Weighted k nearest neighbor search 2015-08-13T13:52:11.463

3 Domain-specific data science programs 2015-09-25T06:40:34.843

3 Establishing data science programs as an independent discipline 2015-10-01T18:57:42.820

3 Central Probability Interval 2015-12-11T23:01:37.850

3 Mathematics major for data science 2016-01-07T19:11:01.153

3 Algorithm or formula to measure happiness? 2016-04-27T19:21:19.720

3 About data cleansing, to what extent should we do our work? 2016-04-29T04:19:50.950

3 Which is better for Data Science, a double major in Math & CS or Physics & CS? 2017-03-01T19:31:22.463

3 How to deal with large data sets 2017-11-21T17:11:49.800

3 Columns with no (or nearly no) differences between rows worth keeping? 2017-12-17T12:56:58.843

3 What is the appropriate name for this dataset? 2018-01-30T13:30:44.470

3 Neural Network for Multiple Float Output 2018-02-13T20:12:58.267

3 Deep Learning: Does starting the training on a smaller subset of the data make sense? 2018-08-17T05:26:26.403

3 How to generate data if algo itself is involved in the process with a feedback loop? 2018-09-18T09:45:15.577

3 Create a binary-classification dataset (python: sklearn.datasets.make_classification) 2018-10-02T14:27:33.290

3 Why do a lot of people use IPython notebooks over Python files when analyzing data? Is it the same in industry? 2019-01-19T08:44:20.973

3 How to choose tools for web dashboard? 2019-03-08T21:55:55.173

3 How to correctly apply the same data transformation used on the training dataset to real data in a web service? 2019-03-26T13:52:40.997

3 How to penalize for empty fields in a DataFrame? 2019-03-31T14:02:59.863

3 Rearranging data frame from column names to key value pairs 2019-04-04T13:14:27.343

3 Why can One-Hot Encoding keep the model from misreading Label Encoded data as having some kind of order? 2019-04-25T11:55:23.743

3 What happens to the left over unpicked data in Random Forest 2019-04-26T14:23:59.077

3 How to reduce position changes after dimensionality reduction? 2019-05-22T11:18:32.580

3 Purpose of converting continuous data to categorical data 2019-06-21T19:26:20.703

3 Why do seaborn.dist and pyplot.hist generate two different looking histograms on the same data? 2019-07-30T09:53:31.517

3 How do I best visualize this voltage data for a science project 2020-01-07T01:06:49.177

3 How to build an unbiased predictive ML model when records of the event are few compared to the total number of records? 2020-01-22T17:18:18.197

3 Estimating and/or determining the amount of data sufficient enough to train a model 2020-01-24T04:42:29.160

3 Can numerical discrete finite data be always treated also as categorical? 2020-02-04T12:37:56.370

3 Combining time-series data from different devices 2020-02-18T18:06:43.613

3 How can I handle a column with list data? 2020-04-11T08:09:03.127

3 Continuous VS Categorical variable 2020-05-29T22:27:27.090

3 Is it acceptable not to transform() test data after the training data has been fit_transform()-ed? 2020-05-31T15:19:44.783

2 How to store complex tables and structures? 2015-07-21T15:52:30.010

2 What proxies could be used to assess economic value of Stackoverflow for its users? 2015-08-02T09:22:41.890

2 How to select a bunch of optimized data from a larger data set? 2015-11-23T16:49:32.403

2 Data structure design for supporting arbitrary number of columns in table or database 2016-02-01T17:36:16.720

2 PCA on acceleration time series data 2016-02-07T15:34:18.730