Tag: data-wrangling

45 How much of data wrangling is a data scientist's job? 2019-04-03T15:16:24.773

10 Export pandas to dictionary by combining multiple row values 2018-05-29T15:48:56.150

4 Tools to perform SQL analytics on 350TB of csv data 2016-01-07T02:33:51.253

4 How to deal with count data in random forest 2019-02-12T22:59:07.520

4 Advice on data wrangling for a big set of docx files 2019-06-29T11:16:07.883

3 Populate column based on previous row with a twist 2018-02-14T23:13:00.410

3 Mean across every several rows in pandas 2019-01-10T12:42:02.777

3 Inputting (a lot of) data into a dataframe one row at a time 2019-02-21T06:08:40.750

2 Sort by average votes/ratings 2016-02-17T20:43:08.740

2 Technical name for this data wrangling process? Multiple columns into multi-factor single column 2016-06-07T01:42:41.457

2 How do you define the steps to explore the data? 2016-07-03T07:21:29.737

2 What is the difference between 'if the data is of good quality' and 'if the data is tidy'? 2018-08-17T20:33:29.587

2 R Combine Multiple Rows of DataFrame by creating new columns and union values 2019-03-12T17:36:53.603

2 What should I do with the NaN values on this stock quote data? 2019-05-05T18:29:18.263

2 How to use zero-inflated negative binomial regression for binary classification task? 2019-07-26T07:18:47.993

2 Tools for reading data from large, irregular csv files (aka excel file hell) 2019-09-17T22:55:11.863

2 How to preprocess an ordered categorical variable to feed a machine learning algorithm? 2020-08-20T18:58:25.020

1 Detecting boilerplate in text samples 2015-11-30T17:45:34.930

1 When to choose character instead of factor in R? 2016-06-01T16:11:21.190

1 Sum up counts in a data.frame grouped by multiple variables 2016-06-01T21:01:54.657

1 Which is the better performer at wrangling big data, R or Python? 2016-07-26T21:09:57.130

1 How do I split number string with digit pattern? 2018-03-12T09:26:14.260

1 Removing special characters from a CSV file 2018-06-06T21:48:33.450

1 How to work with string data with a lot of NAs in an aggregation task with R 2018-08-13T10:39:29.360

1 How to calculate the number of data points within a given time interval? 2018-10-30T01:16:48.443

1 What is the correct procedure when "joining" data takes ~6 hours? 2019-05-17T15:43:54.970

1 Cleaning Excel Data in Merged Cells 2019-05-28T08:26:42.240

1 How to run KNN (or other) on nested features? Image metadata 2019-06-04T14:33:24.807

1 How to deal with highly skewed (on counts) dependent variables? 2019-07-23T06:08:56.013

1 How to make a tool that's robust to user-generated typos? 2019-12-02T23:13:48.783

1 Grouping by 2 Columns in R to Sum, Count, Percentage, Weighted mean, mode 2020-01-24T21:46:54.793

1 Group_by 2 variables and pivot_wider distribution based on 2 others 2020-02-05T23:24:18.167

1 Issue with miscount on test train split in Python 2020-11-30T22:08:37.157

0 Analytics term for turning row values into column names and counting their assigned values 2016-06-29T07:30:18.007

0 Adaptive (machine-wise) ML packages 2016-07-31T13:08:30.500

0 R Programming rearranging rows and columns from timeline data 2017-11-21T05:17:15.060

0 Unable to print in Jupyter Notebook using Pandas 2017-12-27T16:00:00.310

0 Can we use median to replace all the missing values from a column 2019-07-25T14:12:03.897

0 How to obtain unique count of categorical variable based on another categorical variable? 2020-02-27T21:08:14.893

0 How do I extract album and song titles from this plain text file? 2020-04-19T23:37:49.070

0 R - using heatmaply for a 2d histogram / density 2020-09-14T00:47:18.393

0 Encoding of high cardinality multi-label categorical feature? 2020-10-20T07:50:41.257

-1 Similar values cleaning 2020-12-03T05:01:49.220