Tag: data-mining

173 K-Means clustering for mixed numeric and categorical data 2014-05-14T05:58:21.927

69 Are Support Vector Machines still considered "state of the art" in their niche? 2014-07-09T12:22:22.400

69 Open source Anomaly Detection in Python 2015-07-22T14:26:58.660

37 What are some standard ways of computing the distance between documents? 2014-07-05T16:10:21.580

33 How to do SVD and PCA with big data? 2014-09-25T08:40:59.467

33 Why do we need XGBoost and Random Forest? 2017-10-14T12:33:00.527

31 Meaning of latent features? 2014-07-16T09:24:51.780

29 Gini coefficient vs Gini impurity - decision trees 2014-09-09T12:44:16.967

26 Why are NLP and Machine Learning communities interested in deep learning? 2014-10-11T10:24:01.393

25 Is Data Science the Same as Data Mining? 2014-05-14T01:25:59.677

24 How to deal with time series which change in seasonality or other patterns? 2014-12-22T03:30:45.673

24 Word2Vec vs. Sentence2Vec vs. Doc2Vec 2017-06-30T07:05:33.707

24 What is Hellinger Distance and when to use it? 2017-08-31T02:11:38.127

21 What statistical model should I use to analyze the likelihood that a single event influenced longitudinal data 2014-06-20T03:18:59.477

18 K-means: What are some good ways to choose an efficient set of initial centroids? 2015-04-30T13:42:05.343

17 One-Class discriminatory classification with imbalanced, heterogenous Negative background? 2014-06-11T10:11:59.397

16 Decision tree vs. KNN 2015-12-05T22:24:29.063

15 Item based and user based recommendation difference in Mahout 2014-12-04T05:18:03.720

15 Using attributes to classify/cluster user profiles 2015-05-19T23:34:25.213

15 Why are ensembles so unreasonably effective 2016-05-25T13:08:06.693

15 How much data are sufficient to train my machine learning model? 2017-06-26T21:26:04.680

14 Are there any APIs for crawling paper abstracts? 2014-05-17T08:45:08.420

14 Big data case study or use case example 2014-06-11T06:07:45.767

14 What is the difference between one hot encoding and leave one out encoding? 2016-03-23T03:25:53.170

14 Recognize a grammar in a sequence of fuzzy tokens 2016-08-08T13:01:19.127

13 Is FPGrowth still considered "state of the art" in frequent pattern mining? 2014-07-12T17:25:52.907

13 Neo4j vs OrientDB vs Titan 2014-12-18T04:36:06.107

12 How does the naive Bayes classifier handle missing data in training? 2014-12-16T13:07:55.063

12 How to scrape an IMDb webpage? 2015-04-15T23:53:13.957

12 Airline Fares - What analysis should be used to detect competitive price-setting behavior and price correlations? 2015-05-17T20:12:48.760

11 Working with HPC clusters 2014-07-08T13:45:07.583

11 Relationship between KS, AUROC, and Gini 2014-11-23T01:05:06.473

11 LinkedIn web scraping 2015-05-13T21:01:03.070

11 How to avoid overfitting in random forest? 2015-07-07T18:05:23.903

11 How can I fit categorical data types for random forest classification? 2018-01-04T13:03:28.490

11 Pandas change value of a column based on another column condition 2019-07-31T10:08:51.983

10 Clustering customer data stored in ElasticSearch 2014-05-14T08:38:07.007

10 How to debug data analysis? 2014-06-15T12:26:50.060

10 NASDAQ Trade Data 2014-07-19T20:46:52.740

10 Why might several types of models give almost identical results? 2014-08-18T14:56:13.800

10 What initial steps should I use to make sense of large data sets, and what tools should I use? 2014-08-19T17:50:52.583

10 Scalable Outlier/Anomaly Detection 2014-10-17T10:47:13.197

10 How to create a good list of stopwords 2015-05-24T21:45:02.207

10 How do I calculate the delta term of a Convolutional Layer, given the delta terms and weights of the previous Convolutional Layer? 2015-06-02T20:16:43.627

10 Which is faster: PostgreSQL vs MongoDB on large JSON datasets? 2015-06-03T20:29:40.490

10 User-product positive (click data) available. How to generate negative (no-click data)? 2015-11-17T16:10:20.000

10 Python: Handling imbalanced classes in Python machine learning 2016-04-25T07:26:53.743

10 Visualizing items frequently purchased together 2016-10-06T21:27:28.460

10 How to impute Missing values not the usual way? 2020-01-11T07:52:56.467

9 Human activity recognition using smartphone data set problem 2014-05-27T10:41:33.220

9 Relational Data Mining without ILP 2014-06-17T13:46:06.367

9 Learning signal encoding 2014-06-18T03:19:07.557

9 How to build a textual search engine? 2014-09-12T11:48:21.617

9 Using NLP to automate the categorization of user description 2014-12-09T20:49:37.093

9 How to model user's buying behavior on Amazon? 2015-11-05T17:06:27.647

9 Clustering with cosine similarity 2017-09-05T05:02:57.140

9 Public dataset for news articles with their associated categories 2017-09-26T08:56:30.433

9 How can I perform stratified sampling for multi-label multi-class classification? 2018-06-13T11:18:12.543

8 What is the use of user data collection besides serving ads? 2014-07-31T18:52:56.307

8 Matrix properties and machine learning/data mining 2014-10-30T18:22:18.907

8 Evaluating Recommendation engines 2014-11-26T04:40:17.840

8 How to connect data-mining with machine learner process 2014-12-03T15:56:50.687

8 How does SQL Server Analysis Services compare to R? 2015-03-27T08:41:13.680

8 One Hot encoding for large number of values 2015-10-03T18:37:16.597

8 Filling missing data with other than mean values 2015-10-06T10:51:52.883

8 Method for finding top-k cosine similarity based closest item on large dataset 2016-03-25T17:34:06.167

8 How to scrape a website with a searchbar 2016-05-13T09:43:55.280

8 How to extract paragraphs from text document? 2016-11-11T06:06:35.760

8 Is it possible to train a neural network to solve polynomial equations? 2017-02-09T16:01:38.533

8 Why do we use a Gaussian kernel as a similarity metric? 2017-03-04T00:59:41.293

8 Understanding how distributed PCA works 2017-04-19T08:58:18.707

7 What would be a good way to use clustering for outlier detection? 2014-12-06T15:04:03.823

7 How to preprocess different kinds of data (continuous, discrete, categorical) before Decision Tree learning 2015-08-07T10:43:50.747

7 How will AdaBoost be used for solving regression problems? 2015-08-31T05:45:00.513

7 Training Dataset for Sentiment Analysis of Movie Reviews 2016-04-15T03:33:55.863

7 Computational aspects are typically ignored by statisticians 2018-07-19T08:46:10.267

6 Computing Image Similarity based on Color Distribution 2014-07-27T21:54:05.003

6 Looking for a strong PhD Topic in Predictive Analytics in the context of Big Data 2014-09-25T20:18:46.880

6 How to do this complicated data extrapolation, prediction modeling? 2014-10-12T05:27:17.687

6 Is it possible to identify different queries/questions in sentence? 2014-10-16T05:44:40.183

6 Machine Learning - What is the difference between one-class, binary-class and multinomial-class classification? 2014-10-20T06:38:16.490

6 How to detect overfitting of a stock screener 2015-03-02T23:02:45.583

6 What is the best way to propose an item from a set based on previous choices? 2015-06-23T16:29:23.320

6 Working with inaccurate (incorrect) dataset 2015-06-24T16:36:32.730

6 When do I have to use aucPR instead of auROC? (and vice versa) 2015-11-24T11:50:46.290

6 Is it advisable to rerun LASSO multiple (2) times? 2015-12-16T21:20:08.390

6 How can I predict the acceptance of an article by publisher? 2016-01-04T20:22:46.043

6 How to give name to topics created using LDA? 2016-01-07T04:28:45.337

6 Check similarity between time series 2016-12-19T11:22:03.587

6 Pandas v. SFrame in learning data science 2017-03-09T12:33:59.773

6 Instead of one-hot encoding a categorical variable, could I profile the data and use the percentile value from its cumulative density distribution? 2018-04-04T00:31:09.753

6 How to predict customer's next purchase 2018-04-16T09:41:06.617

6 How to split natural language script into segments? 2018-04-16T15:32:25.740

5 What are good sources to learn about Bootstrap? 2014-06-17T18:13:46.230

5 Efficient solution of fmincg without providing gradient? 2014-06-21T04:59:06.620

5 Stochastic gradient descent in logistic regression 2014-07-07T11:43:48.430

5 Rough vs Fuzzy vs Granular Computing 2014-10-26T13:12:23.597

5 Relation mining of multivariate categorical time series without excluding the temporal nature 2014-11-21T15:23:37.360

5 Mahout Similarity algorithm comparison 2014-12-02T11:12:06.103