Tag: text-mining

31 Statistical classification of text 2010-07-19T21:17:30.543

31 How to quasi match two vectors of strings (in R)? 2010-10-08T21:31:00.867

30 How well does R scale to text classification tasks? 2011-08-13T16:52:34.583

28 Two R packages for topic modeling, LDA and topicmodels? 2012-03-10T15:47:09.277

27 Machine learning techniques for parsing strings? 2012-08-28T14:48:28.033

24 Topic models and word co-occurrence methods 2012-07-15T02:37:55.383

21 Is cross validation a proper substitute for validation set? 2011-11-23T23:33:35.550

20 Has the reported state-of-the-art performance of using paragraph vectors for sentiment analysis been replicated? 2014-11-11T15:34:12.070

20 Bag-of-Words for Text Classification: Why not just use word frequencies instead of TFIDF? 2015-05-19T18:30:00.167

19 In Naive Bayes, why bother with Laplacian smoothing when we have unknown words in the test set? 2014-07-22T04:29:18.017

18 Why does Natural Language Processing not fall under Machine Learning domain? 2012-02-11T15:00:43.253

18 Difference between naive Bayes & multinomial naive Bayes 2012-07-27T14:17:18.010

17 Large scale text classification 2011-08-26T16:08:13.640

17 Semi-supervised learning, active learning and deep learning for classification 2011-10-06T21:04:45.743

17 How to calculate perplexity of a holdout with Latent Dirichlet Allocation? 2011-11-10T03:08:12.977

16 The input parameters for using latent Dirichlet allocation 2012-03-23T02:33:39.840

16 I want to build a crime index and political instability index based on news stories 2012-11-24T03:59:01.870

15 Topic prediction using latent Dirichlet allocation 2011-04-07T14:42:24.517

15 Why does ridge regression classifier work quite well for text classification? 2011-10-29T18:14:54.547

14 Examples of text mining with R (tm package) 2011-06-29T10:17:45.487

14 Text Mining: how to cluster texts (e.g. news articles) with artificial intelligence? 2015-06-07T15:14:54.767

13 What are the differences among latent semantic analysis (LSA), latent semantic indexing (LSI), and singular value decomposition (SVD)? 2010-11-19T19:01:58.203

13 How to do one-class text classification? 2012-09-08T14:07:00.157

12 At what n do n-grams become counterproductive? 2012-02-14T00:54:09.033

12 Automatic keyword extraction: using cosine similarities as features 2015-05-02T06:48:00.133

11 What are the text-mining packages for R and are there other open source text-mining programs? 2011-02-14T19:29:14.993

11 Topic models for short documents 2012-03-30T17:28:26.780

11 Why is n-gram used in text language identification instead of words? 2015-04-05T10:07:34.793

11 How does Keras 'Embedding' layer work? 2017-03-29T12:47:24.907

10 Regarding using bigram (N-gram) model to build feature vector for text document 2012-04-02T14:02:09.500

10 What is a good method for short text clustering? 2017-02-15T13:20:53.070

9 Incremental IDF (Inverse Document Frequency) 2011-11-23T10:48:03.017

9 Good books on text mining? 2012-09-02T18:51:56.840

8 Automating statistical correlation between "texts" and "data" 2010-07-26T20:03:02.643

8 Create a phrase net with R 2012-01-24T07:52:04.007

7 Generate random strings based on regular expressions in R 2011-03-04T22:29:50.427

7 How to perform text mining, sentiment mining, and business category identification, and where to obtain a categorization library 2011-04-13T10:01:19.413

7 What does inverse-chi-square in Fisher method (classifying) exactly do? 2011-05-04T19:24:03.320

7 Combining n-grams 2011-11-24T18:11:56.317

7 Viable distance metric for text articles 2012-02-19T01:18:44.960

7 Classification of conversations based on content 2012-02-17T13:47:00.473

7 Understanding and applying sentiment analysis 2012-03-22T15:13:23.320

7 Is cosine similarity a classification or a clustering technique? 2012-05-13T14:26:21.070

7 What's a good approach to estimate the probability of word frequencies? 2012-05-17T12:54:23.077

7 Bag of words vs vector space model? 2012-06-25T10:11:05.357

7 Support vector machine for text classification 2012-07-24T02:35:05.157

7 Using text mining/natural language processing tools for econometrics 2013-06-12T11:51:26.757

7 How to think of features in NLP problems 2013-08-06T18:43:23.070

7 Using topic words generated by LDA to represent a document 2014-09-06T15:36:18.623

7 Popular named entity resolution software 2015-02-08T05:17:16.137

7 Understanding the use of logarithms in the TF-IDF logarithm 2015-07-15T16:57:17.840

6 How to start an analysis of keywords from a bibliography and detect correlations? 2010-08-17T13:10:09.610

6 Pointwise mutual information for text using R 2011-05-09T02:40:41.403

6 Document classification with naive Bayes algorithm 2012-01-11T08:18:04.637

6 Kernel matrix normalisation 2012-02-16T09:02:46.647

6 Algorithms for clustering documents by similar words and phrases 2012-03-16T22:43:20.997

6 Which weighting factor to use for text categorization 2012-06-06T20:51:13.493

6 What are Effective Regression Techniques for Linguistic Analysis of Linked Data? 2012-08-12T12:39:33.207

6 Randomly distributed residuals or not? 2013-05-08T16:43:38.553

6 Machine learning techniques for spam detection, and in general for text classification 2014-03-24T16:17:24.810

6 Why can we use entropy to measure the quality of a language model? 2014-04-09T03:11:14.933

6 How would you categorize / extract information out of job descriptions? 2014-12-02T06:52:49.557

6 Why add one in inverse document frequency? 2015-08-12T11:39:38.687

6 Was it as valid to perform k-means on a distance matrix as on data matrix (text mining data)? 2016-06-29T01:14:56.960

5 Semantic distance between excerpts of text 2011-02-12T03:28:54.630

5 Automatic text quality grading 2011-09-12T01:00:17.273

5 How to plot results from text mining (e.g. classification or clustering)? 2011-10-24T08:59:11.803

5 Data structures and libraries for high dimensional text analysis with R 2011-11-19T01:04:53.007

5 Sophisticated models for classifying short pieces of texts 2011-12-01T19:24:33.623

5 How to reduce dimension for text document dataset? 2012-03-12T05:03:44.823

5 Suggestions on how to merge multiple datasets with an imperfect ID across databases? 2012-07-23T14:00:51.790

5 Hierarchical decomposition of an imbalanced multiclass classification problem 2012-10-06T20:07:35.323

5 Combining evidence using Dempster-Shafer theory 2013-06-17T13:25:01.117

5 Calculating pointwise mutual information between two strings 2013-12-28T11:32:45.083

5 Alternatives to bag-of-words based classifiers for text classification? 2014-05-18T06:53:03.963

5 Extracting city name from free text? 2014-08-29T18:44:13.240

5 What does "Virgin Data" mean? 2014-12-25T13:09:11.107

5 Does one need to adjust for document length (in terms of pages) in topic modeling? 2015-06-08T15:45:59.980

5 Is this interpretation of sparsity accurate? 2015-07-08T19:19:16.100

5 What is VectorSource and VCorpus in 'tm' (Text Mining) package in R 2015-08-02T14:41:54.137

5 Why are most of my points classified as noise using DBSCAN? 2017-03-29T18:47:14.080

4 Comparing cosine similarities for tf-idf vectors for documents with different length 2011-03-29T09:06:24.640

4 Estimating sample size required for optimal performance of latent semantic indexing? 2011-04-05T17:36:40.303

4 Determining trends in text 2011-06-15T18:05:17.453

4 Text mining software (beyond R) 2011-08-08T02:31:43.660

4 Clustering of 10's of millions of high dimensional data 2011-09-14T22:23:39.840

4 Machine learning in web application? 2011-09-27T12:04:42.843

4 Category selection for text classification 2011-10-17T13:54:31.043

4 Finding similarity between a reference and few working documents 2012-01-27T02:53:25.647

4 How to classify country names given possible alternate spellings or abbreviations? 2012-03-28T16:43:31.363

4 Feature selection for text mining? 2012-03-30T02:33:38.433

4 How to optimize hyper-parameters in LDA? 2012-11-05T23:33:49.157

4 Chi-squared test for detecting trending terms 2012-12-02T10:27:28.660

4 Predicting continuous variables from text features 2013-01-24T17:56:51.483

4 Finding related words 2014-08-11T17:24:30.187

4 Can we generate random words from English letters that follow the bigram of the English language? 2014-09-18T08:13:16.833

4 Python vs R for Text Mining Preprocessing 2015-01-18T21:58:33.150

4 Word2Vec : Interpretation of Subtraction or addition of vectors 2015-04-09T01:29:26.310

4 How to prove that text is linearly separable? 2015-06-05T05:49:27.613