37 What are some standard ways of computing the distance between documents? 2014-07-05T16:10:21.580

29 What algorithms should I use to perform job classification based on resume data? 2014-07-03T16:11:22.637

29 General approach to extract key text from sentence (nlp) 2015-03-13T16:41:29.280

27 What is the difference between text classification and topic models? 2014-08-12T03:50:52.303

24 What is Hellinger Distance and when to use it? 2017-08-31T02:11:38.127

22 Keyword/phrase extraction from Text using Deep Learning libraries 2016-02-03T10:56:51.447

21 How to annotate text documents with meta-data? 2014-05-29T20:11:16.327

21 How to grow a list of related words based on initial keywords? 2014-06-17T06:05:39.653

21 Doc2Vec - How to label the paragraphs (gensim) 2016-02-12T02:22:01.940

17 Algorithms for text clustering 2014-08-15T13:10:20.937

17 Extract most informative parts of text from documents 2014-12-08T14:51:27.613

16 What is the difference between a hashing vectorizer and a tfidf vectorizer? 2017-08-14T16:42:07.040

14 How to do postal addresses fuzzy matching? 2016-03-21T12:01:23.043

14 Recognize a grammar in a sequence of fuzzy tokens 2016-08-08T13:01:19.127

13 Unstructured text classification 2014-09-05T12:08:11.347

13 Ethically and Cost-effectively Scaling Data Scrapes 2014-12-05T15:51:54.690

13 Alternatives to TF-IDF and Cosine Similarity when comparing documents of differing formats 2017-01-02T20:41:13.493

12 Preference Matching Algorithm 2014-06-18T22:10:58.497

12 Unsupervised feature learning for NER 2014-07-28T07:19:49.877

12 Document classification using convolutional neural network 2016-04-11T09:10:18.440

11 Using Clustering in text processing 2014-11-23T14:58:34.127

11 Applying word2vec on small text files 2016-01-10T10:49:20.447

11 How to determine if character sequence is English word or noise 2016-04-28T17:20:13.760

10 Log file analysis: extracting information part from value part 2014-11-20T14:26:10.463

10 Multiple labels in supervised learning algorithm 2014-12-11T19:23:55.907

10 How much training data does Word2Vec need? 2015-07-22T22:34:43.403

10 Vector space model cosine tf-idf for finding similar documents 2015-10-09T16:31:44.320

10 Text-Classification-Problem: Is Word2Vec/NN the best approach? 2015-11-04T07:34:21.203

10 What is the difference between NLP and text mining? 2016-01-20T06:33:54.923

10 How to determine the complexity of an English sentence? 2017-06-03T20:12:19.593

9 Suggest text classifier training datasets 2014-06-18T16:21:12.203

9 Clustering with cosine similarity 2017-09-05T05:02:57.140

9 Public dataset for news articles with their associated categories 2017-09-26T08:56:30.433

9 What machine learning, deep learning, or NLP techniques are used to classify given words as name, mobile number, address, email, state, county, city, etc.? 2018-03-15T13:17:31.737

8 Difference between tf-idf and tf with Random Forests 2014-09-16T08:14:06.307

8 Which classification algorithms to try for classifying text data into 300 categories 2015-05-07T08:52:40.293

8 How to learn spam email detection? 2015-06-01T12:36:23.203

8 Classifying Email in R 2016-05-26T18:06:06.083

8 How to evaluate text clustering? 2016-10-09T12:17:22.307

8 How to extract paragraphs from text document? 2016-11-11T06:06:35.760

8 Text extraction from documents using NLP or Deep Learning 2018-06-19T16:09:57.667

8 Resume Parsing - extracting skills from resume using Machine Learning 2018-08-04T05:27:37.990

7 What are the main types of NLP annotators? 2014-06-25T17:37:23.380

7 R error using package tm (text-mining) 2014-07-30T18:45:13.790

7 How to plot clusters in a nice way? 2018-04-23T17:06:36.410

7 Which deep learning text classifier is good for health data? 2018-05-30T19:03:19.533

7 Date Extraction in Python 2019-02-20T06:19:42.230

7 How to deal with spelling errors in NLP 2019-12-22T16:49:27.133

6 Clustering strings inside strings? 2014-10-23T14:51:57.160

6 Name Anonymization Software 2014-12-05T14:03:58.777

6 What is an alternative name for "Unstructured Data"? 2015-06-23T12:59:13.607

6 How to give name to topics created using LDA? 2016-01-07T04:28:45.337

6 Classifying survey response text SVM 2016-01-21T16:08:59.477

6 Improve k-means accuracy 2016-02-02T01:42:38.053

6 What methods can be used to detect anomalies in temporal textual data? 2016-11-07T07:34:29.327

6 Comparing two Corpora using Topic Model 2017-02-21T20:26:03.690

6 Clustering or classifying n-gram-based text categories 2017-05-08T14:10:17.860

6 What are useful evaluation metrics used in machine learning 2018-01-20T10:25:00.973

6 Clustering Observations by String Sequences (Python/Pandas df) 2018-02-15T06:07:56.250

6 How to compute document similarities in the case of source code? 2018-02-21T09:09:52.130

6 How to compare different similarity measurements in text clustering? 2019-07-30T08:56:00.420

5 Are there any annotators or Named Entity Recognition for license plate numbers? 2014-07-24T00:01:40.760

5 Relation mining of multivariate categorical time series without excluding the temporal nature 2014-11-21T15:23:37.360

5 Semi-structured text parsing using machine learning 2015-01-26T17:45:02.203

5 Fraud detection using text mining 2015-04-29T07:27:33.967

5 How do you compare term counts between two different periods, with different underlying corpus sizes, without bias? 2015-05-11T15:07:47.143

5 Different methods for clustering skills in text 2015-05-24T11:54:43.347

5 Optimizing co-occurrence matrix computation 2016-01-23T13:04:17.457

5 Logic in sentence: tree representation 2017-08-27T16:38:59.617

5 What algorithm can help me discover synonyms? 2017-09-23T18:53:31.323

5 Topic modeling for short length sentences 2017-12-13T15:40:01.427

5 Online news classification 2018-04-03T19:12:19.210

5 Find matching text from a text column 2018-09-26T16:39:20.990

5 How does Doc2Vec treat numerical data which is a part of text data? 2019-02-20T11:37:58.253

5 Text extraction from large pool of documents of different formats 2019-04-30T12:40:01.383

4 OpenNLP Coreference Resolution (German) 2014-08-11T07:59:22.780

4 How to recognize a two part term when the space is removed? ("bigdata" and "big data") 2015-06-09T13:49:57.683

4 Topic models for Relevance Prediction 2015-06-11T06:20:43.703

4 Metric learning and information retrieval 2015-06-11T07:53:26.120

4 What is the best way to split a sentence for a keyword extraction task? 2015-10-14T03:21:48.330

4 Appropriate algorithm for string (not document) classification? 2015-10-21T12:30:53.157

4 What regressors are recommended with text modeling? 2016-01-19T06:26:00.080

4 How would you categorize email subjects to find similar emails? 2016-03-06T14:08:50.243

4 Information extraction with reinforcement learning, feasible? 2016-03-12T20:43:03.863

4 Automatic annotation of medical text data 2016-04-10T13:00:00.167

4 Which features do I select from text? 2016-05-14T19:12:02.683

4 A few ideas to parse "events" from a text document 2016-06-03T21:57:04.387

4 Creating Domain specific Question Answering Systems 2016-06-16T05:56:16.480

4 Alternative to Flesch for a readability score algorithm 2016-08-30T13:55:24.413

4 Extract 2 pieces of information from a string - what to use? 2016-09-26T09:52:12.410

4 NLTK Sklearn Gensim Text to Topic 2016-11-23T16:33:35.720

4 How does word2vec understand the relationship between numbers? 2016-12-13T16:03:43.037

4 How to create vectors from text for address matching using binary classification? 2016-12-20T13:41:54.690

4 Is there a process flow to follow for text analytics? 2017-02-06T14:53:02.613

4 Group similar words under one topic and assign them a title 2017-09-25T11:13:05.307

4 What are real world applications of Doc2Vec? 2017-10-09T14:20:54.000

4 Compare two topic modelling sets 2017-10-13T00:28:14.627

4 Word embedding vectors for keyphrase extraction 2017-12-17T07:44:06.213

4 Plots with shaded standard deviation 2018-04-26T23:18:57.213