How to train a simple machine learning model in batches?


I have a dataset for multiclass classification of text data. The training data contains 120,000 samples. Extracting features with the TF-IDF vectorizer from the sklearn library gives about 80,000 words as features. When I train an sklearn Multinomial Naive Bayes classifier on these features on the Kaggle platform, it exceeds the RAM limit. I now want to train the model on small batches of 20,000 samples each and combine the results, so that the whole dataset never has to be processed at once. How can I do this? I want to use a classical ML model, not a deep learning or neural network model, and specifically Multinomial Naive Bayes as the classification algorithm.

Note: I have preprocessed the data by removing stopwords and punctuation and lemmatizing it before applying the feature extraction method.
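
To make the batch idea concrete, here is a minimal sketch of the kind of incremental training described above. It is not a confirmed solution, and it rests on some assumptions: it uses `MultinomialNB.partial_fit`, which updates the model one batch at a time, and it swaps `TfidfVectorizer` for `HashingVectorizer`, because `TfidfVectorizer` has no `partial_fit` and would need to see all the text to build its vocabulary. This means the IDF weighting is dropped. The helper names `iter_batches` and `train_in_batches`, the `n_features` value, and the toy data are all hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import MultinomialNB


def iter_batches(texts, labels, batch_size):
    """Yield consecutive (texts, labels) slices of at most batch_size items."""
    for start in range(0, len(texts), batch_size):
        end = start + batch_size
        yield texts[start:end], labels[start:end]


def train_in_batches(texts, labels, batch_size=20_000, n_features=2**18):
    """Train MultinomialNB incrementally (hypothetical helper)."""
    # HashingVectorizer is stateless, so each batch can be transformed on
    # its own. alternate_sign=False keeps all feature values non-negative,
    # which MultinomialNB requires.
    vectorizer = HashingVectorizer(n_features=n_features, alternate_sign=False)
    model = MultinomialNB()
    # partial_fit must know every class label on its first call.
    classes = np.unique(labels)
    for batch_texts, batch_labels in iter_batches(texts, labels, batch_size):
        X_batch = vectorizer.transform(batch_texts)  # sparse matrix
        model.partial_fit(X_batch, batch_labels, classes=classes)
    return vectorizer, model


# Toy data standing in for the real, already preprocessed corpus.
texts = [
    "cheap pill buy now",
    "meeting move monday morning",
    "team score goal final minute",
    "buy cheap watch online",
    "project meeting agenda send",
    "striker score second goal",
]
labels = ["spam", "work", "sport", "spam", "work", "sport"]

vectorizer, model = train_in_batches(texts, labels, batch_size=2)
predictions = model.predict(vectorizer.transform(texts))
```

With the real data, `batch_size=20_000` would give six batches for 120,000 samples. If the raw text itself does not fit in memory, the batches could instead be read from disk in chunks, for example with `pandas.read_csv(..., chunksize=...)`.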


Posted 2020-05-03T18:55:21.600

Reputation: 101

No answers