I am using the OpenCV letter_recog.cpp example to experiment with random trees and other classifiers. This example has implementations of six classifiers: random trees, boosting, MLP, kNN, naive Bayes and SVM. The UCI letter recognition dataset with 20000 instances and 16 features is used, which I split in half for training and testing. I have experience with SVM, so I quickly got its recognition error down to 3.3%. After some experimentation, what I got was:

UCI letter recognition:

- RTrees - 5.3%
- Boost - 13%
- MLP - 7.9%
- kNN(k=3) - 6.5%
- Bayes - 11.5%
- SVM - 3.3%

Parameters used:

RTrees - max_num_of_trees_in_the_forest=200, max_depth=20, min_sample_count=1

Boost - boost_type=REAL, weak_count=200, weight_trim_rate=0.95, max_depth=7

MLP - method=BACKPROP, param=0.001, max_iter=300 (default values - too slow to experiment)

kNN(k=3) - k=3

Bayes - none

SVM - RBF kernel, C=10, gamma=0.01
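The 50:50 split mentioned above can be sketched roughly like this in plain C++ (a minimal sketch; the function name and seed are mine, not from the original experiment):

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Shuffle sample indices and split them 50:50 into train/test,
// as done for the UCI letter recognition set (10000:10000).
std::pair<std::vector<int>, std::vector<int>>
splitHalf(std::size_t nSamples, unsigned seed)
{
    std::vector<int> idx(nSamples);
    std::iota(idx.begin(), idx.end(), 0);   // 0, 1, ..., nSamples-1
    std::mt19937 rng(seed);
    std::shuffle(idx.begin(), idx.end(), rng);
    std::size_t half = nSamples / 2;
    return { std::vector<int>(idx.begin(), idx.begin() + half),
             std::vector<int>(idx.begin() + half, idx.end()) };
}
```

The returned index lists can then be used to gather the corresponding rows and labels for each classifier.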

After that I used the same parameters and tested on the Digits and MNIST datasets, extracting gradient features first (vector size: 200 elements):

Digits:

- RTrees - 5.1%
- Boost - 23.4%
- MLP - 4.3%
- kNN(k=3) - 7.3%
- Bayes - 17.7%
- SVM - 4.2%

MNIST:

- RTrees - 1.4%
- Boost - out of memory
- MLP - 1.0%
- kNN(k=3) - 1.2%
- Bayes - 34.33%
- SVM - 0.6%

I am new to all classifiers except SVM and kNN; for these two I can say the results seem fine. What about the others? I expected more from random trees: on MNIST, kNN gives better accuracy, so any ideas how to get it higher? Boost and Bayes give very low accuracy. In the end I'd like to use these classifiers to build a multiple classifier system. Any advice?
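The simplest form of the multiple classifier system mentioned above is majority voting over the individual predictions; a minimal sketch (names are mine, for illustration):

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Majority voting over the per-sample predictions of several classifiers.
// preds[c][i] is classifier c's label for sample i; ties break toward the
// smallest label.
std::vector<int> majorityVote(const std::vector<std::vector<int>>& preds)
{
    std::size_t nSamples = preds[0].size();
    std::vector<int> out(nSamples);
    for (std::size_t i = 0; i < nSamples; ++i) {
        std::map<int, int> votes;
        for (const auto& p : preds) ++votes[p[i]];   // tally each vote
        int best = preds[0][i], bestCount = 0;
        for (const auto& [label, count] : votes)
            if (count > bestCount) { best = label; bestCount = count; }
        out[i] = best;
    }
    return out;
}
```

Weighted voting (e.g. weighting SVM's vote higher, given its lower error rates) would be a natural next step.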

Yes, indeed the error rates on the training dataset are around 0. Changing parameters to reduce overfitting didn't result in higher accuracy on the test dataset in my case. I will look into the techniques you mention as soon as possible and comment, thank you. – Mika – 2014-07-17T16:16:51.433

What are the relative proportions of the training and test datasets, btw? Something like 70:30, 60:40, or 50:50? – None – 2014-07-17T16:38:07.737

The first dataset, UCI letter recognition, is split 50:50 (10000:10000); Digits is about 51:49 (1893:1796), and MNIST is about 86:14 (60000:10000). – Mika – 2014-07-18T01:35:18.207

I experimented with PCA; I still didn't get good results with random forest, but Boost and Bayes now give results similar to the other classifiers. I found a discussion about random forests here: http://stats.stackexchange.com/questions/66543/random-forest-is-overfitting It is possible I am actually not overfitting, but I couldn't find the out-of-bag (OOB) prediction error mentioned there. Running an experiment now with a large number of trees to see if accuracy improves. – Mika – 2014-07-21T16:04:50.647

Okay, sounds like you are making a little bit of progress :) A trivial question, but have you standardized your features (z-score) so that they are centered around the mean with standard deviation=1? – None – 2014-07-21T16:19:55.410
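For reference, the out-of-bag (OOB) error mentioned a couple of comments above comes from bootstrap sampling: each tree in a random forest is trained on a bootstrap sample of the training set, and the samples a tree never saw give a built-in error estimate. A minimal sketch of the OOB fraction (on average about exp(-1) ≈ 36.8% of samples are out of bag per tree):

```cpp
#include <random>
#include <vector>

// Draw n samples with replacement (one bootstrap sample, as used per tree
// in a random forest) and return the fraction left "out of bag".
double oobFraction(int n, unsigned seed)
{
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> pick(0, n - 1);
    std::vector<bool> inBag(n, false);
    for (int i = 0; i < n; ++i)
        inBag[pick(rng)] = true;          // one draw with replacement
    int oob = 0;
    for (int i = 0; i < n; ++i)
        if (!inBag[i]) ++oob;
    return (double)oob / n;
}
```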

Actually no; I usually would scale features to the range 0-1, but now I see I didn't even do that correctly before PCA. So that would not be the right thing to do anyway? After PCA, mean = 0, std = 0.5754. – Mika – 2014-07-21T16:41:29.853

It depends on your data whether you want to do a min-max normalization to unit range (e.g., 0-1) or a z-score normalization/standardization to unit variance (variance=1, mean=0). Sorry, but I forgot that you are doing text classification. I think normalization after you have stemmed the words and used a vectorizer function would not be necessary. – None – 2014-07-23T16:22:00.253
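Both options can be sketched for a single feature column; in practice the min/max or mean/std statistics would be estimated on the training set only and then applied unchanged to the test set:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Min-max normalization of one feature column to [0, 1].
std::vector<double> minMax(std::vector<double> v)
{
    auto mm = std::minmax_element(v.begin(), v.end());
    double lo = *mm.first, range = *mm.second - *mm.first;
    for (double& x : v) x = range > 0 ? (x - lo) / range : 0.0;
    return v;
}

// Z-score standardization of one feature column (mean 0, std 1).
std::vector<double> zScore(std::vector<double> v)
{
    double mean = 0, sq = 0;
    for (double x : v) mean += x;
    mean /= v.size();
    for (double x : v) sq += (x - mean) * (x - mean);
    double sd = std::sqrt(sq / v.size());
    for (double& x : v) x = sd > 0 ? (x - mean) / sd : 0.0;
    return v;
}
```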

It took me a while to try everything out. I had an error earlier with PCA; now I see I just get much lower accuracy when using it. I reduce dimensions to 100, and that should be fine, but SVM gives me 4% error on MNIST (0.6% without PCA) and over 20% error on DIGITS (4% without PCA), and similarly for the other classifiers. Earlier I somehow made the error of doing PCA on the whole dataset (train and test sets together), which gave me too optimistic results. – Mika – 2014-07-31T05:29:44.877
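The leakage fix described above (estimating the PCA mean and axes on the training rows only, then projecting the test rows with those fixed parameters) can be sketched with power iteration for the leading component; a real run would keep the top k components, e.g. the 100 dimensions mentioned, and the struct/method names here are mine:

```cpp
#include <cmath>
#include <vector>

// One-component PCA: fit() estimates the mean and the leading principal
// axis on TRAINING rows only; project() then maps any row (train or test)
// with those fixed parameters, avoiding train/test leakage.
struct Pca1 {
    std::vector<double> mean, axis;

    void fit(const std::vector<std::vector<double>>& X, int iters = 200)
    {
        int n = (int)X.size(), d = (int)X[0].size();
        mean.assign(d, 0.0);
        for (const auto& row : X)
            for (int j = 0; j < d; ++j) mean[j] += row[j] / n;
        // covariance matrix (d x d) of the centered training rows
        std::vector<std::vector<double>> C(d, std::vector<double>(d, 0.0));
        for (const auto& row : X)
            for (int a = 0; a < d; ++a)
                for (int b = 0; b < d; ++b)
                    C[a][b] += (row[a] - mean[a]) * (row[b] - mean[b]) / n;
        // power iteration for the leading eigenvector of C
        axis.assign(d, 1.0);
        for (int it = 0; it < iters; ++it) {
            std::vector<double> next(d, 0.0);
            for (int a = 0; a < d; ++a)
                for (int b = 0; b < d; ++b) next[a] += C[a][b] * axis[b];
            double norm = 0;
            for (double x : next) norm += x * x;
            norm = std::sqrt(norm);
            for (int a = 0; a < d; ++a) axis[a] = next[a] / norm;
        }
    }

    double project(const std::vector<double>& row) const
    {
        double s = 0;
        for (std::size_t j = 0; j < row.size(); ++j)
            s += (row[j] - mean[j]) * axis[j];
        return s;
    }
};
```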

Which programming language are you using, btw? If you are a Python guy, I'd have some examples here where I used PCA, maybe they help: http://sebastianraschka.com/Articles/2014_about_feature_scaling.html http://sebastianraschka.com/Articles/2014_scikit_dataprocessing.html http://sebastianraschka.com/Articles/2014_pca_step_by_step.html Usually I prefer LDA since I am mostly working with supervised datasets (class labels); a separate article (like the step-by-step PCA one) is in the works ;) – None – 2014-07-31T13:47:17.910

I am using C++ for classification and Matlab to prepare the datasets. I will check out your links and try LDA too. – Mika – 2014-07-31T16:04:48.890

I tried using LDA but can't get it working with my data. The Matlab function classify should perform LDA, but it works only up to 20 dimensions, at least on my data. Also I found that the maximum number of dimensions given by LDA should be number_of_classes-1, which is too few. – Mika – 2014-08-03T09:20:08.827

I just uploaded the LDA article; although I used Python for the step-wise implementation, the intro might still be interesting and helpful: http://sebastianraschka.com/Articles/2014_python_lda.html – None – 2014-08-03T21:40:36.477

Finally I found out what was going on... My function in Matlab that writes features to a file would sometimes add spaces, and only on some datasets, and then my reader function in C would apparently read wrong values... PCA actually helped; the Boost classifier is still bad, but I will try to play with its parameters some more to make it work. I still didn't try LDA but will do that too. – Mika – 2014-08-05T08:49:26.577

Nice! I am glad to hear that it was "just" a technical problem :). For supervised training samples, LDA is often (but not always) a better choice than PCA. There is a research article where the authors discuss this point: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=908974 – None – 2014-08-05T15:04:04.100