Improving the accuracy of a logistic regression model


I have achieved 68% accuracy with my logistic regression model, and I want to increase it. How can I apply stepwise regression in this code, and how beneficial would it be for my model? What changes should I make to my code to get better accuracy on my dataset? I have attached my dataset below. Here is my code:

data1 <- read.csv("~/hj.csv", header = TRUE)

# Train on the first 116 rows and test on the remaining rows
# (testing on rows 1:116 as well would only measure training accuracy)
train   <- data1[1:116, ]
testset <- data1[-(1:116), ]

# Fit on the training rows only
mylogit <- glm(VALUE ~ POINT1 + POINT2 + POINT3 + POINT4,
               data = train, family = "binomial")

# Predicted probabilities on the test rows, classified with a 0.5 threshold
testset$predicted.value <- predict(mylogit, newdata = testset, type = "response")
testset$outcome <- ifelse(testset$predicted.value <= 0.50, 0, 1)

# Confusion matrix and accuracy
tab <- table(testset$VALUE, testset$outcome)
accuracy <- sum(diag(tab)) / sum(tab)
tab
accuracy
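On the stepwise question: base R's `step()` performs AIC-based stepwise selection on a fitted `glm`. The sketch below starts from the additive model in the code above and lets `step()` add or drop terms, up to the full set of interactions. Because `hj.csv` is not available here, it simulates a hypothetical data frame (`sim`) with the same column names, so the selected terms say nothing about the real data. Note that `step()` minimizes AIC, not misclassification, so it may or may not raise held-out accuracy.

```r
# Hypothetical stand-in for hj.csv: simulated data with the same column names
set.seed(1)
n <- 150
sim <- data.frame(POINT1 = rnorm(n), POINT2 = rnorm(n),
                  POINT3 = rnorm(n), POINT4 = rnorm(n))
# In this simulation only POINT1 and POINT3 actually drive VALUE
sim$VALUE <- rbinom(n, 1, plogis(0.8 * sim$POINT1 - 1.2 * sim$POINT3))

# Start from the additive model used in the question
start <- glm(VALUE ~ POINT1 + POINT2 + POINT3 + POINT4,
             data = sim, family = "binomial")

# Search between the intercept-only model and the full interaction model by AIC
chosen <- step(start,
               scope = list(lower = ~ 1,
                            upper = ~ POINT1 * POINT2 * POINT3 * POINT4),
               direction = "both", trace = 0)

formula(chosen)   # terms kept by the stepwise search
```

Whatever model the search selects, its accuracy is best judged on rows that were not used for fitting or for the selection itself; accuracy measured on the same rows tends to be optimistic.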


Following is my dataset: Link 1:


Posted 2017-02-15T07:08:31.970

Reputation: 63




mylogit <- glm(VALUE ~ POINT1 * POINT2 * POINT3 * POINT4, data = data1, family ="binomial")

This model gives about 72% accuracy.
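In an R model formula, `a * b` is shorthand for `a + b + a:b`, so the formula above fits all four main effects plus every two-, three- and four-way interaction: 15 terms in addition to the intercept. The expansion can be checked directly from the formula, without loading any data:

```r
# Expand the model formula from the answer; no data is needed for this
f <- VALUE ~ POINT1 * POINT2 * POINT3 * POINT4
term_labels <- attr(terms(f), "term.labels")
print(term_labels)

# Count terms by interaction order (1 = main effect, 4 = four-way interaction)
table(attr(terms(f), "order"))
```

The extra terms add flexibility, which is why accuracy can rise, but they also add parameters that can overfit, so the gain is worth confirming on held-out data.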



Reputation: 3 050

Thanks a lot, it works. What exactly are you doing by multiplying the features? Does this tell us that combining more polynomial terms will increase the accuracy further? – Swordsman – 2017-02-15T09:45:18.870

@ArindamMukherjee It means the model includes all the coefficients, including the interaction terms. – SmallChess – 2017-02-15T11:46:26.473

Hi. Can you check the bottom-most part of this post? I have added my questions there. Thanks :) – Swordsman – 2017-03-01T11:36:40.623

@ArindamMukherjee My answer already includes all the interactions (the * operators). The other answer is invalid because we're talking about logistic regression; there is no need to go to random forest. Random forest will reduce your accuracy, not improve it. – SmallChess – 2017-03-01T11:38:06.390

Okay. What about a scenario in which my testing set contains applications that should be classified as 1 (important), but the algorithm does not classify them that way because the data does not support a 1 (important), and their predicted probability comes out far below the threshold? What should I do in such a situation to get them classified as 1 (important)? – Swordsman – 2017-03-01T11:52:58.503
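One lever here is the 0.5 cut-off used in the question's code: lowering it labels more observations as 1, which raises recall on the important class at the cost of more false positives. The sketch below uses hypothetical probabilities and labels (`prob`, `truth`, and the helper `evaluate()` are illustrative names, not the asker's data) to show the trade-off at thresholds of 0.5 and 0.3.

```r
# Hypothetical predicted probabilities and true labels (1 = important)
prob  <- c(0.92, 0.75, 0.38, 0.35, 0.62, 0.20, 0.45, 0.10)
truth <- c(1,    1,    1,    1,    0,    0,    0,    0)

# Hypothetical helper: classify at a threshold (same rule as the question:
# <= threshold -> 0, otherwise 1) and report recall on class 1 and false positives
evaluate <- function(threshold) {
  pred <- ifelse(prob <= threshold, 0, 1)
  c(threshold = threshold,
    recall    = sum(pred == 1 & truth == 1) / sum(truth == 1),
    false_pos = sum(pred == 1 & truth == 0))
}

rbind(evaluate(0.5), evaluate(0.3))
```

Here the lower threshold recovers all four important items but also flags one more unimportant one. A threshold chosen this way is best picked on validation data rather than by forcing specific training rows into class 1.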

Hi, I have a question. I have collected two observations. Id 306: POINT1 = 0.0000000, POINT2 = 0.0000000, POINT3 = 6.348305, POINT4 = 5.827379, predicted value = 0.33544758. Id 235: POINT1 = 0.0000000, POINT2 = 0.0000000, POINT3 = 4.904174, POINT4 = 6.783267, predicted value = 0.68598890. Is it okay to see such a variation in the predicted value when there doesn't seem to be a huge change in the feature values? – Swordsman – 2017-03-06T06:04:02.433

@Swordsman It's very hard for me to answer in a comment like this. Do you think you can start a new question? – SmallChess – 2017-03-06T06:04:53.207

@Swordsman I'll read your new question and other people will also join. – SmallChess – 2017-03-06T06:05:09.610

Okay, sure. I will post it. – Swordsman – 2017-03-06T06:06:36.807
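For the question about the two observations (Id 306 and Id 235): logistic regression is linear on the log-odds scale, not on the probability scale. Converting the two reported predicted values back with `qlogis()` shows that they differ by only about 1.47 on the log-odds scale, a gap that modest changes in POINT3 and POINT4 (and, in the interaction model, their product terms) could plausibly produce. The fitted coefficients are not in the post, so this sketch only checks the size of the gap, not its cause.

```r
# Predicted probabilities reported in the comment for Id 306 and Id 235
p <- c(id306 = 0.33544758, id235 = 0.68598890)

# Back-transform to the linear predictor (log-odds) scale
log_odds <- qlogis(p)
log_odds          # roughly -0.68 and 0.78
diff(log_odds)    # gap of roughly 1.47 on the log-odds scale

# Feature differences for reference (POINT1 and POINT2 are 0 for both)
c(POINT3 = 4.904174 - 6.348305, POINT4 = 6.783267 - 5.827379)
```

Around a probability of 0.5 the logistic curve is at its steepest, so a log-odds shift of this size moves the predicted probability across the threshold quite easily.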


Try all possible combinations of interaction terms. Have you checked the collinearity of the variables? Have you checked all the variable interactions? Why stick with logistic regression? Try Random Forest as well, and see which gives you the best accuracy under k-fold cross-validation.
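A minimal sketch of the k-fold comparison suggested here, in base R: each candidate formula is fit on four of five folds and its 0.5-threshold accuracy is measured on the held-out fold, using the same folds for both models. Since `hj.csv` is not available, it uses simulated data (`dat`) with the question's column names; the fold count, the helper `cv_accuracy()`, and the two candidate formulas are illustrative choices, not the poster's code.

```r
# Hypothetical data with the same column names as the question
set.seed(42)
n <- 200
dat <- data.frame(POINT1 = rnorm(n), POINT2 = rnorm(n),
                  POINT3 = rnorm(n), POINT4 = rnorm(n))
dat$VALUE <- rbinom(n, 1, plogis(dat$POINT1 + dat$POINT2 * dat$POINT3))

# Assign each row to one of 5 folds; reuse the same folds for every model
folds <- sample(rep(1:5, length.out = n))

# Hypothetical helper: mean held-out accuracy of a logistic regression formula
cv_accuracy <- function(formula, data, folds, threshold = 0.5) {
  fold_acc <- sapply(sort(unique(folds)), function(i) {
    fit  <- glm(formula, data = data[folds != i, ], family = "binomial")
    test <- data[folds == i, ]
    prob <- predict(fit, newdata = test, type = "response")
    mean(ifelse(prob <= threshold, 0, 1) == test$VALUE)
  })
  mean(fold_acc)
}

additive    <- cv_accuracy(VALUE ~ POINT1 + POINT2 + POINT3 + POINT4, dat, folds)
interaction <- cv_accuracy(VALUE ~ POINT1 * POINT2 * POINT3 * POINT4, dat, folds)
c(additive = additive, interaction = interaction)
```

Sharing one fold assignment across models keeps the comparison paired, so differences reflect the formulas rather than a different random split.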

Arpit Sisodia


Reputation: 365

Based on my dataset, which algorithm do you think would work best? And can you suggest anything else to improve the accuracy further? – Swordsman – 2017-02-28T16:32:45.680

Compare the accuracy with Random Forest. – Arpit Sisodia – 2017-03-01T09:00:07.347

Okay, I'll check that. One more thing about the present scenario: I have 8 applications in my training set that are manually marked as important, but I can't get them into the Important bracket during testing because their importance probability comes out below 40% based on the data we have gathered. – Swordsman – 2017-03-01T11:21:37.517

I want every app that is marked as important in my training set to fall under the Important bracket, keeping in mind that this will increase the number of apps in that bracket. But how can I put these 8 apps on the Important list when the data says otherwise? – Swordsman – 2017-03-01T11:21:47.377

Can you help me answer this? – Swordsman – 2017-03-01T11:23:14.530