
I have achieved 68% accuracy with my logistic regression model and want to increase it. How can I apply stepwise regression in this code, and how beneficial would it be for my model? What changes should I make in my code to get higher accuracy with my data set? I have attached my dataset below. Following is my code:

```r
library(dplyr)

data1 <- read.csv("~/hj.csv", header = TRUE)

# Note: the training and test sets are currently the same 116 rows
train   <- data1[1:116, ]
testset <- data1[1:116, ]

# Logistic regression of VALUE on the four predictors
mylogit <- glm(VALUE ~ POINT1 + POINT2 + POINT3 + POINT4,
               data = data1, family = "binomial")

# Predicted probabilities on the test set
testset$predicted.value <- predict(mylogit, newdata = testset, type = "response")

# Classify with a 0.50 threshold
testset$outcome <- ifelse(testset$predicted.value <= 0.50, 0, 1)
print(testset)

# Confusion matrix and accuracy
tab <- table(testset$VALUE, testset$outcome) %>% as.matrix.data.frame()
accuracy <- sum(diag(tab)) / sum(tab)
print(accuracy)
print(tab)
```

Following is my dataset: Link 1: http://www.filedropper.com/hj_2
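Stepwise selection can be applied to a fitted `glm` with base R's `step()`, which adds and drops terms to minimize AIC. A minimal sketch, assuming `data1` has the `VALUE` and `POINT1`–`POINT4` columns as in the code above:

```r
# Fit the full logistic model, then let step() search over submodels by AIC.
# step() lives in the stats package (loaded by default);
# direction = "both" tries both adding and dropping terms.
full <- glm(VALUE ~ POINT1 + POINT2 + POINT3 + POINT4,
            data = data1, family = "binomial")
stepped <- step(full, direction = "both", trace = FALSE)

summary(stepped)    # the AIC-selected model
stepped$formula     # which predictors survived
```

Note that `step()` optimizes AIC, not classification accuracy, so it will not necessarily raise the 68% figure; it mainly removes predictors that do not improve the fit.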

Thanks a lot, it works. What are you actually doing by multiplying the features? Is it an important step, in the sense that combining more polynomial terms will increase the accuracy further? – Swordsman – 2017-02-15T09:45:18.870

@ArindamMukherjee It means all the coefficients including the interaction terms. – SmallChess – 2017-02-15T11:46:26.473

Hi. Can you check the bottom-most part of this post. I have added my doubts there. Thanks :) – Swordsman – 2017-03-01T11:36:40.623

@ArindamMukherjee My answer already includes all interactions (the * operators). The other answer is invalid because we're talking about logistic regression; there's no need to go to a random forest. A random forest will reduce your accuracy, not improve it. – SmallChess – 2017-03-01T11:38:06.390
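For context on the `*` operators mentioned above: in an R formula, `*` expands to the main effects plus their interaction (product) terms, which is what "multiplying the features" refers to. A minimal sketch, assuming the same `data1` as in the question:

```r
# POINT1 * POINT2 expands to POINT1 + POINT2 + POINT1:POINT2,
# i.e. both main effects plus their product (interaction) term.
inter <- glm(VALUE ~ POINT1 * POINT2 * POINT3 * POINT4,
             data = data1, family = "binomial")

summary(inter)   # coefficients now include terms like POINT1:POINT2
```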

Okay. So what about a scenario in which I have a list of applications that should be classified as 1 (important) in my test set but are not, because the data does not support a 1 (important) and their probabilities fall well below the threshold value? What should I do in such a situation to get them classified as 1 (important)? – Swordsman – 2017-03-01T11:52:58.503
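If cases that should be class 1 keep landing below 0.50, one standard lever is the decision threshold itself: lowering it trades precision for recall on class 1 without refitting the model. A minimal sketch, assuming the `testset$predicted.value` column from the question's code; the 0.35 cutoff is purely illustrative:

```r
# Lower the cutoff so borderline cases are classified as 1 (important).
# 0.35 is an illustrative value; in practice choose it from an ROC curve
# (e.g. with the pROC package) on held-out data, not the training set.
cutoff <- 0.35
testset$outcome <- ifelse(testset$predicted.value > cutoff, 1, 0)

table(testset$VALUE, testset$outcome)   # recheck the confusion matrix
```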

Hi. I have a question. I have collected two observations:

- Id 306: POINT1 = 0.0000000, POINT2 = 0.0000000, POINT3 = 6.348305, POINT4 = 5.827379, predicted value = 0.33544758
- Id 235: POINT1 = 0.0000000, POINT2 = 0.0000000, POINT3 = 4.904174, POINT4 = 6.783267, predicted value = 0.68598890

Is it okay to see such a variation in predicted value when there doesn't seem to be a huge change in the feature values? – Swordsman – 2017-03-06T06:04:02.433

@Swordsman It's very hard for me to answer in a comment like this. Do you think you can start a new question? – SmallChess – 2017-03-06T06:04:53.207

@Swordsman I'll read your new question and other people will also join. – SmallChess – 2017-03-06T06:05:09.610

Okay sure. I will post it – Swordsman – 2017-03-06T06:06:36.807