I am trying to classify cars for a towing company. Junky cars earn more when sent to the junkyard, while more valuable cars should earn more at auction, even after the auction fee. A logistic regression on Make, Model, Mileage, Year and Run status improves the accuracy of our decisions about which cars should go where, but a difficulty arises: sometimes a car that would be classified as junk turns out to be an outlier and sells for a lot of money. So when optimizing our model, **we care less about being right or wrong on any individual car than about maximizing our bottom line**.

All of the models I have seen (logistic regression, random forest, linear regression) make predictions on a row-by-row basis. What would be a good model for maximizing the aggregate sum across all predictions?
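To state the objective a little more precisely (the notation is my own shorthand, introduced only for illustration): let $d_i = 1$ if car $i$ is sent to auction and $d_i = 0$ if it is sent to the junkyard, and let $A_i$, $F_i$ and $J_i$ stand for the `Auction_Result`, `Auction_Fee` and `junkyard_Offer` columns in the data below. The quantity I would like to maximize is then roughly:

$$
\text{Total} = \sum_{i=1}^{n} \Big[ d_i \,(A_i - F_i) + (1 - d_i)\, J_i \Big], \qquad d_i \in \{0, 1\}
$$

Since $A_i$ is unknown at decision time, any model can only choose $d_i$ from the car's features and the junkyard offer.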

Below is a reprex of my data, along with the basic code I used.
So far, I have looked at past data and classified, in hindsight, what should have been done, based on the prices earned at auction versus the junk prices available. I then fit a glm to that classification to predict future cases. As mentioned above, this improved the accuracy of our decisions and would have correctly sent more cars to junk, but some of the cars we classified as junk sold for so much at auction that it wasn't worth sending *any* to junk.

What is the proper way to approach this?

```
cars <- structure(list(
  YearOfCar = c(2009L, 2009L, 2003L, 2004L),
  Make = c("Hyundai", "Lexus", "Ford", "Toyota"),
  Model = c("Sonata", "GS 350", "F-250 Super Duty", "Camry"),
  PickUpState = c("MN", "LA", "MA", "NJ"),
  Auction_Result = c(650, 625, 425, 1500),
  Auction_Fee = c(144.25, 373.54, 213.5, 187),
  Mileage = c(116120L, 198900L, 140241L, 312927L),
  Runs = structure(c(1L, 1L, 1L, 2L), .Label = c("No", "Yes"), class = "factor"),
  junkyard_Offer = c(230L, 235L, 140L, 300L),
  Date = structure(c(17592, 17707, 17674, 17583), class = "Date")
), row.names = 3:6, class = "data.frame")

# Hindsight label: 1 if net auction proceeds beat the junkyard offer, else 0
cars$hindsight <- ifelse(cars$Auction_Result - cars$Auction_Fee > cars$junkyard_Offer, 1, 0)

# Logistic regression on the hindsight label
glmodel <- glm(hindsight ~ Make + Model + Mileage + Runs, data = cars, family = "binomial")
prediction <- predict(glmodel, cars, type = "response")

# Send to auction (1) when the predicted probability exceeds the cutoff
prediction_classifier <- ifelse(prediction > .501, 1, 0)

# Money earned on each car under the predicted decision
cars$prediction_results <- ifelse(prediction_classifier == 1,
                                  cars$Auction_Result - cars$Auction_Fee,
                                  cars$junkyard_Offer)
```
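To make the "bottom line" comparison concrete, here is a minimal sketch of how the total earned under different policies could be added up, using only the four sample rows above. The helper `total_profit` and the column `net_auction` are names introduced here for illustration; they are not part of my original code.

```r
# Sample rows from the reprex (only the columns needed for this comparison)
cars <- data.frame(
  Auction_Result = c(650, 625, 425, 1500),
  Auction_Fee    = c(144.25, 373.54, 213.5, 187),
  junkyard_Offer = c(230, 235, 140, 300)
)

# Net proceeds if a car goes to auction
cars$net_auction <- cars$Auction_Result - cars$Auction_Fee

# Hypothetical helper: total earned when the logical vector
# `send_to_auction` decides each car's destination
total_profit <- function(data, send_to_auction) {
  sum(ifelse(send_to_auction, data$net_auction, data$junkyard_Offer))
}

all_auction <- total_profit(cars, rep(TRUE, nrow(cars)))
all_junk    <- total_profit(cars, rep(FALSE, nrow(cars)))
hindsight   <- total_profit(cars, cars$net_auction > cars$junkyard_Offer)

print(c(all_auction = all_auction, all_junk = all_junk, hindsight = hindsight))
```

Any candidate model's decisions can be passed in as `send_to_auction` and scored the same way, so that models are compared on total dollars rather than on classification accuracy.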

As far as anomaly detection goes, I did actually try to use a logistic regression to classify the (otherwise) junk cars that did *particularly* well at the auction. I also tried only including cars where the logistic prediction output was above a more stringent hurdle rate, such as .70 instead of .5. For some reason, I wasn't yet able to find any meaningful relationships. Will investigate your ideas. Thanks. – Lamden – 2020-11-03T19:19:37.447
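A minimal sketch of how different hurdle rates could be compared on total dollars rather than on accuracy. The probabilities in `p_auction` are made-up placeholders, not output from any fitted model, and `profit_at` is a hypothetical helper name; only the dollar amounts come from the sample rows in the question.

```r
# Dollar amounts from the question's sample rows
cars <- data.frame(
  net_auction    = c(650, 625, 425, 1500) - c(144.25, 373.54, 213.5, 187),
  junkyard_Offer = c(230, 235, 140, 300)
)

# HYPOTHETICAL predicted probabilities that auction beats junk
# (placeholders for illustration only)
p_auction <- c(0.62, 0.55, 0.48, 0.91)

# Total earned if every car with probability above `threshold` goes to auction
profit_at <- function(threshold) {
  send_to_auction <- p_auction > threshold
  sum(ifelse(send_to_auction, cars$net_auction, cars$junkyard_Offer))
}

thresholds <- c(0.3, 0.5, 0.7, 0.9)
profits <- sapply(thresholds, profit_at)

# Threshold with the highest total (first one wins on ties)
best_threshold <- thresholds[which.max(profits)]

print(data.frame(threshold = thresholds, total_profit = profits))
```

Picking the cutoff this way ties the hurdle rate directly to the aggregate result; on real data the sweep would need to be run on held-out cars to avoid choosing a threshold that only fits the past.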