I am having a problem during feature engineering and am looking for suggestions. Problem statement: I have usage data for multiple customers covering 3 days. Some customers have just 1 day of usage, some have 2, and some have 3. The data records the number of emails sent, contacts added, etc. on each day.
I am converting this time-series data to a column-wise format, i.e., the number of emails sent by a customer on day 1 is one feature, the number of emails sent on day 2 is another feature, and so on. The problem is that usage can be either increasing or decreasing for different customers.
Example 1: customer 'A' --> 'number of emails sent on 1st day' = 100, 'number of emails sent on 2nd day' = 0
Example 2: customer 'B' --> 'number of emails sent on 1st day' = 0, 'number of emails sent on 2nd day' = 100
Example 3: customer 'C' --> 'number of emails sent on 1st day' = 0, 'number of emails sent on 2nd day' = 0
Example 4: customer 'D' --> 'number of emails sent on 1st day' = 100, 'number of emails sent on 2nd day' = 100
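To make the setup concrete, here is a minimal pandas sketch of the long-to-wide conversion described above, using the four example customers. The long-format layout and the column names (customer, day, emails_sent, emails_day1, ...) are assumptions for illustration, not the actual schema.

```python
import pandas as pd

# Hypothetical long-format usage log: one row per customer per day.
usage = pd.DataFrame({
    "customer": ["A", "A", "B", "B", "C", "C", "D", "D"],
    "day": [1, 2, 1, 2, 1, 2, 1, 2],
    "emails_sent": [100, 0, 0, 100, 0, 0, 100, 100],
})

# One row per customer, one column per day.
# A customer with no row for a given day would get NaN in that day's column.
wide = usage.pivot(index="customer", columns="day", values="emails_sent")
wide.columns = [f"emails_day{d}" for d in wide.columns]

print(wide)
```

Customers with only 1 or 2 days of data would show up here with NaN in the later day columns, so how to fill those is a separate decision.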
In the first two cases, my new feature (the day-2 count minus the day-1 count) will have "-100" and "100" as values, which I guess is good for differentiating. But the problem arises for the 3rd and 4th examples, where the new feature value will be "0" in both scenarios. Can anyone suggest a way to handle this?
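The collision is easy to reproduce. This is a small sketch, assuming the wide table has columns named emails_day1 and emails_day2 (hypothetical names) and that the new feature is the day-2 value minus the day-1 value:

```python
import pandas as pd

# The four example customers in wide format (column names are hypothetical).
wide = pd.DataFrame(
    {"emails_day1": [100, 0, 0, 100], "emails_day2": [0, 100, 0, 100]},
    index=["A", "B", "C", "D"],
)

# Day-over-day change: negative means usage dropped, positive means it grew.
wide["emails_delta"] = wide["emails_day2"] - wide["emails_day1"]

# C (never active) and D (steadily active) end up with the same delta of 0.
print(wide)
```

On its own, the delta column cannot tell customer C from customer D; only the original day columns still carry that difference.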
One way to handle this:
I can add "No change" in those scenarios, but I am confused about one thing: if I do that, I will have to make the new feature categorical, which is not ideal because the other values are continuous.
Instead, I can keep absolute values in the new feature and indicate the trend separately: "+1" for increasing, "-1" for decreasing, "no change" for no change, and "0" if both values are "0". Would that be a good approach, though?
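A rough sketch of what that encoding could look like in pandas is below. It is a variation on the idea rather than an exact implementation: to keep every column numeric, the trend is the sign of the change (-1, 0, +1), and the "both values are 0" case goes into a separate binary flag instead of sharing a column with the "no change" label. All column names are hypothetical.

```python
import numpy as np
import pandas as pd

# The four example customers in wide format (column names are hypothetical).
wide = pd.DataFrame(
    {"emails_day1": [100, 0, 0, 100], "emails_day2": [0, 100, 0, 100]},
    index=["A", "B", "C", "D"],
)

delta = wide["emails_day2"] - wide["emails_day1"]

# Magnitude of the change, without direction.
wide["delta_abs"] = delta.abs()

# Direction of the change: -1 decreasing, 0 no change, +1 increasing.
wide["trend"] = np.sign(delta)

# Separate flag for "no usage on either day", so C and D no longer look identical.
wide["no_usage"] = (
    (wide["emails_day1"] == 0) & (wide["emails_day2"] == 0)
).astype(int)

print(wide)
```

With this layout C and D both have trend 0 and delta_abs 0, but differ in no_usage, and nothing has to be treated as a string category.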
The end goal is to predict whether a user will continue using the application or not, so it would basically be a two-class model. I also want to capture the scale of usage, i.e., a user sending 100 emails every day should look different from a user sending 10000 emails every day.
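To illustrate the scale requirement: two steady users have the same change (0) regardless of volume, so some level feature has to sit next to it. The sketch below uses the per-customer mean and its log as one possible level feature. That choice, and all column names, are assumptions for illustration rather than a recommendation from any particular source.

```python
import numpy as np
import pandas as pd

# Two steady users at very different volumes (names and columns are hypothetical).
wide = pd.DataFrame(
    {"emails_day1": [100, 10000], "emails_day2": [100, 10000]},
    index=["user_100", "user_10000"],
)
day_cols = ["emails_day1", "emails_day2"]

# Change feature: identical (0) for both users.
wide["emails_delta"] = wide["emails_day2"] - wide["emails_day1"]

# Level feature: mean over the available days (NaN days are skipped by default).
wide["emails_mean"] = wide[day_cols].mean(axis=1)

# log1p compresses the range and is defined at 0 for inactive users.
wide["emails_mean_log"] = np.log1p(wide["emails_mean"])

print(wide)
```

Here emails_delta is 0 for both users, while emails_mean (and its log) keeps them apart.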