Exploratory statistics: how to identify and remove a driver (bias)


I am looking at customer data and have created frequency tables (plus histograms) showing the best time to reach customers, broken down by professional status. The statuses are employed, retired, self-employed, unemployed, and blank.

For each status, I expected some variation in the best time to reach that type of customer. Intuitively and from experience, employed people, for example, should on average be available early in the morning or in the early evening, while unemployed people should show a more even distribution. However, the distributions look very similar, and the peak hours for all statuses are between 8 and 11. I am fairly sure the agents drive this peak: more agents work early in the day than in the afternoon.

How can I remove this effect so that I can focus on the best times to reach each of these customer groups?


Posted 2020-05-18T13:33:22.893

Reputation: 43

Just to clarify, do you have a data-driven reason to believe these specific biases are occurring in this specific manner? Do you have any crosstabs that allow you to look at, for example, different, clearly-defined groups within the employed status? What sort of customer data is this, what type of business(es) are they customers of, and how many observations do you have (overall and per category?) – Upper_Case – 2020-05-18T15:51:02.897

It is agent call data on potential customers for blood pressure devices (N=650K). I am not sure whether this counts as data-driven, but one driver I see as a problem is the number of agents scheduled at a given time of day, which follows a pattern: for example, 60% of agents are scheduled in the first part of the day and the other 40% in the afternoon and evening. Because more agents are scheduled in the morning, more customers are reached then. My solution is to compute the average number of customers reached per agent. – ColRow – 2020-05-18T16:15:43.893
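The per-agent normalization proposed in the comment above could look roughly like the sketch below. It is only an illustration under stated assumptions: the `calls` records, the `agents_per_hour` staffing table, and the function name `reached_per_agent` are all hypothetical, and the numbers are made-up toy values rather than anything from the actual call data.

```python
from collections import defaultdict

# Hypothetical toy data: (hour_of_day, customer_status, reached)
calls = [
    (9, "employed", True), (9, "employed", False), (9, "retired", True),
    (9, "retired", True), (9, "employed", True), (9, "retired", False),
    (18, "employed", True), (18, "employed", True), (18, "retired", False),
]

# Hypothetical staffing: number of agents scheduled in each hour
agents_per_hour = {9: 6, 18: 2}


def reached_per_agent(calls, agents_per_hour):
    """Return {(status, hour): customers reached / agents scheduled that hour}."""
    reached = defaultdict(int)
    for hour, status, was_reached in calls:
        # Adding 0 for unreached calls keeps every (status, hour) pair in the result
        reached[(status, hour)] += int(was_reached)
    return {
        (status, hour): count / agents_per_hour[hour]
        for (status, hour), count in reached.items()
    }


rates = reached_per_agent(calls, agents_per_hour)
for (status, hour), rate in sorted(rates.items()):
    print(f"{status:10s} {hour:02d}:00  reached per agent = {rate:.2f}")
```

In this toy data the raw count of employed customers reached is the same at 09:00 and 18:00 (two each), yet the per-agent figure is three times higher at 18:00 because only two agents are scheduled then. Dividing by the number of call attempts per hour instead of the number of agents would be a possible alternative denominator, if attempt counts are available.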

@colrow I have voted this question down because 1) you have not really introduced the problem, and there is no example code or data; see https://stackoverflow.com/help/how-to-ask. 2) Asking a well-thought-out question is tough, but it helps everyone involved. FYI http://tinyurl.com/stack-checklist – oaxacamatt – 2020-05-18T23:22:28.743

No answers