How to remove outliers using box-plot?

8

3

I have data for a metric, grouped by date. I have plotted it as a box plot; how do I now remove the values that fall outside the whiskers of the box plot (the outliers)?

All of the 'AVG' data is in a single column, and I need it for time series modelling.


Uday T

Posted 2019-07-01T04:15:25.180

Reputation: 152

Answers

11

Seaborn uses the interquartile range (IQR) to detect outliers. To drop them, reproduce the same rule on the column you want to filter. This is straightforward in pandas.

Assuming your dataframe is called df and the column you want to filter on is AVG:

Q1 = df['AVG'].quantile(0.25)
Q3 = df['AVG'].quantile(0.75)
IQR = Q3 - Q1  # interquartile range

# Keep only rows inside the whiskers; `mask` avoids shadowing the built-in filter()
mask = (df['AVG'] >= Q1 - 1.5 * IQR) & (df['AVG'] <= Q3 + 1.5 * IQR)
df_filtered = df.loc[mask]
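
For reuse, the same rule can be wrapped in a small function. This is only a sketch: the function name remove_iqr_outliers, the parameter k, and the sample data are made up for illustration and are not part of the original question.

```python
import pandas as pd


def remove_iqr_outliers(df, column, k=1.5):
    """Return the rows of df whose `column` value lies within
    [Q1 - k*IQR, Q3 + k*IQR], the usual box-plot whisker rule."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    # Series.between is inclusive on both ends by default
    mask = df[column].between(q1 - k * iqr, q3 + k * iqr)
    return df.loc[mask]


# Hypothetical sample data with one obvious outlier (500.0)
sample = pd.DataFrame(
    {
        "date": pd.date_range("2019-06-01", periods=8, freq="D"),
        "AVG": [10.0, 12.0, 11.0, 13.0, 500.0, 12.5, 11.5, 10.5],
    }
)

cleaned = remove_iqr_outliers(sample, "AVG")
print(cleaned)
```

Dropping rows leaves gaps in the dates. If the time series model needs evenly spaced observations, one alternative is to replace the outliers with NaN and fill them afterwards instead of removing the rows.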

Tasos

Posted 2019-07-01T04:15:25.180

Reputation: 3 340

KeyError: 'AVG' – Leos313 – 2020-10-15T09:13:03.200

Probably you don’t have that column. The OP had a column called AVG – Tasos – 2020-10-15T15:07:55.890

right, I do not! Now, I know what to look for! Thank you – Leos313 – 2020-10-15T15:16:09.350

3

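If you only need the outliers gone from the plot, and you need it to work with grouped data without extra complications, pass showfliers=False in the plotting call. The argument is inherited from matplotlib. Note that this only hides the outlier points in the figure; the underlying data is unchanged.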

showfliers=False
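
A minimal sketch of the flag in action, calling matplotlib directly since that is where the argument comes from. The data is made up, and the Agg backend is chosen only so the snippet runs without a display. With seaborn, the equivalent should be passing showfliers=False to sns.boxplot, which forwards extra keyword arguments to matplotlib.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt

# Hypothetical data with one obvious outlier (500.0)
values = [10.0, 12.0, 11.0, 13.0, 500.0, 12.5, 11.5, 10.5]

fig, ax = plt.subplots()
# showfliers=False suppresses the outlier markers in the drawing only
artists = ax.boxplot(values, showfliers=False)

print(len(artists["fliers"]))  # number of flier artists that were drawn
print(len(values))             # the data itself still has every value
```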

aerijman

Posted 2019-07-01T04:15:25.180

Reputation: 183

0

You can simply set showfliers=False in the seaborn boxplot call.

Saad Ahmed

Posted 2019-07-01T04:15:25.180

Reputation: 11