TL;DR: What is the impact of a linear trend on the correlation between time series that are (most likely) not spuriously correlated?

I'm currently trying to reconstruct and cross-validate an analysis delivered by one of my company's contractors.

The data consists of time series of sensor data (approx. 3.5m timestamps). The goal was to find the signals that correlate most strongly with one specific signal.

Despite not being an expert in data science, I was able to reproduce their data cleaning (drop columns with zero variance, interpolate linearly over smaller gaps, drop remaining columns containing NaN values). Beyond that step, however, I'm not sure I can confirm their findings.
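For reference, this is roughly how I understood the three cleaning steps. It is only a sketch: the function name `clean` and the `max_gap` threshold are my own placeholders, since I don't know the exact gap size the contractor used.

```python
import numpy as np
import pandas as pd

def clean(df: pd.DataFrame, max_gap: int = 5) -> pd.DataFrame:
    # 1. Drop constant columns (zero variance); all-NaN columns go too.
    df = df.loc[:, df.nunique(dropna=True) > 1]
    # 2. Linear interpolation, filling at most `max_gap` consecutive NaNs
    #    per gap and only between valid values (no extrapolation).
    #    `max_gap` is an assumed value, not the contractor's setting.
    df = df.interpolate(method="linear", limit=max_gap, limit_area="inside")
    # 3. Drop every column that still contains NaN after interpolation.
    return df.dropna(axis=1)
```

Gaps longer than `max_gap` are only partially filled in step 2, so those columns still contain NaN and are removed in step 3.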

They apparently computed a simple Pearson correlation like

```
corr = df.corrwith(df['DesiredSignal'])
```

Yet looking at the data, the signals clearly show a trend.

When I then apply a detrend function like

```
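import pandas as pd
from scipy import signal

# Remove a least-squares linear trend from every column (axis=0 is the time axis)
df_d = signal.detrend(df.values, axis=0)
df_n = pd.DataFrame(data=df_d, index=df.index, columns=df.columns)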
```

and apply the corrwith function to this new dataframe, I get totally different results (e.g. a significantly higher number of strongly negative correlations).
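To convince myself that a trend alone can produce this kind of difference, I tried a small experiment on synthetic data (not the contractor's data). The slopes, noise level, and sample size below are arbitrary assumptions; the two signals share nothing except that both drift upwards linearly.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
n = 10_000
t = np.arange(n)

# Two synthetic signals: independent noise plus an upward linear trend each.
x = 0.001 * t + rng.normal(size=n)
y = 0.002 * t + rng.normal(size=n)

# Pearson correlation on the raw signals vs. on the linearly detrended ones.
raw_corr = np.corrcoef(x, y)[0, 1]
det_corr = np.corrcoef(signal.detrend(x), signal.detrend(y))[0, 1]
print(f"raw: {raw_corr:.3f}, detrended: {det_corr:.3f}")
```

In this toy case the raw correlation is high purely because of the shared drift, while the detrended correlation is close to zero. Whether the same mechanism explains the differences in the real sensor data is exactly what I'm unsure about.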

My question now is: can I trust the contractor's findings, are they rendered invalid by not considering the influence of trends on correlation, or am I getting something completely wrong?

Thank you for this really detailed answer! Although it's been almost a year since I asked this question, you still helped me very much with future situations like this. – Viktor Katzy – 2020-06-02T09:19:42.497

I'm glad this could help! All the best for the future. – mwtmurphy – 2020-06-02T14:44:10.867