I am not sure whether "minimize correlation" is the right title for this issue, but I could not find a better phrase to describe what I would like to achieve.

Let's say that I have a black box with multiple inputs and a single output. I know one of the inputs and the output, and I have multiple example recordings of both. This known input modifies the output in an undesired way, so I would like to get rid of the "noise" caused by the known input. The transfer function for this input can safely be assumed to be linear.

What I am doing right now is looping through the example recordings: for each one, I fit a linear regression model that predicts the unwanted contribution of the known input and subtract that prediction from the measured output signal. Afterwards, I average all the corrected output signals to reveal meaningful data beyond the noise.
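To make the procedure above concrete, here is a minimal sketch of what I mean. The function names (`remove_linear_component`, `clean_and_average`) and the synthetic data are made up for illustration, and I am assuming that every recording has the same length and that a straight-line fit with `np.polyfit` is an acceptable stand-in for the regression model. It is not my exact code.

```python
import numpy as np


def remove_linear_component(x, y):
    """Subtract the least-squares straight-line fit of y on x from y.

    Note: the intercept is subtracted too, so the mean of y is removed.
    """
    slope, intercept = np.polyfit(x, y, deg=1)
    return y - (slope * x + intercept)


def clean_and_average(inputs, outputs):
    """Clean each recording separately, then average across recordings."""
    cleaned = [remove_linear_component(x, y) for x, y in zip(inputs, outputs)]
    return np.mean(cleaned, axis=0)


# Synthetic stand-in for the recordings (hypothetical data, not my measurements).
rng = np.random.default_rng(0)
n_recordings, n_samples = 20, 500
t = np.linspace(0.0, 1.0, n_samples)
hidden = np.sin(2 * np.pi * 3 * t)  # the "meaningful" part I want to reveal

inputs = [rng.normal(size=n_samples) for _ in range(n_recordings)]
outputs = [hidden + 2.0 * x + 0.1 * rng.normal(size=n_samples) for x in inputs]

averaged = clean_and_average(inputs, outputs)
```

With this synthetic data, `averaged` should end up close to `hidden`, because the part of each output that is explained linearly by the known input has been subtracted before averaging.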

This strategy seems to work according to the following plot:

X axis is the known input signal, Y axis is the output signal, blue and green dots represent the averaged data before and after applying the linear regression algorithm, respectively. Lines are the best fit for each data set.

You can see that the green line (the "cleaned" dataset) has the smaller slope, meaning that the output variable is considerably less linearly dependent on the input than it was before. Therefore, I assume that the regression technique described above is working as expected.
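The check described above can also be done numerically rather than by eye. The following is only a sketch on made-up data: `slope_and_corr` is a hypothetical helper, and the true slope of 1.5 is an arbitrary choice. It reports both the fitted slope and the Pearson correlation, since the two are related but not the same quantity.

```python
import numpy as np


def slope_and_corr(x, y):
    """Return the least-squares slope of y on x and their Pearson correlation."""
    slope = np.polyfit(x, y, deg=1)[0]
    r = np.corrcoef(x, y)[0, 1]
    return slope, r


# Hypothetical single recording: output depends linearly on the known input.
rng = np.random.default_rng(1)
x = rng.normal(size=1000)
y = 1.5 * x + rng.normal(size=1000)

# Remove the fitted linear contribution of x, as in the per-recording step.
fit_slope, fit_intercept = np.polyfit(x, y, deg=1)
y_clean = y - (fit_slope * x + fit_intercept)

before = slope_and_corr(x, y)
after = slope_and_corr(x, y_clean)
print("before: slope=%.3f, r=%.3f" % before)
print("after:  slope=%.3g, r=%.3g" % after)
```

Within a single recording, the residual of a least-squares fit is uncorrelated with the regressor by construction, so the "after" values should be zero up to floating-point error. In my plot the slope is not exactly zero, presumably because it is computed on the averaged data rather than on each recording separately.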

My question, looking at the plot, is this: is there a mathematical procedure to directly "project" the original dataset so that the correlation between the input and output variables is minimized? Is there a math trick that gives a similar result without running the regression on every example dataset?

My written expression is not the best, so please feel free to comment on the question if you need further explanation.

Any code is welcome, but Python (pandas, numpy, etc.) and Matlab are preferred. Theoretical explanations are also very welcome.

I think there are no alternatives to your method! – stochazesthai – 2015-05-26T13:58:05.330

Why don't you just see which input has the least Pearson correlation (since the relationship is linear) with the output, and then ignore it? – Hristo Buyukliev – 2015-05-29T08:49:19.273