How to align similar TimeSeries like ImageAlign?



I have three audio files.

human=Audio["http://home.ustc.edu.cn/~xiaozh/SE/Audio/human.wav"];
hus=Audio["http://home.ustc.edu.cn/~xiaozh/SE/Audio/hus.wav"];
parm=Audio["http://home.ustc.edu.cn/~xiaozh/SE/Audio/parm.wav"];

The first was recorded by a human speaker. The second and third were generated by computer using two different methods.

All three say the same text:

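联合国秘书长特使、前南地区维和部队最高长官明石康,5日书面回答了本报记者的提问。

(Yasushi Akashi, special envoy of the UN Secretary-General and head of the peacekeeping forces in the former Yugoslavia, answered this newspaper's reporter's questions in writing on the 5th.)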

Even if you don't speak Chinese, you can hear that the three recordings are very similar.

AudioPlot@{human, hus, parm}


Spectrogram[#, PlotRange -> {Automatic, {0, 2000}}] & /@ {human, hus, parm}


To compare their RMS amplitudes:

AudioLocalMeasurements[#, "RMSAmplitude"] & /@ {human, hus, parm}


Then plot them pairwise:

ListLinePlot[#, PlotRange -> All, PlotLegends -> {"human", "hus"}, AspectRatio -> 1/5] &[
AudioLocalMeasurements[#, "RMSAmplitude"] & /@ {human, hus}]


ListLinePlot[#, PlotRange -> All, PlotLegends -> {"human", "parm"}, AspectRatio -> 1/5] &[
AudioLocalMeasurements[#, "RMSAmplitude"] & /@ {human, parm}]


What I actually want is to align these curves: the second curve should be shifted a little to the left, and the third a little to the right.

For images, the function ImageAlign does this kind of alignment, even when the images differ in size and translation.

PS: the plot of the first MFCC coefficient may show this more clearly:

ListLinePlot[#, PlotRange -> All, PlotLegends -> {"human", "hus"}, AspectRatio -> 1/5] &[
TimeSeriesMap[First, AudioLocalMeasurements[#, "MFCC"]] & /@ {human, hus}]


ListLinePlot[#, PlotRange -> All, PlotLegends -> {"human", "parm"}, AspectRatio -> 1/5] &[
TimeSeriesMap[First, AudioLocalMeasurements[#, "MFCC"]] & /@ {human, parm}]


So is there a method that can align TimeSeries objects the way ImageAlign aligns images?
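If the misalignment were a single constant shift, one common approach (a different technique from ImageAlign, and not a built-in for this task) is to pick the lag that maximizes the cross-correlation of the two sequences. This is a minimal sketch assuming equally spaced samples such as MFCC frames; bestLag, s1 and s2 are hypothetical names, and the data are synthetic Gaussian bumps rather than the recordings above. A constant lag cannot model local stretching of the speech.

```mathematica
(* Hypothetical helper: the lag k in [-maxLag, maxLag] maximizing
   Sum[ac[[i]] bc[[i + k]]] over the overlap of mean-centered a and b.
   Positive k means b is delayed relative to a. Score is unnormalized. *)
bestLag[a_List, b_List, maxLag_Integer] := Module[{ac, bc, score},
  (* remove the means so the constant offset does not dominate *)
  ac = a - Mean[a]; bc = b - Mean[b];
  (* only indices where both ac[[i]] and bc[[i + k]] exist *)
  score[k_] := Total[Table[ac[[i]] bc[[i + k]],
     {i, Max[1, 1 - k], Min[Length[ac], Length[bc] - k]}]];
  First@MaximalBy[Range[-maxLag, maxLag], score]]

(* synthetic test data: s2 is s1 delayed by 5 samples *)
s1 = Table[Exp[-(t - 30)^2/20.], {t, 100}];
s2 = Table[Exp[-(t - 35)^2/20.], {t, 100}];
bestLag[s1, s2, 10]  (* → 5 *)
```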

partida

Posted 2016-10-30T13:28:05.147

Reputation: 6 452

Should the audio files just be shifted, or should parts also be removed/contracted, etc.? – Feyre – 2016-11-20T19:42:53.800

@Feyre Shifting the audio, and multiplying it by a scaling factor over certain time intervals, is preferred. – partida – 2016-11-21T03:17:42.280

Sorry, I missed something. I have made an edit; if you mind, you can roll it back. – yode – 2017-06-21T10:48:34.717

@AlexeyPopkov If you mean this question, the image works fine here. – partida – 2017-06-21T12:31:26.493

Now it's OK. Apparently the problem was temporary. – Alexey Popkov – 2017-06-21T14:48:12.007

Answers


One way to approach this is with "Dynamic Time Warping". First, preprocess your data to get the MFCC coefficients and extract the data from the time series:

human = Audio["http://home.ustc.edu.cn/~xiaozh/SE/Audio/human.wav"];
hus = Audio["http://home.ustc.edu.cn/~xiaozh/SE/Audio/hus.wav"];
{humMFCC, husMFCC} = AudioLocalMeasurements[#, "MFCC"] & /@ {human, hus};
x = TimeSeriesMap[First, humMFCC]["Values"]; nX = Length[x];
y = TimeSeriesMap[First, husMFCC]["Values"]; nY = Length[y];

Now define the DTW method:

(*distance function*)
dist[s_, t_] := Abs[s - t];
(*boundary conditions*)
Clear[dtw];
dtw[1, 1] = dist[x[[1]], y[[1]]];
dtw[1, j_] := dtw[1, j] = dist[x[[1]], y[[j]]] + dtw[1, j - 1];
dtw[i_, 1] := dtw[i, 1] = dist[x[[i]], y[[1]]] + dtw[i - 1, 1];
(*main recursion*)   
dtw[i_, j_] := dtw[i, j] = dist[x[[i]], y[[j]]] + 
                           Min[dtw[i - 1, j - 1], dtw[i - 1, j], dtw[i, j - 1]];
(*finding best path through dtwMatrix*) 
pathFind[{i_, j_}] := Module[{nbhd}, 
   nbhd = {{i, Max[j - 1, 1]}, {Max[i - 1, 1], j}, {Max[i - 1, 1], Max[j - 1, 1]}}; 
   nbhd[[First[Ordering[Map[dtwMat[[#[[1]], #[[2]]]] &, nbhd]]]]]];
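Written out, the dtw definitions above compute a cumulative cost matrix D over the frame indices i of x and j of y, with local cost d(s, t) = |s - t| as in dist. D(n_X, n_Y) is the total cost of the best warping path, which pathFind then traces backwards from {nX, nY}:

```latex
\begin{aligned}
D(1,1) &= d(x_1, y_1),\\
D(1,j) &= d(x_1, y_j) + D(1, j-1), && j > 1,\\
D(i,1) &= d(x_i, y_1) + D(i-1, 1), && i > 1,\\
D(i,j) &= d(x_i, y_j) + \min\{D(i-1,j-1),\ D(i-1,j),\ D(i,j-1)\}, && i, j > 1.
\end{aligned}
```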

Finally, apply the DTW to your data:

distMat = Outer[dist, x, y];
dtwMat = dtwPath = Outer[dtw, Range[nX], Range[nY]];
bestPath = NestWhileList[pathFind, {nX, nY}, (#[[1]] > 1) || (#[[2]] > 1) &];
ArrayPlot[Reverse@#, Frame -> False] &@ ReplacePart[dtwPath, {{i_, j_} /; 
    MemberQ[bestPath, {i, j}] -> 0, {i_, j_} /; ! MemberQ[bestPath, {i, j}] -> 1}]


In the picture, the vertical axis indexes the frames of x (human, frame 1 at the bottom) and the horizontal axis indexes the frames of y (hus). The best path is the jagged, roughly diagonal line along which the two time series are best aligned; it lines up the MFCCs of the two audio streams.

The bestPath variable holds pairs {indexX, indexY} giving the optimal correspondence between the original sequences x and y, so they can be used to index into x and y and demonstrate the alignment. For example, here is a plot of the aligned first MFCC coefficients:

indX = Reverse[Transpose[bestPath][[1]]];
indY = Reverse[Transpose[bestPath][[2]]];
ListLinePlot[{x[[indX]], y[[indY]]}, PlotStyle -> {Blue, Green}]


Realigning the audio itself then requires using this mapping to resample the audio.
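As a first step toward that, one simple option is to put y onto x's frame grid by averaging all y values that the path matches to each x frame. This is a minimal sketch: warpOnto is a hypothetical helper, the index lists are assumed to come from a complete warping path (such as indX, indY above), and it warps only the feature sequence. Mapping frames back to audio sample times and resampling the waveform is not shown.

```mathematica
(* Hypothetical helper: resample seq onto the index grid of the other
   sequence, given a warping path as equal-length index lists. *)
warpOnto[seq_List, indX_List, indY_List] := Module[{groups},
  (* association: x frame index -> list of seq values matched to it *)
  groups = GroupBy[Transpose[{indX, indY}], First -> (seq[[Last[#]]] &)];
  (* a complete warping path visits every x index at least once *)
  Table[Mean[groups[i]], {i, Max[indX]}]]

(* toy path: x frame 1 matches y frames 1 and 2 *)
warpOnto[{10, 20, 30, 40}, {1, 1, 2, 3}, {1, 2, 3, 4}]  (* → {15, 30, 40} *)
```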

Update: thanks to partida for pointing out some indexing issues in the DTW.
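The memoized dtw relies on Outer filling the table in row order; calling dtw[nX, nY] directly on long sequences could run into $RecursionLimit. A sketch of an iterative alternative follows, using the same recurrence and |s - t| cost. dtwIterative is a hypothetical name; its backtrack prefers the diagonal step on ties, so on ties its path may differ from pathFind's.

```mathematica
(* Hypothetical helper: iterative DTW. Returns {total cost, path}, where
   path is a list of {i, j} pairs from {1, 1} to {Length[a], Length[b]}. *)
dtwIterative[a_List, b_List] := Module[
  {n = Length[a], m = Length[b], d, path, i, j, cands},
  d = ConstantArray[Infinity, {n, m}];
  (* fill the cumulative cost matrix row by row *)
  Do[
   d[[i, j]] = Abs[a[[i]] - b[[j]]] + Which[
      i == 1 && j == 1, 0,
      i == 1, d[[1, j - 1]],
      j == 1, d[[i - 1, 1]],
      True, Min[d[[i - 1, j - 1]], d[[i - 1, j]], d[[i, j - 1]]]],
   {i, n}, {j, m}];
  (* backtrack from the end, stepping to the cheapest valid predecessor *)
  {i, j} = {n, m}; path = {{n, m}};
  While[i > 1 || j > 1,
   cands = Select[{{i - 1, j - 1}, {i - 1, j}, {i, j - 1}}, Min[#] >= 1 &];
   {i, j} = First@MinimalBy[cands, Extract[d, #] &];
   AppendTo[path, {i, j}]];
  {d[[n, m]], Reverse[path]}]

(* b is a with two samples repeated, so a zero-cost alignment exists *)
{cost, path} = dtwIterative[{0, 1, 2, 3, 2, 1, 0}, {0, 0, 1, 2, 3, 3, 2, 1, 0}];
cost  (* → 0 *)
```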

bill s

Posted 2016-10-30T13:28:05.147

Reputation: 62 963


Mathematica 11.0 introduced a new function, WarpingCorrespondence, which makes DTW (dynamic time warping) very easy.

human = Audio["http://home.ustc.edu.cn/~xiaozh/SE/Audio/human.wav"];
hus = Audio["http://home.ustc.edu.cn/~xiaozh/SE/Audio/hus.wav"];
{humMFCC, husMFCC} = AudioLocalMeasurements[#, "MFCC"] & /@ {human, hus};
x = TimeSeriesMap[First, humMFCC]["Values"]; nX = Length[x];
y = TimeSeriesMap[First, husMFCC]["Values"]; nY = Length[y];

{n, m} = WarpingCorrespondence[x, y];
ListLinePlot[{x[[n]], y[[m]]}, PlotStyle -> {Blue, Green}]
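To see what WarpingCorrespondence returns without downloading the audio, here is a minimal sketch on synthetic data, assuming Mathematica 11.0 or later with default options. The lists a and b are hypothetical: b is a with two samples repeated (a local time stretch), so a zero-cost alignment exists and the aligned sequences should coincide.

```mathematica
(* synthetic sequences: b repeats the first and the peak sample of a *)
a = {0, 1, 2, 3, 2, 1, 0};
b = {0, 0, 1, 2, 3, 3, 2, 1, 0};
(* ca, cb are equal-length index lists into a and b *)
{ca, cb} = WarpingCorrespondence[a, b];
(* the aligned sequences; for this input they should be identical *)
{a[[ca]], b[[cb]]}
```

As the comment below notes, results may differ between versions, so it is worth checking the index lists on a small example like this before applying them to the MFCC data.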


partida

Posted 2016-10-30T13:28:05.147

Reputation: 6 452

But it seems the result changed in 11.2.0... – partida – 2017-12-15T02:11:33.630