## Statistical distances for time series of distributions


I am interested in clustering $N$ time series of $T$ 'values' each. These values are distributions, which can be represented by their cumulative distribution functions (cdfs), their probability density functions (pdfs), or more convenient forms such as square-root pdfs, which yield a simple spherical geometry.
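
As a concrete example of that last representation: mapping a pdf $p$ to $\sqrt{p}$ places it on the unit sphere, where the geodesic distance is $\arccos \langle \sqrt{p}, \sqrt{q} \rangle$. A minimal sketch for discretized pdfs (the example pdfs are made up):

```python
import numpy as np

def sphere_distance(p, q):
    """Geodesic distance between two discretized pdfs mapped to the
    unit sphere via their square roots: arccos <sqrt(p), sqrt(q)>."""
    inner = np.dot(np.sqrt(p), np.sqrt(q))
    return float(np.arccos(np.clip(inner, -1.0, 1.0)))  # clip guards rounding

p = np.array([0.2, 0.5, 0.3])  # illustrative pdfs on a common support
q = np.array([0.3, 0.4, 0.3])
print(sphere_distance(p, p))   # ~0 (identical pdfs)
print(sphere_distance(p, q))
```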

For comparing given distributions, there is an extensive literature on statistical distances (KL, Hellinger, Wasserstein, and so on), but for comparing given time series of distributions, I am not sure whether there is any literature at all.
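
For the single-time comparisons, these distances are straightforward to compute on discretized pdfs; a sketch using SciPy (the pdfs below are made up for illustration):

```python
import numpy as np
from scipy.special import rel_entr
from scipy.stats import wasserstein_distance

x = np.array([0.0, 1.0, 2.0])      # common support points
p = np.array([0.2, 0.5, 0.3])      # illustrative pdfs
q = np.array([0.3, 0.4, 0.3])

kl = rel_entr(p, q).sum()          # KL divergence D(p || q), in nats
hellinger = np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))
w1 = wasserstein_distance(x, x, u_weights=p, v_weights=q)  # 1-Wasserstein

print(kl, hellinger, w1)
```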

Such distances should somehow take the dynamics into account, besides the proximity of the distributions at each time $t$. Ideally, I wish I could have a kind of information factorization similar to this result.

I am wondering if such distances already exist and whether this kind of problem has already been formulated in the literature?

Thanks for your answer, but dynamic time warping (DTW) does not suit my needs. This dynamic-programming technique only captures a rough similarity of shapes by allowing non-linear time distortion. It does not account for all the information in these time series: for example, what about the distribution of the distortions? Do the distributions of a given time series vary smoothly through time, or violently? DTW is not always the solution; for instance, when working with random walks, it does not make sense to use DTW since there are no time patterns. In that case, the only information is "correlation" and "distribution" (cf. Sklar's theorem in copula theory), and the paper cited above.
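
For reference, the DTW under discussion is a short dynamic program; the sketch below uses scalar sequences, though the pointwise cost could just as well be a per-time distance between distributions. The example illustrates the objection: a pure time distortion has zero DTW cost.

```python
import numpy as np

def dtw(a, b, cost=lambda x, y: abs(x - y)):
    """Classic O(nm) dynamic-programming DTW distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = cost(a[i - 1], b[j - 1]) + min(
                D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

# A pure time distortion costs nothing under DTW:
print(dtw([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```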

Edit 2: Here are the papers that are somewhat related to my question:

Check my answer on DTW clustering of time series.

– Aleksandr Blekh – 2015-06-12T07:59:24.580

Basically, I have a time series of time series. Let's assume that at time $t$ I use a DTW (though I would rather use $\phi = \arccos \langle p,q \rangle$); how do I extend it to the whole time series? This is really my point. – mic – 2015-06-12T08:36:28.710

You're welcome. My advice does not imply that I think DTW is a universal solution. I just thought that the papers referenced in my linked answer might contain some ideas useful to your case. I don't have an answer for your "time series of time series" case. As for analyzing distortions, you could consider applying time series anomaly detection and analysis approaches. – Aleksandr Blekh – 2015-06-12T09:05:50.133

Have you tried mutual information? – Alexandru Daia – 2015-06-12T06:05:08.283

For measuring dependency between variables, I prefer using copulae, [though mutual information and copula are very much the same](http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6077935)! Yet, in my case, dependency is not the only information I care about in this kind of time series. In fact, I wish I could obtain [a result similar to this one](http://arxiv.org/pdf/1506.00976v1.pdf).

– mic – 2015-06-12T06:33:56.677

Here's an idea way out of left field: a time series of pdfs can be thought of as a solution to a Fokker-Planck-type PDE (yes/no/maybe?). Would it be feasible to fit such a PDE to your samples and then cluster the PDEs' coefficients? – alexandre iolov – 2015-11-19T08:24:35.733

This is similar to the fundamental information-theory problem that Shannon explored. In that domain, it is framed this way: given two random variables (rvs) $X$ and $Y$, what information does $X$ convey about $Y$?

An example would be that I create a sequence of numbers/bits/letters from a known PDF ($X$), and you receive a distorted version of those values, whose PDF ($Y$) you also know. The mutual information is the number of bits that $Y$ communicates about $X$, which can be thought of as a type of correlation.
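
That channel view can be made concrete: a sketch computing $I(X;Y)$ in bits from a joint pmf (the 10% bit-flip channel below is made up for illustration):

```python
import numpy as np

def mutual_information_bits(pxy):
    """I(X;Y) in bits from a joint pmf given as a 2-D array."""
    px = pxy.sum(axis=1, keepdims=True)   # marginal of X
    py = pxy.sum(axis=0, keepdims=True)   # marginal of Y
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log2(pxy[mask] / (px @ py)[mask])))

# X is a fair bit; Y is X flipped with probability 0.1
pxy = np.array([[0.45, 0.05],
                [0.05, 0.45]])
print(mutual_information_bits(pxy))  # ~0.531 bits
```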

In the case of time-varying PDFs (i.e., stochastic processes), information theory would just treat the ensemble of rvs as one joint rv and then calculate the mutual information of the joint PDFs. If the PDFs are iid, then significant simplifications are possible; jointly Gaussian PDFs also make things easier.

Another information-theory concept that might be useful is the entropy rate of a process, which quantifies the amount of information the process generates per step. Depending on your problem, you might be able to compute the entropy rate for each process and then use those values as realizations of a feature that can be grouped with a clustering algorithm.
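
As an illustration, for a stationary first-order Markov chain the entropy rate has a closed form, $H = \sum_i \pi_i H(P_{i\cdot})$, where $\pi$ is the stationary distribution; a sketch (the transition matrix is made up):

```python
import numpy as np

def markov_entropy_rate(P):
    """Entropy rate (bits/step) of a stationary first-order Markov chain:
    H = sum_i pi_i * H(P[i, :]), with pi the stationary distribution."""
    vals, vecs = np.linalg.eig(P.T)               # pi: left eigenvector for 1
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    pi = pi / pi.sum()                            # normalize (also fixes sign)
    row_entropy = np.array([-sum(pij * np.log2(pij) for pij in row if pij > 0)
                            for row in P])
    return float(pi @ row_entropy)

P = np.array([[0.9, 0.1],   # illustrative transition matrix
              [0.2, 0.8]])
print(markov_entropy_rate(P))  # ~0.553 bits/step
```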

If mutual information is a type of correlation, what can be done to make mutual information distinguish between negative-correlation events and positive-correlation events, given that mutual information lies in $[0, \infty)$ whereas correlation spans $[-1, 1]$? – develarist – 2020-08-28T22:48:53.957

Quick aside: MI is upper bounded by the smaller of the entropies of X and Y. You can't convey more information about an rv than the entropy of the rv itself, so $I(X;Y) \le \min(H(X), H(Y))$. To answer your main question, it is not clear to me how you can distinguish negative and positive correlations through measures like MI. – Bob Baxley – 2020-08-30T13:43:19.093
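
A quick numeric check of that bound on a made-up joint pmf:

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits of a discrete pmf."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

pxy = np.array([[0.45, 0.05],   # toy joint pmf, illustrative only
                [0.05, 0.45]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)
mi = float(np.sum(pxy * np.log2(pxy / np.outer(px, py))))
assert mi <= min(entropy_bits(px), entropy_bits(py))  # I(X;Y) <= min(H(X), H(Y))
print(mi, entropy_bits(px), entropy_bits(py))
```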

Why would you need a distance when you have the pdf?

Time series can be assigned to clusters based on their fit to the cluster's pdf. This also means the method works for ragged time series, for which distance methods would break down.
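
A minimal sketch of that idea, assuming (purely for illustration) Gaussian cluster pdfs: each series is assigned to the cluster under which its samples have the highest log-likelihood, and since the log-likelihood just sums over samples, ragged lengths pose no problem.

```python
import numpy as np
from scipy.stats import norm

clusters = [norm(0, 1), norm(5, 2)]            # hypothetical cluster pdfs
series = [np.array([0.1, -0.3, 0.5]),          # ragged lengths are fine:
          np.array([4.8, 5.5, 6.1, 4.2])]      # no alignment is needed

labels = [int(np.argmax([c.logpdf(s).sum() for c in clusters]))
          for s in series]
print(labels)  # [0, 1]
```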

By "Time series can be assigned to clusters based on their fit to the cluster's pdf," do you mean EM-based mixture-model clustering? I am not sure I understand your point. Basically, pdfs are my object of study, i.e. my dataset consists of $N \times T$ pdfs. Any of these $N$ series can be viewed as a pdf which is evolving "smoothly" through time, of which we have $T$ regularly spaced snapshots. So, I want to capture its distortion dynamics AND I want to capture how it relates to the other pdf time series, both in pdf similarity and in dynamics. – mic – 2015-06-17T11:41:45.167

@mic think my reading of your post oversimplified the question. Do we have any simplifying assumptions, e.g. could the distribution at time $t$ have a parametric form? – conjectures – 2015-06-18T14:53:52.793

No. But, approximation is allowed if it can lead to a reasonable solution. – mic – 2015-06-18T15:40:25.380

@mic what are your samples? Do you actually have the distributions at all times for each series, or do you have a sample (or samples) from the distribution at time $t$ in series $n$? – conjectures – 2015-06-18T15:59:59.043