## Ηow to create an interpolated CDF from its samples?

9

7

I want to use a distribution I have only aggregate statistics on, namely its CDF sampled at certain points. I would like to keep it "nonparametric" (remain noncommittal on the parametric form), but I need some interpolation over these points. Also, I would like to use a PDF of the distribution. So I would need an interpolation that is monotone and differentiable over a list. What is there to do?

(FWIW: With Interpolation[], only a first-order interpolation is monotone, but then there are non-differentiable breakpoints in the piecewise linear function.)

Here is a list I would interpolate over:

EDIT: This is the US taxable income distribution for 2009 which I manually extended with two endpoints to help it look like a CDF. This latter point might be silly, let me know how you'd do better.

{{0, 0}, {5000., 0.00373625}, {10000., 0.0269361}, {15000., 0.0621601}, {20000., 0.121612},
{25000., 0.178247}, {30000., 0.234482}, {40000., 0.3516}, {50000., 0.453972},
{75000., 0.654898}, {100000., 0.78906}, {200000., 0.952381}, {500000., 0.991166},
{1.*10^6, 0.997134}, {1.5*10^6, 0.998442}, {2.*10^6, 0.998977}, {5.*10^6, 0.999727},
{1.*10^7, 0.9999}, {10000000000, 1}}


Do you have an analytic model or guess for that function? – Vitaliy Kaurov – 2012-05-22T22:17:33.273

No, I did not want to go down that route. Then I know I could Fit the functional form on this list. – László – 2012-05-22T22:22:19.473

FWIW: This is the US income distribution. There are some guesses what might be a nice analytical form, like a mixture of "something" and a Pareto tail, but I hoped I could avoid that. – László – 2012-05-22T22:23:15.370

2Your last point seems like an outlier. This are mutual neighbor divisions of horizontal coordinate: {ComplexInfinity, 2., 1.5, 1.33333, 1.25, 1.2, 1.33333, 1.25, 1.5, 1.33333, 2., 2.5, 2., 1.5, 1.33333, 2.5, 2., 1000.} They all seem comparable, besides the last point. – Vitaliy Kaurov – 2012-05-22T22:40:03.927

3

I think monotone cubic spline interpolation will do the trick.

– azdahak – 2012-05-23T00:39:33.580

@azdahak: Sounds about right. But to be clear: Interpolation[..., InterpolationOrder -> 3, Method -> "Hermite"] is not doing this. – László – 2012-05-23T00:49:50.453

No, that is just normal cubic spline interpolation. The method in the wikipedia article will have to be implemented since I don't believe Interpolation[] has this option. – azdahak – 2012-05-23T00:52:33.360

Here is something from Stan Wagon. – azdahak – 2012-05-23T01:11:44.430

Does it really need to go through the points or are they noisy estimates? – Emre – 2012-05-23T19:08:19.543

@Emre: What would you suggest if the curve does not need to go through the points? – László – 2012-05-24T02:26:36.997

12

The code in Heike's answer is a bit long, but only because it does not exploit the fact that piecewise Hermite interpolation is already supported by Mathematica as Interpolation[]. Thus, here is a shorter way to implement Fritsch-Carlson monotonic cubic interpolation:

fcint[data_] := Module[{del, slopes, tau},
del = #2/#1 & @@@ Differences[data];
slopes = Flatten[{del[], MovingAverage[del, 2], del[[-1]]}];
tau = MapThread[Min, Through[{Append, Prepend}[Min[#, 1] & /@
(3 del/(Norm /@ Partition[slopes, 2, 1])), 1]]];
Interpolation[Transpose[{List /@ data[[All, 1]], data[[All, -1]],
slopes tau}], InterpolationOrder -> 3, Method -> "Hermite"]]


Try it out:

data = {{0, 0}, {5000., 0.00373625}, {10000., 0.0269361}, {15000., 0.0621601},
{20000., 0.121612}, {25000., 0.178247}, {30000., 0.234482}, {40000., 0.3516},
{50000., 0.453972}, {75000., 0.654898}, {100000., 0.78906}, {200000., 0.952381},
{500000., 0.991166}, {1.*10^6, 0.997134}, {1.5*10^6, 0.998442}, {2.*10^6, 0.998977},
{5.*10^6, 0.999727}, {1.*10^7, 0.9999}, {10000000000, 1}};

fun = fcint[data];

Plot[fun[x], {x, 0, 10^6}, Axes -> None,
Epilog -> {Red, AbsolutePointSize, Point[data]}, Frame -> True, PlotRange -> All] I actually have implemented various methods for producing piecewise monotonic interpolants in Mathematica, but I don't think this is the right place to list them all. (Maybe for another time and question...) In the meantime, however, I still want to show that the method presented above is not the only (much less the best) method for doing monotonic interpolation. For instance, here is another method, also due to Fritsch and Carlson:

PCHIPEnds[h1_, h2_, d1_, d2_] := With[{d = ((2 h1 + h2) d1 - h1 d2)/(h1 + h2)},
Which[Sign[d] != Sign[d1], 0,
Sign[d1] != Sign[d2] && Abs[d] > 3 Abs[d1], 3 d1,
True, d]]

PCHIPInterpolation[data_] := Module[{dTrans = Transpose[data], del, h},
h = Differences[First[dTrans]]; del = Differences[Last[dTrans]]/h;
Interpolation[Transpose[{List /@ First[dTrans], Last[dTrans], Flatten[
{PCHIPEnds[h[], h[], del[], del[]],
If[Equal @@ Sign[Last[#]] && And @@ Thread[Last[#] != 0],
3 Total[First[#]]/Total[({{1, 2}, {2, 1}}.First[#])/Last[#]], 0] & /@
Transpose[Partition[#, 2, 1] & /@ {h, del}],
PCHIPEnds[h[[-1]], h[[-2]], del[[-1]], del[[-2]]]}]}],
InterpolationOrder -> 3, Method -> "Hermite"]]


This is in fact the same algorithm behind the MATLAB function pchip().

Try it out:

fun = PCHIPInterpolation[data];

Plot[fun[x], {x, 0, 10^6}, Axes -> None,
Epilog -> {Red, AbsolutePointSize, Point[data]}, Frame -> True, PlotRange -> All] As an addendum: the Fritsch-Carlson paper notes that the interpolant from Akima's (1970) interpolation method (which can also be recast as a piecewise cubic Hermite interpolant) is not guaranteed to preserve the monotonicity of the data (thus, I don't recommend its use for constructing an interpolated CDF). If, even with this caveat, you still want to use the Akima interpolant, here's a short routine for producing it:

AkimaInterpolation[data_] := Module[{dy},
dy = #2/#1 & @@@ Differences[data];
Interpolation[Transpose[{List /@ data[[All, 1]], data[[All, -1]],
With[{wp = Abs[#4 - #3], wm = Abs[#2 - #1]},
If[wp + wm == 0, (#2 + #3)/2, (wp #2 + wm #3)/(wp + wm)]] & @@@
Partition[Join[{{3, -2}, {2, -1}}.Take[dy, 2], dy,
{{-1, 2}, {-2, 3}}.Take[dy, -2]], 4, 1]}],
InterpolationOrder -> 3, Method -> "Hermite"]]


(There is a newer (1991) cubic interpolant also due to Akima, but the algorithm for producing it is a bit more involved. It, too, does not preserve monotonicity.)

Try it out:

fun = AkimaInterpolation[data];

Plot[fun[x], {x, 0, 10^6}, Axes -> None,
Epilog -> {Red, AbsolutePointSize, Point[data]}, Frame -> True, PlotRange -> All] 10

interp[pts_] := Module[{delta, mlst, zeropos, tau, h00, h01, h10, h11},
delta = #2/#1 & @@@ Differences[pts];
mlst = Flatten[{delta[], MovingAverage[delta, 2], delta[[-1]]}];
tau = Min[#, 1] & /@ (3 delta/Sqrt[(Most[mlst]^2 + Rest[mlst]^2)]);
mlst = mlst tau;
h01[t_] := -2 t^3 + 3 t^2;
h00[t_] := h01[1 - t];
h11[t_] := t^3 - t^2;
h10[t_] := -h11[1 - t];
Function[{x},
Function[{plow, phigh, mlow, mhigh},
Module[{diff, t},
diff = phigh[] - plow[];
t = (x - plow[])/diff;
{Expand[plow[] h00[t] + diff mlow h10[t] + phigh[] h01[t] +
diff mhigh h11[t]], plow[] <= x <= phigh[]}]],
{Most[pts], Rest[pts], Most[mlst], Rest[mlst]}]]]];

fun = interp[pts];

Plot[fun[x], {x, 0, 10^6}, Exclusions -> None, PlotRange -> All, Epilog -> Point[pts]] 1On version 7 mlst = Join[{delta[]}, (Most@delta + Rest@delta)/2, {delta[[-1]]}] is faster; can you confirm for v8? Also, consider tau = MapThread[Min, {{##, 1}, {1, ##}}] & @@ tau – Mr.Wizard – 2012-05-23T15:56:48.243

@Heike, This is impressive, but on a more detailed list (113 elements) I got gaps in the interpolation. What could cause that? Shame I cannot decipher on my own. – László – 2012-05-23T17:50:52.363

7

Update: The link in the late Jens-Peer Kuska's MathGroup post is no longer working, and it seems there are no other locations on the web to download the package from. So, here I post the contents of the package with gratitute to Jens-Peer Kuska for his continuing service to the Mathematica community.

## Nonparametric Splines package - Jens-Peer Kuska

 BeginPackage["NonParametricSplines"]

CubicSplineInterpolation::usage="CubicSplineInterpolation[x,y] compute a cubic spline interpolation function. It return a CubicSpline[] function, the function should have smooth first and second derivatives.."
CubicSpline::usage="CubicSpline[] represent a interpolation function returned by    CubicSplineInterpolation[]."

AkimaSplineInterpolation::usage="AkimaSplineInterpolation[x,y] compute a Akima spline interpolation. It return a AkimaSpline[] function with smooth first derivative."
AkimaSpline::usage="AkimaSpline[] represent a interpolation function returned by AkimaSplineInterpolation[].."

ExtractSplineData::usage="ExtractSplineData[spline_] return the list of the original {x,y} pairs."

Begin["Private"]

ExtractSplineData[CubicSpline[x_,y_,__]]:=Transpose[{x,y}]
ExtractSplineData[AkimaSpline[x_,y_,__]]:=Transpose[{x,y}]

SplineDataRange[CubicSpline[x_,__]]:={Min[#],Max[#]} &[x]
SplineDataRange[AkimaSpline[x_,__]]:={Min[#],Max[#]} &[x]

CubicSplineInterpolation[x_, y_, yp1_, ypn_] :=
Module[{n, u, d2y, i, sig, p, qn, un},
n = Length[y];
u = Table[0, {n}];
d2y = u;
If[NumericQ[yp1],
d2y[] = 0.5;
u[] = (3.0/(x[] - x[]))*((y[] - y[])/(x[] - x[]) - yp1);
];
Do[
sig = (x[[i]] - x[[i - 1]])/(x[[i + 1]] - x[[i - 1]]);
p = sig*d2y[[i - 1]] + 2.0;
d2y[[i]] = (sig - 1.0)/p;
u[[i]] = (y[[i + 1]] - y[[i]])/(x[[i + 1]] - x[[i]]) - (y[[i]] -  y[[i - 1]])/(x[[i]] - x[[i - 1]]);
u[[i]] = (6.0*u[[i]]/(x[[i + 1]] - x[[i - 1]]) - sig*u[[i - 1]])/p, {i, 2, n - 1} ];
If[NumericQ[ypn],  qn = 0.5;  un = (3.0/(x[[n]] -
x[[n - 1]]))*(ypn - (y[[n]] - y[[n - 1]])/(x[[n]] -
x[[n - 1]])),
un = qn = 0.0 ];
d2y[[n]] = (un - qn*u[[n - 1]])/(qn*d2y[[n - 1]] + 1.0);
Do[d2y[[i]] = d2y[[i]]*d2y[[i + 1]] + u[[i]],  {i, n - 1, 1, -1}];
CubicSpline[x, y, d2y]
]

CubicSplineInterpolation[xy : {{_?NumericQ, _?NumericQ} ..}, yp1_, ypn_] :=
CubicSplineInterpolation[Sequence @@ Transpose[xy], yp1, ypn]

CubicSplineInterpolation[xy : {{_?NumericQ, _?NumericQ} ..}] :=
CubicSplineInterpolation[Sequence @@ Transpose[xy], Automatic, Automatic]

spline = Compile[{{x, _Real, 1}, {y, _Real, 1}, {d2y, _Real, 1}, {t, _Real}},
Module[{a, b, h, low = 1, hi = Length[x], j},
While[hi - low > 1,  j = Quotient[hi + low, 2];
If[x[[j]] > t, hi = j, low = j]   ];
h = x[[hi]] - x[[low]];
a = (x[[hi]] - t)/h;
b = 1 - a;
a*y[[low]] + b*y[[hi]] + ((a^3 - a)*d2y[[low]] + (b^3 - b)*d2y[[low]])*h^2/6
]];

dspline = Compile[{{x, _Real, 1}, {y, _Real, 1}, {d2y, _Real, 1}, {t, _Real}},
Module[{a, b, h, low = 1, hi = Length[x], j},
While[hi - low > 1,
j = Quotient[hi + low, 2];
If[x[[j]] > t, hi = j, low = j]];
h = x[[hi]] - x[[low]];
a = (x[[hi]] - t)/h;
b = 1 - a;
(y[[hi]] - y[[low]])/h +h*((3*b^2 - 1)*d2y[[hi]] - (3*a^2 - 1)*d2y[[low]])/6
]];

ddspline =  Compile[{{x, _Real, 1}, {y, _Real, 1}, {d2y, _Real, 1}, {t, _Real}},
Module[{a, b, h, low = 1, hi = Length[x], j},
While[hi - low > 1, j = Quotient[hi + low, 2];
If[x[[j]] > t, hi = j, low = j] ];
h = x[[hi]] - x[[low]];
a = (x[[hi]] - t)/h;
b = 1 - a;
b*d2y[[hi]] + a*d2y[[low]] ]];

CubicSpline[x_, y_, d2y_][t_?NumericQ] := spline[x, y, d2y, t]
CubicSpline[x_, y_, d2y_]'[t_?NumericQ] := dspline[x, y, d2y, t]
CubicSpline[x_, y_, d2y_]''[t_?NumericQ] := ddspline[x, y, d2y, t]
Format[CubicSpline[x_, _, _]] := CubicSpline[{First[x], Last[x]}, "<>"]

AkimaSplineInterpolation[x_List, y_List, yp1_, ypn_] /; Length[x] == Length[y] :=
Module[{i, n, m, t, tmp1, tmp2},
n = Length[x];
m = Table[0, {n + 4}];
Do[ m[[i + 2]] = (y[[i + 1]] - y[[i]])/(x[[i + 1]] - x[[i]]), {i, 1, n - 1}];
If[NumericQ[yp1],  m[] = m[] = yp1,   m[] = m[] = m[]  ];
If[NumericQ[ypn],   m[[n + 3]] = m[[n + 4]] = ypn,
m[[n + 3]] = m[[n + 4]] = m[[n - 1]]  ];
m=N[m];
t = Table[0, {n}];
Do[ tmp1 = Abs[m[[i + 3]] - m[[i + 2]]];
tmp2 = Abs[m[[i + 1]] - m[[i]]];
t[[i]] = If[tmp1 + tmp2 > 0, (* Then *)
(tmp1*m[[i + 1]] + tmp2*m[[i + 2]])/ (tmp1 + tmp2),
(* Else *)   (m[[i + 1]] + m[[i + 2]])/2 ], {i, 1, n}];
AkimaSpline[x, y, t]
]

AkimaSplineInterpolation[xy : {{_?NumericQ, _?NumericQ} ..}, yp1_, ypn_] :=
AkimaSplineInterpolation[Sequence @@ Transpose[xy], yp1, ypn]

AkimaSplineInterpolation[xy : {{_?NumericQ, _?NumericQ} ..}] :=
AkimaSplineInterpolation[Sequence @@ Transpose[xy], Automatic, Automatic]

aspline =  Compile[{{x, _Real, 1}, {y, _Real, 1}, {ta, _Real, 1}, {t, _Real}},
Module[{a, b, c, d, h, del, tmp, low = 1, hi = Length[x], j},
While[hi - low > 1,  j = Quotient[hi + low, 2];
If[x[[j]] > t, hi = j, low = j]  ];
h = 1.0/(x[[hi]] - x[[low]]);
a = y[[low]];
b = ta[[low]];
tmp = (y[[hi]] - y[[low]])*h;
c = (3.0*tmp - 2.0*ta[[low]] - ta[[hi]])*h;
d = (ta[[low]] + ta[[hi]] - 2.0*tmp)*h*h;
del = t - x[[low]];
a + del*(b + del*(c + del*d)) ]];

daspline =   Compile[{{x, _Real, 1}, {y, _Real, 1}, {ta, _Real, 1}, {t, _Real}},
Module[{b, c, d, h, del, tmp, low = 1, hi = Length[x], j},
While[hi - low > 1,
j = Quotient[hi + low, 2];
If[x[[j]] > t, hi = j, low = j]  ];
h = 1.0/(x[[hi]] - x[[low]]);
b = ta[[low]];
tmp = (y[[hi]] - y[[low]])*h;
c = (3.0*tmp - 2.0*ta[[low]] - ta[[hi]])*h;
d = (ta[[low]] + ta[[hi]] - 2.0*tmp)*h*h;
del = t - x[[low]];
b + (2.0* c + 3.0* d *del) *del  ]];

AkimaSpline[x_, y_, ta_][t_?NumericQ] := aspline[x, y, ta, t]
AkimaSpline[x_, y_, ta_]'[t_?NumericQ] := daspline[x, y, ta, t]

Format[AkimaSpline[x_, _, _]] := AkimaSpline[{First[x], Last[x]}, "<>"]

End[]
EndPackage[]


AkimaSplineInterpolation in the NonParametricSplines package seems to be a good fit for your needs.

EDIT: The package does not come with the Mathematica installation; it was posted in 2008 in MathGroup by the late Jens-Peer Kuska. The link in the MathGroup post for Kuska's zipped files is still active. Needs["NonParametricSplines"]
dataset = {{0, 0}, {5000., 0.00373625}, {10000., 0.0269361}, {15000.,  0.0621601},
{20000., 0.121612}, {25000., 0.178247}, {30000., 0.234482}, {40000., 0.3516},
{50000., 0.453972}, {75000.,  0.654898}, {100000., 0.78906}, {200000., 0.952381},
{500000., 0.991166}, {1.*10^6, 0.997134}, {1.5*10^6, 0.998442},
{2.*10^6, 0.998977}, {5.*10^6, 0.999727}, {1.*10^7, 0.9999}, {10000000000, 1}};
asi = AkimaSplineInterpolation[dataset]
(* ==> AkimaSpline[{0,10000000000},"<>"] *)
Manipulate[Show[
ListPlot[dataset[[;; i]], PlotMarkers -> {Automatic, Medium}, Frame -> True],
Plot[asi[x], {x, 0, 10000000}, Frame -> True, PlotStyle -> {Red, Thick}, PlotRange -> All]],
{{i, 10, "i"}, 10, 19, 1}]


Screenshot: This package isn't included in my version of Mathematica – Heike – 2012-05-23T08:30:26.007

Neither in mine. 8.0.4. What version do you use? – Szabolcs – 2012-05-23T08:38:17.877

@Heike, I am using v8.0.4.0 (for Windows). Just noticed that the link in my post (copied from package docs and used to work half hour ago) is now redirected to Wolfram's Oops page; and searching for NonParametricSplines, AkimaSpline etc does not give any results. – kglr – 2012-05-23T08:56:12.373

@Szabolcs & Heike, ... I don't remember installing this add-on at all - I am pretty sure it came with the standard installation. BTW, although the package seems to have been around since version 6, my V7 installation does not have this add-on. – kglr – 2012-05-23T09:01:37.577

@kguler I've found the package via a post on Mathgroup: http://forums.wolfram.com/mathgroup/archive/2011/Jun/msg00011.html

– Heike – 2012-05-23T09:18:03.890

This does not work on my 8.0.4.0 either. I never heard of this spline before. The link is indeed broken. – PlatoManiac – 2012-05-23T09:20:33.597

Thank you @Heike for identifying the source of the package. Yet another reminder how much/little I should trust my memory:) I will update the post with the link. – kglr – 2012-05-23T09:25:04.053

@kguler et al.: Thanks, this is great. But before I could accept this, could let me know why I see the drop at the second largest value back to zero here: Join[SOIlist, Saezlist, bounds]={{5000., 0.00373625}, {10000., 0.0269361}, {15000., 0.0621601}, {20000., 0.121612}, {25000., 0.178247}, {30000., 0.234482}, {40000., 0.3516}, {50000., 0.453972}, {75000., 0.654898}, {108024., 0.9}, {150400., 0.95}, {352055., 0.99}, {521246., 0.995}, {1.49218*10^6, 0.999},{7.89031*10^6,0.9999}, {0, 0}, {10000000000, 1}} ecdf = AkimaSplineInterpolation[Join[SOIlist, Saezlist, bounds], 0, 0] – László – 2012-05-23T15:11:10.127

And there are discontinuities in the derivative too, thus the interpolation is not smooth. Why? – László – 2012-05-23T15:13:57.980

@Laszlo, I need to check the issues you mention. This is the first time that I use this package. Although the package is quite nice and clean so that you can modify any part to implement your own requirements. My preference would be to go with Heike's solution. – kglr – 2012-05-23T18:39:37.887

@kguler: Thanks! Actually, I got gaps with Heike's solution too, for some reason. – László – 2012-05-23T18:43:50.753

1The link seems to be kaput for me. Would you happen to know another place where I can download Kuska's package? – J. M.'s ennui – 2012-10-11T14:52:02.410

@J.M. that's the only link I know of and it does not work any longer. Google does not return any leads either. – kglr – 2012-10-11T15:03:10.707

Do you have a copy of the file yourself? – Mr.Wizard – 2012-10-11T19:48:48.317

@Mr.W, yes I do. Is it possible/advisable to make the zip file available on MMA.SE? – kglr – 2012-10-11T19:54:47.330

I cannot advise you on that, but I'm sure some people would be happy if you do. How big is the file? – Mr.Wizard – 2012-10-11T19:55:53.493

Mr.W: Not too big -- zip file is 152KB. – kglr – 2012-10-11T20:01:41.213