What completeness index to use for your genealogical tree?

I am currently working on a lot of ancestries, some of them more than 13 generations deep. Obviously, I don't have all the ancestors for each ancestry. In theory, the number of ancestors grows exponentially from the starting individual, but in practice this is rarely the case, either because of inbreeding (consanguinity) or simply because information on some of the ancestors is missing.

This can result in a very incomplete tree, or a very unbalanced one.

So, I need some general indicator of completeness for a genealogical tree. I've come up with the completeness of a tree, which is the ratio N / T, where:

N: the number of individuals in the genealogy, plus the people who should have been there but are missing because of inbreeding. (If there is one consanguinity event, you basically have one person fewer than expected at that level, and the shortfall adds up as you go back in time. Nevertheless, you do have information on that ancestry, so I count those people several times.)

T: the theoretical number of individuals in a tree whose maximum depth is n.

Do you have any ideas for other useful statistics (balance of the tree, mean depth of the tree...)? I know it's more of a statistical genealogy question, but perhaps there are some folks around here who have encountered this issue.
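
One way to experiment with such a ratio is to store the known ancestors as Ahnentafel numbers (1 for the starting individual, 2k for the father and 2k + 1 for the mother of person k). The sketch below is only an illustration: the function name is hypothetical, and it assumes the theoretical count is 2^(n+1) - 1, that is, a full binary tree of depth n including the starting individual, since the exact expression is not shown above. An ancestor reached through two lines appears under two numbers, which matches the idea of counting such people several times. It also reports the mean depth as one of the other statistics mentioned.

```python
def completeness_ratio(known):
    """Return (ratio, max_depth, mean_depth) for a set of Ahnentafel numbers.

    Assumption: the theoretical size is 2**(n + 1) - 1, i.e. a full tree
    of depth n that includes the starting individual (number 1).
    """
    slots = set(known) | {1}          # the starting individual is always known
    # Generation of Ahnentafel number k: 0 for the root, 1 for parents, ...
    depths = [k.bit_length() - 1 for k in slots]
    n = max(depths)
    theoretical = 2 ** (n + 1) - 1
    mean_depth = sum(depths) / len(depths)
    return len(slots) / theoretical, n, mean_depth


# Both parents, the paternal grandparents, and one great-grandparent.
ratio, n, mean_depth = completeness_ratio({2, 3, 4, 5, 8})
print(f"depth={n} ratio={ratio:.3f} mean depth={mean_depth:.2f}")
```

In this small example six slots are filled out of the 15 that a full tree of depth 3 would hold, so a single deep line pulls the ratio down sharply.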

Most genealogists wouldn't correlate your completeness index with quality because genealogists care about data quality, not the shape of the tree. If you mean "quality of the tree structure" then you should probably ask your question on a programming or math oriented site. – None – 2013-05-24T14:36:33.180

Most would include their own descendants, not just their ancestors, in a "complete" tree. For me, completeness includes cousins (other descendants of my ancestors). The total of these are not readily quantifiable, given all the variables of family size, child mortality, age of the researcher, etc. – bgwiehle – 2013-05-24T20:33:49.227

I think this should be a duplicate of (and quite likely another answer to) How well do I know my ancestors (at least superficially)?.

@PolyGeo, I agree it ought to be an answer to that question rather than a question in its own right -- I'll watch with interest how much support there is to mark it as a duplicate. – None – 2013-05-26T11:43:57.007

@JustinY I agree that perhaps I should ask it on a math-oriented site. But I wanted some input from genealogists, since my data are real genealogies. – ElCascador – 2013-06-10T13:42:09.707

I would use a single number that would represent an average depth of completion, calculated by adding the percentages complete at each generation.

Like this with an example:

Parents, Know 2 out of 2 = 100% = 1.000
Grandparents, Know 4 out of 4 = 100% = 1.000
Great-grandparents, Know 6 out of 8 = 75% = 0.750
Great-great-grandparents, Know 4 out of 16 = 25% = 0.250
Great^3-grandparents, Know 2 out of 32 = 6.25% = 0.062
Great^4-grandparents, Know 1 out of 64 = 1.5625% = 0.016


Then your average depth of completion (maybe call it your ADC) would be 1 + 1 + .75 + .25 + .062 + .016 = 3.078

If you happen to have pedigree collapse where, for example, 2 great-grandparents are the same person, then I'd keep it simple and just count that person as 2 known out of the 8 rather than complicate and make it 1 out of 7. If you find out that person's 2 parents, they'll count as 4 out of the 16.

This measure is good because it rewards you more for filling in your holes and rewards you less for a single line that goes way back.
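
As a rough sketch of the calculation described above (the function name is hypothetical, and the input is simply the count of known ancestors per generation, starting with the parents), the sum can be computed like this:

```python
def average_depth_of_completion(known_per_generation):
    """Sum the fraction of known ancestors over the generations.

    known_per_generation[0] is the number of known parents (out of 2),
    known_per_generation[1] the known grandparents (out of 4), and so on.
    An ancestor who fills several slots is counted once per slot.
    """
    total = 0.0
    for i, known in enumerate(known_per_generation):
        expected = 2 ** (i + 1)       # 2, 4, 8, 16, ...
        if not 0 <= known <= expected:
            raise ValueError(f"generation {i + 1}: {known} not in 0..{expected}")
        total += known / expected
    return total


# The example above: 2/2, 4/4, 6/8, 4/16, 2/32, 1/64.
adc = average_depth_of_completion([2, 4, 6, 4, 2, 1])
print(round(adc, 3))
```

With the counts from the example, the unrounded sum is 3.078125, which agrees with the 3.078 obtained from the rounded per-generation values.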

This is basically the area of one of those fan charts that FamilySearch and some other software (e.g. MacFamilyTree) offer, so it has a nice geometric interpretation. – Verbeia – 2013-05-30T02:18:14.767

Hi, I am currently doing this, it seems very informative on the nature of the tree. A sudden drop of completeness early in the depth is most of the time a bad thing for me. – ElCascador – 2013-06-10T13:43:02.947

While this appears an attractive option on the surface, consider the impact on "completeness" of locating information on three previously-unknown 4Xgreat grandparents. In recognition of an achievement of earth-shattering significance to any family historian, your index would increase from 3.078 to 3.126! Appropriate inter-generational weighting is essential in producing a number that has real meaning. – Fortiter – 2013-06-11T04:11:33.367

@Fortiter - If we're talking about "completeness", then researching one line far back does not help you complete much. But if you want a "recognition of achievement" index, then that's a completely different thing and you'll want to include the depth of the success in that type of measure. – lkessler – 2013-06-11T16:16:28.823

My family tree can never be complete in the sense of including all the possible individual people, much less all the possible information about each person. I leave the notion of gotta catch 'em all to my descendants who are into Pokémon. So it makes no sense for me to try to measure my distance away from an unattainable goal.

In my work, I prefer to think about progress rather than completeness. Am I actually achieving anything worthwhile or simply wasting the time and money I devote to family history? My wife would insist that I count the opportunity cost of time not spent on tasks that she regards as being more important.

My first measure of progress is to ask "Do I know something new as a result of what I have just done?" That something new does not need to be a new twig on the tree or even a confirmed date for someone I already knew. There are many days when progress is measured in terms of negative evidence (That John Davies from Bangor cannot be my great-grandfather!) or another question (Do I need to look into chapel records rather than the established church?).

Perhaps closer to what you are seeking is progress on a particular project. If I have set myself the task of identifying three generations of descendants of Heinrich Cramer, then I have a better (but not certain) idea of what constitutes finishing that task. Only then can I begin to create functions of the number of people I am seeking and the number I have identified through documented sources. In response to another question I have discussed the relative advantages and disadvantages of possible algorithms. The tl;dr version is that whether any measure is good or bad is determined principally by what you think is important. I can make progress on your tree look as good or as bad as you want by manipulating the formulae; but what is the point?

Of course, some people are very motivated by a simple linear scale showing movement towards a defined goal. In that case, set a number of generations back that you will regard as the end-point of your search (represented by n in the denominator of your algorithm) and use the simple ratio you suggest.

If it works for you, then it is an appropriate tool. Just don't expect everyone else in the world(s) of Genealogy and Family History to agree.

Thank you for your input. But the fact is that the real genealogical work has been already done with my data. I can't find any more people to include in it. Now I'm just trying to cope with what I have. That's why I need some sort of quality assessment. Thank you anyway. – ElCascador – 2013-06-10T13:46:14.993

@ElCascador Congratulations on "finishing" your tree. So many of the rest of us believe that we will never reach that point. – Fortiter – 2013-06-10T13:56:16.523

I would like to set aside the question of the quality of the tree. Some would equate quantity with lack of quality, which is not necessarily the case. I believe this index is useful at any level of quality.

I also wrestle with the 'completeness' index asked about. For my own research, I've decided to take my ancestors back to when they immigrated to the United States, so mine is a more bounded problem. I just use the percentage of 2 to the fourteenth that I have covered. I have a large circular fan chart and I consider a sector 'covered' if the line terminates in an immigrant. For duplicate ancestors, I just include them in both places. I have Python programs that fill in the chart and calculate the 'completeness' index. I'm currently at 55%. I also calculate it per generation.
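
The programs themselves are not shown here, so the following is only a guess at the kind of calculation involved, with a hypothetical function name. It assumes each immigrant is recorded by Ahnentafel number (2k for the father and 2k + 1 for the mother of person k) and that an immigrant found g generations back covers the whole sector behind them, that is, a share 1 / 2^g of the outermost ring of the chart.

```python
from fractions import Fraction


def immigrant_coverage(immigrants, generations=14):
    """Fraction of the outermost ring covered by lines ending in an immigrant.

    Assumption: `immigrants` holds the Ahnentafel numbers of the immigrant
    ancestors; a duplicate ancestor simply appears under each of their numbers.
    """
    terminals = set(immigrants)
    covered = Fraction(0)
    for k in terminals:
        depth = k.bit_length() - 1    # 1 for parents, 2 for grandparents, ...
        if depth > generations:
            continue                  # beyond the chart, covers nothing on it
        # Ignore anyone who sits behind another immigrant on the same line,
        # so that no sector is counted twice.
        child = k >> 1
        shadowed = False
        while child >= 1:
            if child in terminals:
                shadowed = True
                break
            child >>= 1
        if not shadowed:
            covered += Fraction(1, 2 ** depth)
    return covered


# Father is an immigrant (half the chart); so is the mother's father (a quarter).
print(float(immigrant_coverage({2, 6})))
```

Using exact fractions keeps the result free of rounding noise; multiply by 100 for a percentage.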