## How well do I know my ancestors (at least superficially)?

What I am hoping to learn in this Question is the correct terminology and any tools available for determining what I will try to describe below.

Firstly, I know that I will never know all my ancestors because all known ancestors have ancestors who may be unknown.

However, if I know my 2 parents, 4 grandparents, 8 great-grandparents then I would say I know 100% of them to three generations back. By "know" let's assume just knowing their name qualifies as knowing them.

If I then know 3 of my 16 great-great-grandparents and none of my 3×great-grandparents and beyond, then I think I could score my known ancestors by adding up like this:

• 50% because I know both parents
• 25% because I know all grandparents
• 12.5% because I know all great-grandparents
• 3/16 of 6.25% because I know only 3 of my 16 great-great-grandparents (fortunately I actually know all of them and lots beyond)

This gives me a score of 50 + 25 + 12.5 + ((3*6.25)/16) = 88.67%

As you can see this score will asymptote towards 100% but can never reach it because each successive generation that is revealed contributes only half the percentage of the previous one.
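The addition above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the question; the function name `known_ancestor_score` and the list-of-counts input are assumptions made for the example, not something any genealogy package is known to provide.

```python
def known_ancestor_score(known_per_generation):
    """Return the weighted share of known ancestors as a fraction of 1.

    known_per_generation[0] is the number of known parents (out of 2),
    known_per_generation[1] the number of known grandparents (out of 4),
    and so on, one entry per generation.
    """
    score = 0.0
    for generation, known in enumerate(known_per_generation, start=1):
        slots = 2 ** generation          # ancestors possible in this generation
        if not 0 <= known <= slots:
            raise ValueError(f"generation {generation} holds 0..{slots} ancestors")
        weight = 0.5 ** generation       # each generation is worth half the previous one
        score += weight * known / slots
    return score


# Both parents, all grandparents, all great-grandparents, 3 of 16 great-great-grandparents
example = known_ancestor_score([2, 4, 8, 3])
print(f"{example:.2%}")  # 88.67%
```

Because the weights 1/2, 1/4, 1/8, ... sum to 1 only in the limit, the result stays below 100% for any finite number of generations.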

I guess my question is: "Does anyone know a name for what I describe above, and do any genealogy packages provide the calculation of this percentage as a function?"

(I use Family Tree Maker and Ancestry.com so that is where I have looked without success so far)

UPDATE

In response to some answers/comments about the over-weighting of parents and other recent ancestors in this measure, my thinking is that the weighting given to them may actually be appropriate. My reasoning is that when a parent is unknown you lose the chance to know half of your ancestors. Similarly, not knowing a grandparent knocks out 25%. I think it may actually be a strength not a weakness - we just need to be clear about what the measurement tells us.

This could also be asked on Mathematics SE – American Luke – 2012-11-15T00:21:19.050

What's the usage of such a figure? I've never really thought about such a calculation so I'm not sure what it would represent. Is it trying to determine some level of knowledge of your gene make-up? That would eliminate many significant environmental factors, or the possibility that you were not raised by your genetic parents for some reason, but these are historical issues that would have helped make you who you are now. – ACProctor – 2012-11-15T11:54:21.713

You might consider the word 'ancestry' instead of 'pedigree' in your question. Some people are turned off by the word pedigree when referring to human ancestry because of the 'select breeding' aspect of the definition in common usage associated with animals. It is a perfectly valid genealogical term (and doesn't imply 'better') but ancestry avoids the issue entirely. – Duncan – 2012-11-15T13:14:50.503

@Duncan - you make a valid point so I have removed the word pedigree from the question – PolyGeo – 2012-11-15T22:23:05.550

@ACProctor For me the usage is just one of curiosity although I imagine, in theory, Family Tree software could not just calculate the number today but graph how it has changed over time since a person begins entering data to assist their quest to know their ancestors. – PolyGeo – 2012-11-15T23:18:37.287

10

What you are doing is finding the sum of a converging geometric series.

I doubt that you will find a genealogical name for this piece of mathematics, because (in my humble opinion) it is not a good model for what you are trying to represent.

When I read Crista's blog about ancestral numbers (referenced in GeneJ's answer), I thought it was a bit of fun but limited by the fact that each measure of completeness applies to a specified number of generations. That is great for setting a target (such as "I am 64% of the way to completing 7 generations") but once you include a new generation in your measure then the value crashes (from say 98% to 49%) because each extra generation back adds as many people as all those generations that came after.

Your suggestion is better because the same measure can be applied to your work at all stages, regardless of how many generations you consider. (As you say, its value approaches but never reaches 1, or 100%). On the other hand, the sum of a geometric series is a less useful model because it places too much weight on completing recent generations.

These two figures represent the extent of completeness using Crista's equi-weighted ancestors (left) and your proposal (right).

Imagine a beginner sitting down with a blank ancestor chart and being told to write in his parents' details. Could he then claim that his pedigree was half done because his PGCI (PolyGeo Completeness Index) was now at 0.5?

It is a great idea to reduce the impact on your calculation of the hordes of distant ancestors that you are least likely to find, but not at the cost of over-weighting the low-hanging fruit.

There is a measure of Pedigree Completeness (the proportion of known pedigree information for an arbitrary number of generations) used by animal breeders and included in software such as PyPedal. It is computed as:

When you recognise that a_k is the number of known ancestors and g is the selected number of generations, then c_p is precisely the same as Crista Cowan's number (but it does look more mathematical!)
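As a rough illustration of the equal-weight measure and of the "crash" when a generation is added, here is a minimal sketch. It assumes the ratio is simply the number of known ancestors divided by the number of ancestor slots in generations 1 to g (2 + 4 + ... + 2^g), which is how the description above reads; the exact form used by PyPedal has not been checked here, and the function names are hypothetical.

```python
def ancestor_slots(generations):
    """Total ancestor slots in generations 1..g: 2 + 4 + ... + 2**g."""
    return 2 ** (generations + 1) - 2


def equal_weight_completeness(known, generations):
    """Known ancestors as a share of all slots, every ancestor weighted equally."""
    return known / ancestor_slots(generations)


known = 124                                    # illustrative count of named ancestors
six = equal_weight_completeness(known, 6)      # 124 of 126 slots
seven = equal_weight_completeness(known, 7)    # the same 124 people, now out of 254 slots
print(f"{six:.0%} -> {seven:.0%}")             # 98% -> 49%
```

The drop happens because generation 7 alone contributes 128 slots, slightly more than the 126 slots of the six generations before it.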

I strongly encourage you to continue to explore this idea. There would be genuine interest in (if not an actual need for) a good index of completeness. Since there are flaws in your geometric sum, perhaps the answer lies in a harmonic series?

Many thanks for your research and encouragement of this idea. I wonder whether I can deal with the over-weighted low-hanging fruit by re-branding the index from one of completeness to one of perhaps "tree progress" where it is simply recognized that getting to 50% is easy, 100% is impossible, but the closer your value to 100% the more you will know your ancestors. For now I think I'll just leave it to the software developers to see if they want to add a button to do the calculation - I would use it! – PolyGeo – 2012-11-15T04:50:42.437

+1 but almost seriously, we might consider migrating this to mathematics.stackexchange.com – lkessler – 2012-11-15T22:21:10.453

I'm keen to get the idea in front of Family History software developers because I suspect it would be "relatively" easy for them to add if they see that their potential users are interested in it. Certainly, exploring harmonic vs geometric series would be a mathematical topic but not one that I want to get into. – PolyGeo – 2012-11-15T22:26:00.643

I wish I could have awarded the best answer to both fortiter (because of its rigour) and @fbrereto because Rank Trees sound like they may be even more useful than the line of thinking espoused in my original question. – PolyGeo – 2013-01-06T01:30:07.357

7

Fortiter's answer is amazing and started me thinking about this kind of representation. Here's a solution I'd propose using Rank Trees. (The link doesn't add much to the conversation other than to point out I didn't come up with the notion. Whether or not I'm the first to apply it to genealogy, I cannot say.)

## Definition

Here is the basic rule for rank trees:

The rank of a node in a rank tree is the sum of its child node ranks plus 1

(Here, child nodes in a rank tree amount to ancestors in a genealogical tree; I'll use the genealogical terms going forward.)

So each person in a tree has a rank from 1 to... something. Let's call it one's Ancestor Rank, or AR.

## Example

So, just starting out, here might be my tree:

me (1)


Once I fill in my parents, here's what happens to the tree:

       |- mother (1)
me (3) +
       |- father (1)


Note that my AR has changed from 1 to 3: I know about my father and mother (who each are a point), plus 1. As more ancestors are discovered the tree is re-weighted accordingly:

                     |- grandparent (1)
       |- mother (2) +
       |
me (6) +
       |             |- grandparent (1)
       |- father (3) +
                     |- grandparent (1)


In the above tree I don't know a grandparent on my mother's side, so her AR is one less than my father's, for whom I have both parents.
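The rule is easy to express recursively. The sketch below is one possible way to compute AR, assuming the tree is stored as a dictionary from each person to a list of their known parents; the dictionary layout, the names in it, and the function `ancestor_rank` are illustrative choices, not part of any existing package.

```python
def ancestor_rank(person, parents):
    """Return 1 for the person plus the ranks of each of their known parents."""
    return 1 + sum(ancestor_rank(p, parents) for p in parents.get(person, ()))


# The tree shown above: one grandparent known on the mother's side, two on the father's.
parents = {
    "me": ["mother", "father"],
    "mother": ["maternal grandparent"],
    "father": ["paternal grandfather", "paternal grandmother"],
}

ranks = {name: ancestor_rank(name, parents) for name in ("mother", "father", "me")}
print(ranks)  # {'mother': 2, 'father': 3, 'me': 6}
```

Note that this simple recursion counts an ancestor once for every line of descent on which they appear, so a couple reached through two different lines would be counted twice.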

The advantages of the above system are many, the highlights of which are:

• Integers only, keeping the math simple
• A person's AR does not change based on their position in the tree. Using the above example, if I were to evaluate the rank of my child my rank would still be 6.
• Unless someone is removed from the tree, ARs only get larger
• Every time an ancestor is added, all related descendants' ARs are increased by 1. This gives greater weight to ancestors further back than those closer to you.
• There is no upper limit to a person's AR
• Linear growth values are easier to compare than asymptotic ones.

## But what does it mean?

A person's AR is a rough estimate of how "deeply" you have identified their ancestors. (I say "deeply" because AR is a combination of how broadly a person's ancestry is known as well as how far back it goes.) Because AR grows the more deeply a person's ancestry is known, it can be a way to roughly assess the completeness of an individual's genealogical past.

As a comparison, think about your GFH reputation: it is a rough estimate of how much the community respects your contribution to the site, though in reality the number isn't worth much. Nevertheless such gamification can be a powerful motivator for people to continue contributing (or investigating family history!)

## Superfluousness

Another example, for grins and the sake of ASCII trees:

                     |- grandparent (1)
       |- mother (2) +
       |
me (8) +                                |- g.grandparent (1)
       |             |- grandparent (3) +
       |- father (5) +                  |- g.grandparent (1)
                     |
                     |- grandparent (1)


+1 Sounds like a great way to get a number more suitable for tracking tree progress than just shaving off tiny percentages like I was proposing. If someone implements rank trees in a package I am using I would certainly be keen to watch my rank grow. – PolyGeo – 2012-11-16T00:18:23.607

Let me return the compliment that your answer has got me thinking about this in a new way. One practical value that I envisage is the use of AR values to suggest where to research next. In your final illustration, the lower AR for mother shouts out "Work to do here!" – Fortiter – 2012-11-16T02:49:54.560

Wow. Now you got me interested. I spent ten minutes calculating mine and got 1477. I like this method. +1 – American Luke – 2012-11-17T21:47:20.473

4

Randy Seaver's Genea-musings had a Saturday Night Genealogy Fun special on this topic in August of this year, in which he referred to this as your "Ancestral Name Number."

His challenge was inspired by Crista Cowan's "Family History All Done? What's Your Number?" on the Ancestry.com blog, 16 Aug 2012.

Randy ran his own "numbers" out both 10 generations and 15 generations. Crista ran her numbers out 10 generations.

While interesting, it's probably fair to add that this kind of tabulation doesn't assess "completed (or accurate) work product."

4

An intriguing way of approaching it, PolyGeo. As has been said, the "over-weighted low-hanging fruit" seem a big issue.

The other aspect that concerns me is "pedigree collapse" - this is where your X-grandparents on one line are n-cousins, m-times removed. Say your paternal grandparents were actually first cousins. Then you'd have 4 Gparents, 8 great-GPs but only 14 distinct great-great GPs instead of 16, because the paternal side supplies only 6 instead of 8. (I think that's right....) I guess your formula would still be looking for the "missing" two.

I must admit I'd only ever thought in terms of counting the theoretical number of nG-GPs on each generation, reducing by what I know for certain of pedigree collapse, and then assessing that generation only for completeness.
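The first-cousin example can be checked with a small sketch that counts ancestor slots against distinct people. It assumes ancestors are labelled with Ahnentafel-style slot numbers (2 = father, 3 = mother, the parents of slot n are 2n and 2n+1) and that collapse is described by a hand-written map from a duplicate slot to the slot of the same person elsewhere; both the map and the function names are hypothetical.

```python
def canonical(slot, same_person):
    """Return the lowest-known slot number for the person occupying this slot."""
    if slot in same_person:
        return canonical(same_person[slot], same_person)
    if slot == 1:
        return 1
    # A parent of a duplicated person is also a duplicate, so follow the child first.
    candidate = 2 * canonical(slot // 2, same_person) + slot % 2
    return candidate if candidate == slot else canonical(candidate, same_person)


def count_generation(generation, same_person):
    """Return (slots, distinct people) for one generation back."""
    slots = range(2 ** generation, 2 ** (generation + 1))
    return len(slots), len({canonical(s, same_person) for s in slots})


# Paternal grandparents (slots 4 and 5) are first cousins: the parents of slot 10
# (slots 20 and 21) are the same couple as the parents of slot 8 (slots 16 and 17).
cousins = {20: 16, 21: 17}

for g in range(1, 5):
    print(g, count_generation(g, cousins))

paternal_gg = {canonical(s, cousins) for s in range(16, 24)}
print(len(paternal_gg))  # 6 distinct great-great-grandparents on the paternal side
```

Under these assumptions the fourth generation has 16 slots but 14 distinct people, with the paternal side contributing 6 rather than 8.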

In an update to my Question I'm hoping that I have addressed the "over-weighted low-hanging fruit". I'm thinking that "pedigree collapse" is not an issue either because the same number of "branches on the tree" still exist - it's just that some people occupy more than one of them and thus can be counted once for each time they appear. – PolyGeo – 2012-11-15T22:32:52.470

+1 for getting me thinking about "pedigree collapse" - when I come across marrying cousins I'm almost relieved because it means fewer actual people to find in order to complete earlier generations :-) – PolyGeo – 2012-11-15T23:06:21.383

2

Great question in my opinion. I had the question myself a couple of years ago and I wrote my own program to calculate this, including handling overlaps correctly. But I did it with the twist that I am only interested in how 'complete' I was until I reached the person who immigrated to the US. I'm trying to find all my immigrant ancestors, and the ships they arrived on. Right now I'm only about halfway there for immigrants and lower for ships. It would be a much smaller percentage if I were asking how complete I am back to the 14th generation (the farthest back I've found a US immigrant ancestor). I 'know' roughly 2K of my ancestors and 14 generations would be roughly 32K (16K at the 14th generation and 16K-1 for everyone in generations 1-13).

I use 'know' very loosely since although some of my ancestors are documented to genealogical standards, many are not. Perversely enough, it's the lines going the farthest back that have the best documentation since they are generally associated with applications to genealogical or heritage societies and were done by professionals, not me.

I find circle charts (or fan charts) the best visual representation of this. I color code mine by the immigrant country with white showing my brickwalls so it leaps out where the near-in issues are since they cut wider swaths. And circle charts are the only way I could include info that far back and still have enough pixels. Albeit my 12 generation circle chart is 4 feet across and still hard to read unless you are right up to it. Even so, it's still a hit at family reunions. The circle chart also handles the overlap (my grandparents were 3rd cousins) since people just show up twice.

+1 (but not just for saying "great question":-) My thinking on this was triggered when I found a lineage (not to genealogical standards) that led back to Edward I (King of England) as my 19th great grandfather - knowing that some links were tenuous I did not want to give too much weight but was curious to know how much royal blood might be flowing through my veins - 2**21 = 2,097,152 is the number of ancestors in that generation (many will appear many times) - I decided not to try and calculate the score/measure which is the subject of this question manually:-) – PolyGeo – 2012-11-15T22:45:40.107

2

Tamura Jones developed an interesting method to determine the average number of Nth Generation Descendants in which he gives the formula:

D = C × 2^n ÷ A × c
where
C = number of individuals in Current generation
A = number of individuals in Ancestral generation
n = number of generations in between
D = average number of Descendants
c = correction factor


You could then take the number of individuals you know in an ancestral generation and divide it by D to get a "Completeness" value for each of your generations.

You can then combine the generations in some way (the 50% per generation techniques) as you and others suggest.

This method will account for "pedigree collapse" and give you a better measure than assuming power of 2 growth.

+1 Only had time to skim read that page but will try to do it justice when I'm less busy than today. First thought is that Tamura's measure is about estimating number of descendants in a single generation when I'm trying to put a figure to how well I'm going on putting a name to all branches on my family tree (albeit not necessarily to genealogical standards so I treat it as a theory sometimes far from fact). Definitely related but I think looking at an "opposite" problem. – PolyGeo – 2012-11-15T23:00:02.583

I love algorithms that incorporate a "correction factor". When I was teaching science, students used a "fudge factor" - the number which when multiplied by the result of a lab procedure produced the value in the textbook. – Fortiter – 2012-11-16T02:42:11.270

1

To the best of my knowledge, there is neither an "official" name for what you are talking about nor any genealogy package that calculates it.

1

Although I devoted some considerable length in another answer to explaining why PolyGeo's proposal was not a good idea, he did ask for tools to do it.

---Those who believe that there is no maths in genealogy should probably stop reading here.---

The key to identifying what each newly-found ancestor contributes to this measure of tree completion lies in the Ahnen number assigned to him or her.

When you allocate Ahnen number A to a confirmed ancestor, then you move toward your goal by an amount equal to

2^-(2 × floor(log2(A)))

If your current software lets you generate a list of Ahnen numbers, then applying that algorithm to each element in the list in turn and summing the results gives the current value of the index.

In the case you described, this list of ancestors {2,3,4,5,6,7,8,9,10,11,12,13,14,15,20,21,29} gives an index of 0.8867188.

If you then manage to acquire the baptismal certificate of 29 that lists her parents (58 and 59), then your tree grows to 0.8886719.
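Applied to a list of Ahnen numbers, the calculation might look like the sketch below. The function names are made up for the example, and the list is assumed to be every ancestor from 2 to 15 plus three great-great-grandparents (20, 21 and 29). For a positive integer, `bit_length() - 1` gives the same value as floor(log2(A)) without floating-point rounding.

```python
def contribution(ahnen):
    """Weight added by the ancestor holding this Ahnen number (2 = father, 3 = mother)."""
    if ahnen < 2:
        raise ValueError("ancestors have Ahnen numbers of 2 or more")
    generation = ahnen.bit_length() - 1      # floor(log2(ahnen)) for integers
    return 2.0 ** (-2 * generation)          # 1/4 per parent, 1/16 per grandparent, ...


def completeness_index(ahnen_numbers):
    """Sum the contributions, counting each Ahnen number at most once."""
    return sum(contribution(a) for a in set(ahnen_numbers))


known = list(range(2, 16)) + [20, 21, 29]
before = completeness_index(known)
after = completeness_index(known + [58, 59])   # add both parents of ancestor 29
print(before, after)  # 0.88671875 0.888671875
```

Every term is a power of two, so the floating-point sums here are exact; the two new ancestors add only 2/1024, or about 0.002, to the index.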

To return to my original objection, that seems poor acknowledgement for the effort that would have been required to identify two 3×great-grandparents.