Average number of people per name?

2

1

Are there statistics on "name ambiguity"?

For example, how many people share the same full name on average?

This would be particularly interesting per country.

Chase

Posted 2014-05-04T16:34:07.167

Reputation: 29

Answers

8

Although it may seem otherwise when researching a family with a common name in a limited area, Anglo-Saxon full names (first and last) are surprisingly unique. It's statistically quite uncommon for two full names to be the same.

US Death Data

The Social Security Death Index (Death Master File) is a file of about 88 million people, a subset of those who have died in the United States in the past 70 or so years (with most of the names in the later years).

Using that data (the 2010-11-17 edition), I've sorted it by first and last name (ignoring middle initials or names, as they're not consistently reported). There are 31,425,850 unique full names among the 87,873,196 people listed.
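The counting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the answer does not say how the file was parsed, so the input here is assumed to be a plain list of whitespace-separated names, and `full_name_key` and `count_full_names` are hypothetical helper names, not part of any SSDI tooling.

```python
from collections import Counter

def full_name_key(raw_name):
    """Reduce a raw name to (FIRST, LAST), dropping middle names or initials.

    Assumption: the name is whitespace-separated with the first name first
    and the last name last. Single-word names are skipped.
    """
    parts = raw_name.upper().split()
    if len(parts) < 2:
        return None
    return (parts[0], parts[-1])

def count_full_names(raw_names):
    """Count how many people share each (FIRST, LAST) combination."""
    counts = Counter()
    for raw in raw_names:
        key = full_name_key(raw)
        if key is not None:
            counts[key] += 1
    return counts

# Tiny made-up sample, not real SSDI records.
sample = ["James Smith", "JAMES A SMITH", "Mary Smith", "James  Smith", "Cher"]
counts = count_full_names(sample)
unique_names = len(counts)           # number of distinct full names
people_counted = sum(counts.values())  # number of people with a usable name
```

Calling `counts.most_common(50)` on the result would give a ranking in the same shape as the top-50 list in this answer.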

Here are the most common full names (first 50 shown, with the number of times they occur):

20944 JAMES SMITH
20590 WILLIAM SMITH
18845 MARY SMITH
17938 JOHN SMITH
16307 ROBERT SMITH
13540 JAMES WILLIAMS
13432 JAMES JOHNSON
12980 WILLIAM JOHNSON
12807 CHARLES SMITH
12625 JAMES BROWN
12366 JOHN JOHNSON
12296 MARY JOHNSON
12125 JOHN WILLIAMS
11957 WILLIAM BROWN
11873 ROBERT JOHNSON
11790 WILLIAM JONES
10915 GEORGE SMITH
10707 MARY WILLIAMS
10690 JAMES JONES
10514 MARY BROWN
10323 JOHN MILLER
10022 MARY JONES
9989 WILLIAM MILLER
9869 JOHN JONES
9768 ROBERT WILLIAMS
9700 JOHN BROWN
9604 ROBERT JONES
9560 JAMES DAVIS
9416 ROBERT BROWN
9069 WILLIAM DAVIS
8906 MARY MILLER
8671 JOHN DAVIS
8201 ROBERT MILLER
8083 CHARLES JOHNSON
8037 MARY DAVIS
7986 JAMES WILSON
7506 JAMES MILLER
7285 HELEN SMITH
7243 JAMES MOORE
7116 JAMES TAYLOR
7103 CHARLES BROWN
7095 JOHN ANDERSON
7017 WILLIAM WILSON
7015 MARGARET SMITH
6989 JOHN WILSON
6983 CHARLES MILLER
6956 GEORGE JOHNSON
6849 CHARLES WILLIAMS
6756 DOROTHY SMITH
6739 WILLIAM TAYLOR

Even the most common name, James Smith, occurs only 20,944 times (out of 87,873,196), so about 0.02%, or roughly 1 in 4,200 people, have that name. And it rapidly tails off: there are only 2,395 names that occur more than 1,000 times, totalling 4,848,744 people (and this is ignoring middle names, which would make them more unique).

The mode (most commonly occurring frequency) is 1, with 23,524,403 names occurring only once. In other words, there's a more than 1 in 4 chance that any particular person in the file has a full name that occurs only once. There's only a 1 in 18 chance that a person's name is one that occurs more than 1,000 times. The mean number of times each name occurs is 2.8, and the standard deviation is 26.5.
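As a rough sketch of how summary figures like these can be derived from a table of name counts, the Python below computes the mode, mean and standard deviation of the per-name frequencies on a made-up toy table, then cross-checks the ratios quoted above from the totals given in this answer. The function name `name_frequency_stats` is hypothetical, and the use of the population standard deviation is an assumption, since the answer does not say which form was used.

```python
from collections import Counter
from statistics import mean, pstdev

def name_frequency_stats(counts):
    """Summarise a mapping of full name -> number of people with that name."""
    freqs = list(counts.values())
    people = sum(freqs)
    # How many names occur exactly once, exactly twice, and so on.
    freq_of_freq = Counter(freqs)
    mode_freq, names_at_mode = freq_of_freq.most_common(1)[0]
    return {
        "people": people,
        "names": len(freqs),
        "mode": mode_freq,
        "names_at_mode": names_at_mode,
        "mean": mean(freqs),
        "stdev": pstdev(freqs),  # population form: an assumption
        # Share of people (not of names) whose full name occurs only once.
        "share_of_people_with_unique_name": freq_of_freq[1] / people,
    }

# Made-up toy table, not real data.
toy = Counter({"JAMES SMITH": 3, "MARY SMITH": 1, "JOHN JONES": 1, "HELEN DAVIS": 1})
stats = name_frequency_stats(toy)

# Cross-check using only the totals quoted in the answer.
PEOPLE = 87_873_196
NAMES = 31_425_850
SINGLETONS = 23_524_403        # names occurring exactly once
PEOPLE_OVER_1000 = 4_848_744   # people whose name occurs more than 1,000 times
JAMES_SMITH = 20_944

mean_per_name = PEOPLE / NAMES                 # about 2.8
unique_share = SINGLETONS / PEOPLE             # a little over 1 in 4
one_in_common = PEOPLE / PEOPLE_OVER_1000      # about 1 in 18
one_in_james_smith = PEOPLE / JAMES_SMITH      # about 1 in 4,200
```

Note that the "1 in 4" and "1 in 18" figures are shares of people, so both are divided by the total number of people rather than by the number of distinct names.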

Since this is historical data, based on people dying decades ago (so born many more decades ago), it won't reflect the current name frequencies in the US.

US White Pages and Facebook

An article last year used full name frequencies from the recent US White Pages (so still biased, towards older landline users). That has many more Hispanic names, and a strange dearth of John Smiths: Why Aren’t There More John Smiths in the U.S.?

A 2009 report of Facebook names found instead there were too many Jane Smiths (possibly as fake names): Most Common First, Last, and Full Names on Facebook

Global Names

As you mentioned, it would be nice to have the uniqueness of names by country (and culture). I'm not aware of any source for this, since it's hard to get a full name list (even a historical one) or census data for most countries. In most countries, there are almost no bulk downloads possible for genealogy or name data (other than separate tables of first and last name ranks).

However, a larger Facebook name list from October 2010 (170,879,859 names) is moderately global and has 100,128,460 different full "real" names (some of these names will be fictional and intentionally original). The following names make the top 50. Note how they occur even fewer times, despite the larger sample, because there's a wider variety of names. It's heavily biased towards male names, perhaps because female users are less likely to give their real names in full.

17204 john smith
7440 david smith
7200 michael smith
6784 chris smith
6371 mike smith
6149 arun kumar
5980 james smith
5939 amit kumar
5926 imran khan
5861 jason smith
5374 chris johnson
5294 jessica smith
5231 chris brown
5210 mike jones
5092 michael johnson
5084 mark smith
5039 sarah smith
4953 anil kumar
4877 manoj kumar
4875 praveen kumar
4771 ashley smith
4749 vijay kumar
4693 kevin smith
4646 david johnson
4587 chris jones
4538 sunil kumar
4515 ryan smith
4493 robert smith
4462 david jones
4452 brian smith
4367 jennifer smith
4343 ahmed ali
4316 steve smith
4315 rajesh kumar
4291 rahul sharma
4230 paul smith
4213 michael williams
4201 ravi kumar
4155 michael brown
4153 raj kumar
4141 david brown
4031 amanda smith
3965 lisa smith
3946 ali khan
3936 matt smith
3921 david williams
3920 chris williams
3826 john williams
3757 andrew smith
3742 adam smith

Here the mode is 1, with 86,585,871 names occurring only once, so there's a better than 1 in 2 chance that any particular person in this Facebook list has a unique full name. The mean is 1.7, and the standard deviation is 10.4.

It's unclear why some names in the 2009 Facebook report, like Jane Smith and Juan Carlos, hardly appear in the 2010 list. Both lists are just samples, so there must be something very different about the way they were obtained, which shows that all such rankings should be treated as estimates based on the available data.

Other Cultures

Even that Facebook data is hardly global: it is strong on Western countries and South Asia, but it very poorly represents names from East Asia, Africa, and South America (one reason: that list is limited to names that start with a Latin-alphabet character).

There are some countries where names are nearly unique (like Thailand). Many other cultures have a shortage of "last" names (family names), sometimes because of the way they were introduced. This applies especially to China, Korea and parts of Scandinavia. There are also countries, like Iceland and Korea (among others), which require first names to be from an approved list, limiting the variety of names available.

Rob Hoare

Posted 2014-05-04T16:34:07.167

Reputation: 6 076

What a wonderful answer, Rob! Can I just point out that you have achieved an example of the kind of statistical genealogy I was thinking of in an earlier question. I wonder if the distributions of first names were independent of surnames or vice versa. Could be a good question for Cross Validated, the statistics StackExchange.

– Verbeia – 2014-05-11T09:58:40.767

@verbeia - at a first glance at the SSDI data above, James is the most common male first name for almost all these top surnames, and Mary the most common female name. One exception is Miller, where John wins. I'd guess this is because many of the Millers were originally Muellers, from Germany. So the distribution of first name by surname will vary by the ethnic origins of that name (it becomes even more obvious of course with Hispanic names). And it'll also vary by decade of birth, as names go out of fashion (such as Robert, in recent decades). – Rob Hoare – 2014-05-11T19:53:56.277

2

UCL did some work on the distribution of surnames in the UK which is accessible via the PublicProfiler website. I'm not aware of anything that looks at the combination of forenames and surnames.

user104

Posted 2014-05-04T16:34:07.167

Reputation: