## Why are MyHeritage atDNA SNPs higher than other test sites?


I have accumulated DNA segment match information from ftDNA, GEDmatch, and MyHeritage. I only recently added data from MyHeritage DNA donors, and I've found that the total number of matching SNPs for a segment is two to four times the value reported by GEDmatch and ftDNA (when comparing the same donors). Does anyone know the reason for this substantial difference?

Below is one example of many...

From GEDmatch.com:

Chr Start Location  End Location     cM    SNPs
3       23,760,064    29,187,906    6.2     931
8       29,331,698    67,590,346   26.4   3,836


From ftDNA:

Chr Start Loc   End Loc      cM     SNPs
3   24060170    29187906    7.05     856
8   29450700    68699024   25.84    3622


From MyHeritage:

Chr Start Location  End Location    cM      SNPs
3         24039380      29154358     7      2944
8         29272247      67933838    26.3   14848


Not sure exactly what you mean by "SNPs are two to four times the value". Do you mean the number of SNPs in one match, or the total for all matches? And why are you picking on SNPs rather than centimorgans (cM)? It's cM that are usually used to compare matches. Could you provide an example showing a comparison of one of your matches at the 3 companies? That would provide some context for your question. – lkessler – 2018-04-01T04:53:19.007

@lkessler I won't get into a long explanation about my use of SNPs, but my ancestors are from an endogamous group and therefore the cM values are not at all useful for me -- I have started to use SNPs as a possible tool for measuring generational distance. – TJinBC – 2018-04-02T01:07:44.320

Thanks for the example. Now I see what you're asking. – lkessler – 2018-04-02T03:38:20.797

5

Most companies, e.g. Family Tree DNA, simply use the SNPs from the chip used in the DNA test when reporting their results. The Autosomal SNP comparison chart at the International Society of Genetic Genealogy (ISOGG) wiki states that the Family Tree DNA test covers 698,179 SNPs. They also report using only those SNPs.

GEDmatch's SNP counts are a bit higher than Family Tree DNA's because they include the extra SNPs from the raw data supplied by Family Tree DNA and by other companies as well. The older chips from the various companies had mostly the same SNPs, and GEDmatch included the small number of extra SNPs from the other companies, whereas Family Tree DNA throws them away.

Lately, newer chips have been quite different, often not having enough overlap in their SNPs with older chips to allow them to easily compare the older with the newer. GEDmatch's solution to this was to create a separate area called GEDmatch Genesis for input of raw data from the new chips. That would include 23andMe v5 and Living DNA.

MyHeritage DNA does this differently. Their technique is to include every SNP from any company they accept input from, as described in the post "MyHeritage DNA: Your Questions Answered":

> Q. How does MyHeritage match between users who tested on different services?
>
> A. MyHeritage takes uploaded data and extrapolates the SNPs to a common ground. This is a process called imputation. Using this method MyHeritage can match any kit currently on the market or previously distributed including 23andMe’s V4 and Ancestry’s V2 chips (as well as earlier versions) and of course MyHeritage DNA. Imputation may introduce errors so we are in the process of fine-tuning it.

In other words, MyHeritage DNA will have many more SNPs between any two positions on a chromosome, because they not only include the SNPs from Family Tree DNA, but also the extra SNPs from the older chips, as GEDmatch does, and all the different newer SNPs from the new chips that GEDmatch Genesis handles.
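To make the counting argument concrete, here is a toy sketch. The SNP positions and the names `chip_a`, `chip_b`, `chip_c` and `snps_in_segment` are made up for illustration; they are not real chip contents or any company's actual method. It only shows that the more chips' SNP sets are pooled, the more SNPs fall inside the same segment.

```python
# Hypothetical SNP positions on one chromosome (NOT real chip contents).
chip_a = {100, 250, 400, 900}        # stands in for one older chip
chip_b = {100, 250, 400, 620}        # another older chip, mostly overlapping
chip_c = {130, 310, 400, 770, 880}   # a newer chip with little overlap


def snps_in_segment(snp_positions, start, end):
    """Count the SNP positions that fall inside [start, end]."""
    return sum(1 for pos in snp_positions if start <= pos <= end)


start, end = 0, 1000

# Only one chip's SNPs are counted.
single_chip = snps_in_segment(chip_a, start, end)

# Older chips pooled: a few extra SNPs are added.
older_pooled = snps_in_segment(chip_a | chip_b, start, end)

# Every chip pooled: the count for the same segment grows much more.
all_pooled = snps_in_segment(chip_a | chip_b | chip_c, start, end)

print(single_chip, older_pooled, all_pooled)
```

The segment boundaries never change in this sketch; only the pool of positions being counted does.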

So comparing the number of SNPs is okay to do within a company, but the SNP counts are not comparable between companies. Family Tree DNA will have the lowest counts, GEDmatch will be next, and MyHeritage DNA will be much higher for the same segment.
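As a rough check, the figures from the question can be turned into SNP density (SNPs per cM) and into ratios between companies. This is only a sketch using the two segments quoted in the question; the names `segments`, `snps_per_cm` and `count_ratio` are illustrative, and two segments are far too few to generalise from.

```python
# (cM, SNP count) per chromosome, copied from the tables in the question.
segments = {
    "GEDmatch":   {3: (6.2, 931),  8: (26.4, 3836)},
    "ftDNA":      {3: (7.05, 856), 8: (25.84, 3622)},
    "MyHeritage": {3: (7.0, 2944), 8: (26.3, 14848)},
}


def snps_per_cm(company, chromosome):
    """SNP density of one reported segment."""
    cm, snps = segments[company][chromosome]
    return snps / cm


def count_ratio(company, baseline, chromosome):
    """How many times larger one company's SNP count is than another's."""
    return segments[company][chromosome][1] / segments[baseline][chromosome][1]


for chromosome in (3, 8):
    for company in segments:
        density = snps_per_cm(company, chromosome)
        print(f"chr{chromosome} {company:<10} {density:6.1f} SNPs/cM")
    for baseline in ("GEDmatch", "ftDNA"):
        ratio = count_ratio("MyHeritage", baseline, chromosome)
        print(f"chr{chromosome} MyHeritage / {baseline}: {ratio:.2f}x")
```

For these two segments the cM values are close across all three companies while the MyHeritage SNP counts are roughly three to four times the others, which fits the point that SNP counts only make sense when compared within one company.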

1

The process of adding extra data to DNA data is called imputation. I know of at least two providers that use this process: MyHeritage and DNA.land. I am not sure that the process is very precise, because your actual SNPs may differ from the reference data, in which case imputation will write a combination that does not exist. There is a great article on how imputation leads to false matches: https://dna-explained.com/2017/02/21/myheritage-broken-promises-and-matching-issues/comment-page-2/

– George Gaál – 2018-04-02T05:42:58.753

There is another reason why GEDmatch shows slightly different numbers than FTDNA: the bunching limit. We don't know precisely what that limit is at FTDNA, but in GEDmatch you can set any value you like and check how the list of matches changes. – George Gaál – 2018-04-02T05:44:47.103