Convert tellmeGen DNA file to FTDNA file or similar to upload it

5

I have a DNA file from tellmeGen and was wondering whether I can convert the file to an ancestryDNA, FTDNA, MyHeritage or similar file format in order to upload it to myHeritage, FTDNA or DNA.Land for further investigation?

The DNA file from tellmegen is a csv file and has the following first six lines (separated by a space):

# rsid  chromosome  position    genotype
rs991757223 1   100177980   DD
rs967277439 1   108681808   DD
1KG_1_109440678 1   109440678   DD
rs750385149 1   109479801   DD
rs780371591 1   110655430   DD

However, if I change the file extension from csv to txt with a simple text editor (on a Mac, if this matters) and change the header (here in the FTDNA version), the upload to any of the above websites does not work.

I use this table to guide me with the output format (via http://www.beholdgenealogy.com/blog/?p=2700).

[Image: table of raw data file formats, from the linked blog post]

This would be an example for the first two lines in a FTDNA file format:

RSID,CHROMOSOME,POSITION,RESULT
"rs991757223","1","100177980","DD"

However, it seems that simply adding the header and changing the data format does not work. Why? How can I adjust my DNA file so that I can upload it to other DNA websites?

P.S.: I was only able to upload the original file from tellmeGen to GEDMatch without any adjustments.

Edit:

After ordering the file by group (column 2), the order of the data looks (abbreviated) like this:

200610-10   0   0   AA
200610-108  0   0   AA
200610-109  0   0   AA
...
2010-08-MT-655  0   0   GG
2010-08-MT-664  0   0   AA
2010-08-MT-773  0   0   AA
...
exm-rs10862691  0   0   GG
exm-rs11136341  0   0   AG
...
indel.101007    0   0   II
indel.101499    0   0   II
...
kgp2443711  0   0   GG
kgp24661105 0   0   GG
...
rs10177008  0   0   AG
rs10178695  0   0   AG
...
rs41369547  MT  12669   GG
rs193302956 MT  12705   GG
...
rs200665918 XY  1484071 GG
rs200082252 XY  1484097 GG
...
rs2694717   X   3140992 AA
rs17051551  X   3142375 AA
...
rs34961774  Y   6033626 AC
rs34448815  Y   6033653 --
...
rs145303326 1   12378223    GG
rs150598243 1   12378274    GG
rs141044837 1   12379557    GG

Shall I remove the "noise" of the first lines?

Til Hund

Posted 2019-10-08T08:37:18.767

Reputation: 593

Answers

5

It is not all that simple to convert to an FTDNA file. There are a few things that might be tripping you up.

From your example, you seem to indicate that the first data line in your tellmeGen file is for position 100177980 on chromosome 1. Check that the lines are ordered by chromosome and position number, and that lower positions are before higher positions. I'm worried that tellmeGen did not order your positions numerically, but alphabetically, i.e. 100 is followed by 20 which is followed by 3, when it should be 3, 20, 100. If that's the case, you'll have to sort the data lines correctly.
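A minimal sketch of the numeric ordering described above, assuming each data line has already been split into four string fields. The names `CHROM_ORDER` and `sort_key` are hypothetical, and the sample rows are taken from the question:

```python
# Rank chromosomes 1-22 numerically, then X, Y, MT (assumed upload order).
CHROM_ORDER = {str(n): n for n in range(1, 23)}
CHROM_ORDER.update({"X": 23, "Y": 24, "MT": 25})


def sort_key(row):
    """Sort by chromosome rank, then by position as an integer, not as text."""
    rsid, chrom, pos, genotype = row
    # Unknown labels such as "0" or "XY" get rank 99 and therefore sort last.
    return (CHROM_ORDER.get(chrom, 99), int(pos))


rows = [
    ("rs2694717", "X", "3140992", "AA"),
    ("rs150598243", "1", "12378274", "GG"),
    ("rs145303326", "1", "12378223", "GG"),
    ("rs967277439", "1", "108681808", "DD"),
    ("rs991757223", "1", "100177980", "DD"),
]
sorted_rows = sorted(rows, key=sort_key)
```

Converting the position with `int()` is what avoids the alphabetical trap, where "100177980" would otherwise sort before "12378223".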

It looks to me like the tellmeGen file has tabs between the fields, but FTDNA uses a comma. Also FTDNA needs double quotes around each field. So if you're using just a text editor, then convert all the tabs to "," and then insert a " at the beginning of each line and a " at the end of each line.

Then add as the first line the FTDNA header: RSID,CHROMOSOME,POSITION,RESULT

FTDNA also requires the same header between chromosome 22 and 23. And make sure your chromosome 23 is named "X", e.g. in my FTDNA file lines 702457 to 702462:

"rs5771007","22","49542594","TT"
"rs3888396","22","49558258","TT"
RSID,CHROMOSOME,POSITION,RESULT
"rs17883004","X","1370495","AA"
"rs5939319","X","2710157","AA"
"rs1419931","X","2713633","GG"

My FTDNA files also have Unix line terminators of LF (linefeed), whereas Windows text files have line terminators of CRLF (carriage-return + linefeed). I don't know if the line terminator needs to be changed for an upload, but if necessary, most text editors should be able to change your line terminator for you.
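The FTDNA layout described above (comma-separated, every data field double-quoted, the header repeated before the first X line, LF line terminators) could be produced roughly as follows. This is only a sketch: `to_ftdna` is a hypothetical helper, the rows are assumed to be sorted already, and whether FTDNA accepts the result is not guaranteed.

```python
import csv
import io

HEADER = "RSID,CHROMOSOME,POSITION,RESULT\n"


def to_ftdna(rows):
    """Return FTDNA-style text for rows already sorted by chromosome and position."""
    out = io.StringIO()
    out.write(HEADER)
    # QUOTE_ALL puts double quotes around every field; "\n" gives LF line endings.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    seen_x = False
    for rsid, chrom, pos, genotype in rows:
        if chrom == "X" and not seen_x:
            # Repeat the (unquoted) header once, between chromosome 22 and X.
            out.write(HEADER)
            seen_x = True
        writer.writerow([rsid, chrom, pos, genotype])
    return out.getvalue()


sample = [
    ("rs5771007", "22", "49542594", "TT"),
    ("rs3888396", "22", "49558258", "TT"),
    ("rs17883004", "X", "1370495", "AA"),
]
ftdna_text = to_ftdna(sample)
```

When saving `ftdna_text` to disk, opening the file with `newline=""` keeps Python from translating the LF terminators on Windows.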

If you do all the above and you still can't get the upload as an FTDNA file to work, I'd suggest you try instead using the 23andMe raw data file format. It is much closer to your original file, being text format, using tabs between fields and not requiring double quotes around fields.

Try taking your original file, replacing the first header line with the following description and header line:

# This data file generated by 23andMe at: Mon Jan 15 10:49:09 2018
#
# This file contains raw genotype data, including data that is not used in 23andMe reports.
# This data has undergone a general quality review however only a subset of markers have been 
# individually validated for accuracy. As such, this data is suitable only for research, 
# educational, and informational use and not for medical or other use.
# 
# Below is a text version of your data.  Fields are TAB-separated
# Each line corresponds to a single SNP.  For each SNP, we provide its identifier 
# (an rsid or an internal id), its location on the reference human genome, and the 
# genotype call oriented with respect to the plus strand on the human reference sequence.
# We are using reference human assembly build 37 (also known as Annotation Release 104).
# Note that it is possible that data downloaded at different times may be different due to ongoing 
# improvements in our ability to call genotypes. More information about these changes can be found at:
# https://you.23andme.com/p/9ea93ca016155efc/tools/data/download/
# 
# More information on reference human assembly build 37 (aka Annotation Release 104):
# http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi?taxid=9606
#
# rsid  chromosome  position    genotype

If you use 23andMe format, then use X for chromosome 23, and use Y and MT for any Y-DNA and mitochondrial data that tellmegen might be including.

Give the file you create in 23andMe format a name like:

genome_your_name_v5_Full_20180115104908.txt
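The 23andMe-style conversion above can be sketched like this. `to_23andme` is a hypothetical helper; it assumes the fields in the tellmeGen file are separated by spaces or tabs and that 23andMe-style output is TAB-separated, as the header text says. The comment block from the answer would be passed in as `description_lines`.

```python
COLUMN_HEADER = "# rsid\tchromosome\tposition\tgenotype"


def to_23andme(text, description_lines=()):
    """Swap the original header for a 23andMe-style one and re-join fields with tabs."""
    lines = [*description_lines, COLUMN_HEADER]
    for line in text.splitlines():
        # Skip blank lines and the original "# rsid ..." header line.
        if not line.strip() or line.startswith("#"):
            continue
        rsid, chrom, pos, genotype = line.split()
        lines.append("\t".join((rsid, chrom, pos, genotype)))
    return "\n".join(lines) + "\n"


source = (
    "# rsid  chromosome  position    genotype\n"
    "rs991757223 1   100177980   DD\n"
    "1KG_1_109440678 1   109440678   DD\n"
)
converted = to_23andme(
    source,
    ["# This data file generated by 23andMe at: Mon Jan 15 10:49:09 2018"],
)
```

The sketch leaves the identifiers (including ones like `1KG_1_109440678`) and the genotype values untouched; it only changes the header and the field separator.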

lkessler

Posted 2019-10-08T08:37:18.767

Reputation: 16 148

Thank you, lkessler, for your extensive answer. :) One short remark: can you please tell me how the 23andMe data is ordered? Is it ordered alphabetically or like the FTDNA format? Also, I updated the snippet of my DNA file so you can see how the tellmeGen file is ordered. As you can see, it contains some unusual identifiers such as 1KG_1_109440678. Do I need to delete them? – Til Hund – 2019-10-16T07:18:38.677

23andMe raw data is ordered like the FTDNA format, by chromosome and position number. The 1KG... is the RSID, which is the name tellmegen assigns to the SNP. I believe most companies ignore the RSID field when uploading and just use the chromosome, position and genotype, so you shouldn't have to change that field. – lkessler – 2019-10-16T18:21:44.710

Hi lkessler, I did as you told me: I ordered the file, added the header and renamed the file, to no avail. It is still not recognised. After ordering the file, should 'Y', 'MT' and 'X' stand at the top, or should they go after the numerical chromosomes (see edit in original post)? – Til Hund – 2019-10-19T18:55:44.377

Til: Yes, definitely remove the chromosome 0's and XY's. Put the X's at the end, after chromosome 22. Also, you shouldn't have to include the Y's and MT's at all, since autosomal uploads don't use them. If you include them, Y goes after X and MT goes after Y. Also, 23andMe only has one value for X (if you're a male), Y and MT, i.e. don't put GG, just put G. If the two differ, put two dashes, i.e. "--" – lkessler – 2019-10-20T14:10:11.143

Hi lkessler, I managed to delete all "0", "XY", "Y", and "MT". I am not sure what you mean when you say to substitute all "GG" with "G" or "--". Do you mean this or did I misunderstand you? TellmeGen already has some "--" in the original file. Wouldn't it be confusing if I add more "--" for "GG"? – Til Hund – 2019-10-20T17:58:14.940

I meant that if the two letters differ, e.g. CG, then it was not read properly, since there is only one chromosome, so put in "--" to represent a no-call, or don't include the line. If they are the same, show only one letter. This is only for single chromosomes in 23andMe raw data. – lkessler – 2019-10-20T21:21:21.550
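The clean-up rules from these comments can be sketched as a single function. `normalize` and its `male` flag are hypothetical names; the rules (drop chromosome 0 and XY, collapse single-chromosome genotypes to one letter, use "--" when the two letters differ) are the ones described above, and the sample rows come from the question.

```python
def normalize(row, male=True):
    """Apply the clean-up rules to one (rsid, chromosome, position, genotype) row.

    Returns None for rows that should be dropped.
    """
    rsid, chrom, pos, genotype = row
    if chrom in ("0", "XY"):
        return None  # unplaced and pseudo-autosomal rows are removed
    # Y and MT are always single-copy; X is single-copy only for males.
    haploid = chrom in ("Y", "MT") or (chrom == "X" and male)
    if haploid and len(genotype) == 2:
        same = genotype[0] == genotype[1] and genotype != "--"
        # "GG" becomes "G"; differing letters and existing no-calls become "--".
        genotype = genotype[0] if same else "--"
    return (rsid, chrom, pos, genotype)


examples = [
    ("rs10177008", "0", "0", "AG"),
    ("rs41369547", "MT", "12669", "GG"),
    ("rs34961774", "Y", "6033626", "AC"),
    ("rs145303326", "1", "12378223", "GG"),
]
cleaned = [r for r in map(normalize, examples) if r is not None]
```

Existing "--" entries are left as no-calls, so they are not confused with the collapsed single-letter values.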

Thank you, lkessler. It worked for one website (DNA.Land), although not yet for the commercial ones (like MyHeritage). I guess I am on the right track, however. Maybe I will leave the "Y" and "MT" in for another attempt, as I am now at only 548428 lines (after making all the adjustments mentioned above), in contrast to the 638463 lines that the above table claims a 23andMe file has. – Til Hund – 2019-10-21T07:44:56.140

Hi lkessler, do you know by chance how a 23andMe V5 file is structured? It seems that the description above is for V3. If you know, I can open another question. – Til Hund – 2019-10-21T18:20:42.657

Sorry, no. Mine is V3. I don't have a V5. – lkessler – 2019-10-22T03:15:38.583

0

I have one more piece of advice for you. If you would like to import your file into FTDNA, it is not a good idea to convert it to the FTDNA format, because FTDNA does not allow importing FTDNA files. A better choice is to try either the 23andMe or the Ancestry format. You can check which formats are supported in the Autosomal Transfer section of the FTDNA site.

However, there were different versions of the 23andMe chips. The format (the file header and the general structure) is the same, but the positions and values may vary. Additionally, FTDNA checks positions too! So the only approach that works is to try different converters and check each time whether the import succeeds.

George Gaál

Posted 2019-10-08T08:37:18.767

Reputation: 1 075

-1

Instead of converting the files manually, use DNA Kit Studio. It is pretty straightforward. Use the template for the 23andMe V5 file format. Also, take into account the unusual strand flip that this company performs.

https://snpedia.com/index.php/TellmeGen

Ancestry Tools

Posted 2019-10-08T08:37:18.767

Reputation: 1