How to compare genetic profiles or VCF files in Python?


I have hundreds of VCF files, where each VCF file contains the genome profile for one tissue. A portion of one VCF file is as follows:

[Image: VCF file/dataframe]

I can read each VCF file into a dataframe, so I would end up with hundreds of dataframes. Each VCF file/dataframe contains hundreds of columns and 40,000 to 50,000 rows. I want to see how the ALT column differs between profiles (VCF files/dataframes) for rows that match on the CHROM, POS, ID and REF columns. What would be the best way to compare these dataframes/VCF files and measure how similar they are in the ALT column? Thanks in advance.
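To make the goal concrete, here is a minimal sketch of the comparison described above, not a definitive implementation. It assumes each profile is already loaded as a pandas dataframe with the columns CHROM, POS, ID, REF and ALT. The function names `alt_matrix` and `pairwise_alt_agreement`, the profile names `tissue_a` and `tissue_b`, and the tiny example data are all hypothetical and only for illustration.

```python
from itertools import combinations

import pandas as pd

# Columns that identify a variant site (taken from the question).
KEY = ["CHROM", "POS", "ID", "REF"]


def alt_matrix(profiles):
    """Return one row per variant site and one ALT column per profile."""
    columns = []
    for name, df in profiles.items():
        # Keep only the key columns and ALT; drop repeated sites so the
        # index is unique before the profiles are aligned.
        alt = (
            df[KEY + ["ALT"]]
            .drop_duplicates(subset=KEY)
            .set_index(KEY)["ALT"]
            .rename(name)
        )
        columns.append(alt)
    # Outer join: a site that is missing from a profile appears as NaN.
    return pd.concat(columns, axis=1, join="outer")


def pairwise_alt_agreement(matrix):
    """Fraction of sites present in both profiles whose ALT values match."""
    rows = []
    for a, b in combinations(matrix.columns, 2):
        shared = matrix[[a, b]].dropna()
        agree = (shared[a] == shared[b]).mean() if len(shared) else float("nan")
        rows.append(
            {
                "profile_a": a,
                "profile_b": b,
                "shared_sites": len(shared),
                "alt_agreement": agree,
            }
        )
    return pd.DataFrame(rows)


# Made-up example data standing in for two parsed VCF files.
tissue_a = pd.DataFrame(
    {
        "CHROM": ["1", "1", "2"],
        "POS": [100, 200, 300],
        "ID": [".", ".", "."],
        "REF": ["A", "C", "G"],
        "ALT": ["T", "G", "A"],
    }
)
tissue_b = pd.DataFrame(
    {
        "CHROM": ["1", "1", "2"],
        "POS": [100, 200, 400],
        "ID": [".", ".", "."],
        "REF": ["A", "C", "T"],
        "ALT": ["T", "A", "C"],
    }
)

matrix = alt_matrix({"tissue_a": tissue_a, "tissue_b": tissue_b})
summary = pairwise_alt_agreement(matrix)
print(matrix)
print(summary)
```

The wide `matrix` shows at a glance where ALT differs or is missing, and `summary` reduces each pair of profiles to a single agreement score. With hundreds of profiles the number of pairs grows quadratically, so the pairwise step may need to be restricted to the pairs of interest.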


Posted 2019-06-26T06:56:04.407

Reputation: 1

You've described your data to some extent. First: as far as I am aware, different organisms have different genetics (for example, different numbers of chromosomes), so could you elaborate on how you want to use this data? Second: please clarify your goal. Are you trying to classify samples from this data into categories, e.g. to predict that a genetic profile belongs to a dog, or even to a specific breed of dog? – GrizZ – 2019-06-26T07:56:56.957

maybe it's better to ask this over at

– Pallie – 2019-06-26T07:57:58.267

No answers