So I have several samples analyzed for their chemical composition. After data analysis, for each sample, I have a list of compounds found and their corresponding relative abundance. Some compounds are unique but most are actually found in most samples.
I want to do clustering analysis based on these list of compounds. How do I go about this? Specifically how to vectorize my dataset since each observation is actually an array with both numerical (abundance) and categorical (compound label) variables.