I have a data set like the following, where the first column contains the groupings. However, some groups are labelled slightly differently. I need to remove the punctuation character used (an opening bracket, semicolon or comma) and all characters that follow it.
groups <- c("Group1", "Group1", "Group1;Group1", "Group1(subset)", "Group1,ex" )
I would like all of these to read simply as Group1, the same as the first two entries. In other words, everything in the string from the punctuation character onwards should be removed. I then need to do this for thousands of groups, all in the same column of my dataset.
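A minimal sketch of the string operation on the example vector, assuming `(`, `;` and `,` are the only delimiters that occur. The character class `[(;,]` matches any one of them (inside square brackets `(` is a literal character), and `.*$` extends the match to the end of the string, so replacing the match with `""` keeps only the text before the first delimiter. `sub()` is enough because only the first match in each string matters; `gsub()` would give the same result here.

```r
groups <- c("Group1", "Group1", "Group1;Group1", "Group1(subset)", "Group1,ex")

# "[(;,]" matches a single "(", ";" or ","; ".*$" then consumes the rest
# of the string, so the delimiter and everything after it is replaced by "".
cleaned <- sub("[(;,].*$", "", groups)

print(cleaned)
```

If other delimiters turn up in the real data, they can be added inside the square brackets.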
I know gsub is an option, but I'm not clear on how to use it to remove everything in a string after any of several different characters, or on how to apply it to just one column of a very large dataset.
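A sketch of applying the same substitution to a single column, assuming the data sit in a data frame whose first column holds the labels. The data frame `df` and the column names `group` and `value` are hypothetical, made up for illustration. Because `sub()` is vectorised, one call rewrites every row of the column without a loop, and the other columns are left untouched.

```r
# Hypothetical example data: the grouping labels are in the first column.
df <- data.frame(
  group = c("Group1", "Group1", "Group1;Group1", "Group1(subset)", "Group1,ex"),
  value = 1:5,
  stringsAsFactors = FALSE
)

# Overwrite only the first column; sub() processes the whole vector at once.
df[[1]] <- sub("[(;,].*$", "", df[[1]])

print(df)
```

If the column is stored as a factor, `sub()` coerces it with `as.character()` and returns a character vector, so the cleaned column comes back as character.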