Remove all characters following a certain character in a column of a dataset

0

I have a dataset like the following, in which the first column contains the groupings. However, some entries are labelled slightly differently. I need to remove all characters following the punctuation used (bracket, semicolon, or comma).

groups <- c("Group1", "Group1", "Group1;Group1", "Group1(subset)", "Group1,ex")

I would like all of these to appear simply as Group1 (the same as the first two), i.e. to remove everything in the string from the punctuation onwards. I then need to repeat this for thousands of groups, all in the same column of my dataset.

I know gsub is an option, but I'm not clear on how to use it to remove the rest of a string after any one of several different characters, or on how to apply it to just one column of a very large dataset.

NewtoR

Posted 2020-05-01T11:46:30.650

Reputation: 1

1 gsub("[,;(].*$", "", groups) – Valentas – 2020-05-02T13:40:08.640

Answers

0

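substr(groups, 1, 6)
'Group1' 'Group1' 'Group1' 'Group1' 'Group1'

Note that this works here only because every group name in the example is exactly six characters long; names of other lengths would be cut in the wrong place.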
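For group names of varying length, a pattern-based approach along the lines of the gsub comment above is probably more robust. The following is a minimal sketch, not a definitive solution: the data frame `df` and its columns `group` and `value` are hypothetical stand-ins for the real dataset, and the `trimws()` call assumes that any whitespace left before the punctuation should also be dropped.

```r
groups <- c("Group1", "Group1", "Group1;Group1", "Group1(subset)", "Group1,ex")

# [,;(] matches the first comma, semicolon, or opening bracket;
# .*$ then matches everything from there to the end of the string.
# sub() is enough here because the single match already runs to the end.
cleaned <- sub("[,;(].*$", "", groups)

# Hypothetical data frame standing in for the real dataset:
# only the `group` column is modified, the other columns are left alone.
df <- data.frame(group = groups, value = 1:5, stringsAsFactors = FALSE)
df$group <- trimws(sub("[,;(].*$", "", df$group))

unique(df$group)
```

Because `sub()` is vectorised over the whole column, no loop is needed, so the same line should apply unchanged to a column containing thousands of different group names. If other punctuation marks can occur, they would need to be added inside the square brackets.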

Lokeshkumar

Posted 2020-05-01T11:46:30.650

Reputation: 1