Does there exist a resource for vetting banned words for chatbots?


So, Tay, the racist tweeter bot... One thing that could have prevented this would have been a list of watchwords to not respond to, with some logic similar to foreach (word in msg) {if (banned_words.has(word)) disregard()}.
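
To make that pseudocode concrete, here is a minimal Python sketch of the same idea. The names `BANNED_WORDS`, `tokenize`, and `should_disregard` are hypothetical, and the two set entries are harmless placeholders standing in for whatever vetted list ends up being used.

```python
import re

# Placeholder entries (assumption): a real list would be loaded from a vetted file.
BANNED_WORDS = {"badword", "otherbadword"}

def tokenize(msg):
    """Lowercase the message and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", msg.lower())

def should_disregard(msg, banned=BANNED_WORDS):
    """Return True if any token of the message is on the banned list."""
    return any(word in banned for word in tokenize(msg))

if __name__ == "__main__":
    for text in ("Hello there!", "You are a BadWord."):
        print(text, "->", "disregard" if should_disregard(text) else "respond")
```

This only catches exact whole-word matches after lowercasing, so it says nothing about context or creative spellings.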

Even if that wouldn't have worked, what I'm getting at is obvious: I am building a chatterbot that must be kid-friendly. For my sake and for the sake of whoever finds this question, is there a resource consisting of a .csv or .txt of such words that one might want to handle? I remember once using a site-blocking productivity extension whose list of banned words was visible; it covered not just sexually charged words, but racial slurs, too.


Banning words without context is not really fruitful. For example, 'you have a car to kill for' is not the same as 'I will kill you'. Also, anyone can make racist tweets with common words; you don't need bad words to make bad sentences. – DuttaA – 2020-04-05T06:43:52.973

Certainly, though, it could be useful to ban more obvious words? There is zero excuse for using, for example, the more strident and unambiguous racial slurs. – JohnnyApplesauce – 2020-04-05T14:02:14.550

Well, you cannot prevent anything. That's the point: you can ban as many words as you like. If someone wants to make it learn, they will make it learn; racism is not bound to racial slurs. – DuttaA – 2020-04-05T14:42:23.210



I have not found one. Apart from scraping a few pages from Urban Dictionary, I built my list by crowdsourcing, which turned up a number of interesting words I had not considered.

Start with the worst words you can think of, then try slang and accidental or deliberate misspellings of them.
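
For the misspelling part, one option is to normalize each token before looking it up, rather than listing every variant by hand. The following is only a rough sketch under stated assumptions: the substitution table `LEET` and the helpers `normalize` and `matches_banned` are hypothetical names, the character mappings are a small illustrative sample, and `badword` is a placeholder entry.

```python
import re

# Illustrative digit/symbol substitutions (assumption: extend as needed).
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a",
                      "5": "s", "7": "t", "@": "a", "$": "s"})

def normalize(token):
    """Lowercase, undo common substitutions, drop non-letters, collapse repeats."""
    token = token.lower().translate(LEET)
    token = re.sub(r"[^a-z]", "", token)
    return re.sub(r"(.)\1+", r"\1", token)

def matches_banned(msg, banned):
    """Return True if any whitespace-separated token normalizes to a banned word."""
    normalized_banned = {normalize(w) for w in banned} - {""}
    return any(normalize(tok) in normalized_banned for tok in msg.split())

if __name__ == "__main__":
    banned = {"badword"}  # placeholder entry
    for text in ("b4dw0rd", "baaadwoord!!", "hello friend"):
        print(text, "->", matches_banned(text, banned))
```

Collapsing repeated letters is aggressive and can merge distinct words (for example, a doubled letter in a harmless word), so a list built this way still needs manual review for false positives.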


