I’ve finished launching my latest Inversoft product – The Inversoft Bad Word Database. The website is written entirely in Ruby on Rails and uses PayPal for purchases. All and all the coding was extremely straight forward and simple. Here’s the address:
The product idea essentially sprouted from my work on https://www.naymz.com where I need to add some filtering logic to the site so that folks could not enter any restricted words (cuss, slang, swear, superlatives, derogatory, etc.) when constructing their ad for the search engines. Google, Yahoo and the others reject advertisements whose content is not valid or contains restricted words. As I built out the functionality for Naymz, I wanted to test it with some data. Unfortunately there was no data to be found.
I did quite a bit of searching and found that there were services out there who offered a web service you would sent the content to and they would validate it. This just didn’t make sense to me since I wanted to have as few external dependencies as possible and most of these sites could not guarentee reliability, up-time or accuracy of the content filtering. I also found a few other sites that had lists of words, but they were not in a database friendly form. These websites also suffered from the fact that some words I would not consider “restricted” at all and yet again the data would need major scrubbing. Another site I found offered hundreds of thousands of words entered completely by its users. This was probably the worst considering that I had no way of making a determination of which words I wanted to load and which I didn’t because as we all know users will add anything when there aren’t any checks and balances.
What I really wanted was a list of words, with definitions, ratings and categories. So, in my spare time I started building this out. I surfed the web and found all the words I could. I started adding definitions to them as well as a ranking system that would allow applications to adjust the level of leniency they wanted to allow. I also found that many times a list of alternate spellings was need because folks are smart and can beat filters by typing something like sh7t.
After quite a bit of data scrubbing, tweaking and coding, I finished the website and did an initial data load. The initial load contained approximately 500 words and another 500+ misspellings (or something close). The website is live and I’ll be adding words to the database each day and also be building out an multi-language version that contains bad words from as many languages as possible. Check it out and feedback is much appreciated.