Spell Checking in Python
I was looking into spell checking in Python. I found spell4py and downloaded the zip, but couldn’t get it to build on my system. I might have managed it with more effort, but the library was overkill for my needs anyway, and the solution I ended up with worked out fine.
I found this article here: http://code.activestate.com/recipes/117221/
This worked well for my purposes, but I wanted to test other spell-checking libraries. Mozilla Firefox, Google Chrome, and OpenOffice all use hunspell, so I wanted to try that one, since I’m checking the spelling of words from the Internet. Here are some Python snippets to get you up and running with the popular spelling checkers. I modified the recipe so that each class takes a string of several words, splits it on spaces, and returns a list of results: None for a word that is spelled correctly, otherwise a list of suggestions. Each spelling checker has to be installed first; I was able to install them through the openSUSE package manager.
import popen2

class ispell:
    def __init__(self):
        self._f = popen2.Popen3("ispell")
        self._f.fromchild.readline()  # skip the credit line

    def __call__(self, words):
        words = words.split(' ')
        output = []
        for word in words:
            self._f.tochild.write(word + '\n')
            self._f.tochild.flush()
            s = self._f.fromchild.readline().strip()
            self._f.fromchild.readline()  # skip the blank line
            if s[:8] == "word: ok":
                output.append(None)
            else:
                output.append((s[17:-1]).strip().split(', '))
        return output
import popen2

class aspell:
    def __init__(self):
        self._f = popen2.Popen3("aspell -a")
        self._f.fromchild.readline()  # skip the credit line

    def __call__(self, words):
        words = words.split(' ')
        output = []
        for word in words:
            self._f.tochild.write(word + '\n')
            self._f.tochild.flush()
            s = self._f.fromchild.readline().strip()
            self._f.fromchild.readline()  # skip the blank line
            if s == "*":
                output.append(None)
            elif s[0] == '#':
                output.append("No Suggestions")
            else:
                output.append(s.split(':')[1].strip().split(', '))
        return output
import popen2

class hunspell:
    def __init__(self):
        self._f = popen2.Popen3("hunspell")
        self._f.fromchild.readline()  # skip the credit line

    def __call__(self, words):
        words = words.split(' ')
        output = []
        for word in words:
            self._f.tochild.write(word + '\n')
            self._f.tochild.flush()
            s = self._f.fromchild.readline().strip().lower()
            self._f.fromchild.readline()  # skip the blank line
            if s == "*":
                output.append(None)
            elif s[0] == '#':
                output.append("No Suggestions")
            elif s[0] == '+':
                pass
            else:
                output.append(s.split(':')[1].strip().split(', '))
        return output
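The snippets above rely on popen2, which exists only in Python 2; it was removed in Python 3, where subprocess takes its place. As a rough sketch of the same idea for Python 3, the version below starts a checker with subprocess and parses its reply lines. It assumes the checker speaks the ispell -a pipe format that the aspell and hunspell snippets already parse: a line starting with * for a correct word, # when there are no suggestions, and & followed by a colon and a comma-separated list of suggestions. The names parse_reply and PipeSpeller are made up for this example, and the default command ("aspell", "-a") is an assumption about what is installed.

```python
import subprocess


def parse_reply(line):
    """Interpret one reply line in the ispell -a pipe format.

    Returns None for a correctly spelled word, a list of suggestions
    when the checker offers some, and an empty list otherwise.
    """
    line = line.strip()
    if line[:1] in ("*", "+", "-"):
        # '*' exact match, '+' found via affix rules, '-' compound word
        return None
    if line[:1] in ("&", "?") and ":" in line:
        # e.g. "& helo 3 0: hello, help, halo"
        return line.split(":", 1)[1].strip().split(", ")
    # '#' (no suggestions) or anything unrecognized
    return []


class PipeSpeller:
    """Hypothetical wrapper around a checker running in pipe mode."""

    def __init__(self, command=("aspell", "-a")):
        self._proc = subprocess.Popen(
            command,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )
        self._proc.stdout.readline()  # skip the version banner

    def __call__(self, words):
        output = []
        for word in words.split():
            self._proc.stdin.write(word + "\n")
            self._proc.stdin.flush()
            output.append(parse_reply(self._proc.stdout.readline()))
            self._proc.stdout.readline()  # skip the blank separator line
        return output

    def close(self):
        self._proc.stdin.close()
        self._proc.wait()
```

With aspell installed, PipeSpeller()("some wrds here") would return one entry per word. Unlike the snippets above, this sketch returns an empty list rather than the string "No Suggestions", so every non-None result is a list.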
After doing this and seeing the suggestions, I decided a spell checker wasn’t really what I was looking for. A spelling checker always tries to make a suggestion, and I wanted to filter things out of a database. I started with the hope that I would be able to take misspellings and convert them into the correct words. In the end, I just removed words that were not spelled correctly, using WordNet through NLTK. WordNet had a bigger dictionary than most of the spell checkers, which also helped in the filtering task. NLTK has a simple how-to on getting started with WordNet.
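The filtering step could look roughly like the sketch below. It is not the exact code I used: the function names in_wordnet and keep_known_words are made up for illustration, and it assumes NLTK is installed and the WordNet corpus has been fetched once with nltk.download("wordnet"). A word is kept if WordNet returns at least one synset for it.

```python
def in_wordnet(word):
    """Return True if WordNet has at least one synset for the word."""
    # Imported lazily so the filter itself can be used without NLTK.
    from nltk.corpus import wordnet
    return bool(wordnet.synsets(word))


def keep_known_words(words, is_known=in_wordnet):
    """Drop every word the lookup does not recognize, keeping order."""
    return [word for word in words if is_known(word)]
```

Calling keep_known_words(list_of_words) uses the WordNet lookup by default; passing a different is_known function makes it easy to swap in another dictionary, or one of the spell checkers above, for comparison.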
In: Python · Tagged with: nlp, spelling checker
on April 14, 2009 at 2:33 pm
Thanks for the write-up. For those interested, Peter Norvig has a good article on how to write a pretty good spelling corrector from scratch.
http://norvig.com/spell-correct.html
on April 16, 2009 at 3:34 pm
Thanks for the nice article.
@Anon: Good addition to article
Thanks guys