<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Blog::Quibb &#187; spelling checker</title>
	<atom:link href="http://blog.quibb.org/tag/spelling-checker/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.quibb.org</link>
	<description>Software development and more.</description>
	<lastBuildDate>Tue, 10 Aug 2010 14:11:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Spell Checking in Python</title>
		<link>http://blog.quibb.org/2009/04/spell-checking-in-python/</link>
		<comments>http://blog.quibb.org/2009/04/spell-checking-in-python/#comments</comments>
		<pubDate>Sat, 11 Apr 2009 15:38:10 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[nlp]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[spelling checker]]></category>

		<guid isPermaLink="false">http://blog.quibb.org/?p=54</guid>
		<description><![CDATA[I was looking into spell checking in Python. I found spell4py and downloaded the zip, but couldn&#8217;t get it to build on my system. I might have managed with more effort, but the library was overkill for my needs anyway, and the solution I ended up with worked out fine. I found this article here: http://code.activestate.com/recipes/117221/ This [...]]]></description>
			<content:encoded><![CDATA[<p>I was looking into spell checking in Python. I found <a href="http://www.keyphrene.com/products/4py/">spell4py</a> and downloaded the zip, but couldn&#8217;t get it to build on my system. I might have managed with more effort, but the library was overkill for my needs anyway, and the solution I ended up with worked out fine.</p>
<p>I found this article here: <a href="http://code.activestate.com/recipes/117221/">http://code.activestate.com/recipes/117221/</a></p>
<p>This seemed to work well for my purposes, but I wanted to test out other spell checking libraries. Mozilla Firefox, Google Chrome, and OpenOffice all use hunspell, so I wanted to try that one (as I&#8217;m testing the spelling of words on the Internet). Here are some Python snippets to get you up and running with the popular spelling checkers. I modified the recipe so that each checker takes a string of more than one word, splits it on spaces, and returns one entry per word: None if the word is spelled correctly, otherwise its suggestions. Each spelling checker must be installed for its snippet to work; I was able to install them through the openSuSE package manager.</p>
<p><a href="http://en.wikipedia.org/wiki/Ispell"><strong>Ispell</strong></a></p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">popen2</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> ispell:
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #008000;">self</span>._f = <span style="color: #dc143c;">popen2</span>.<span style="color: black;">Popen3</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;ispell&quot;</span><span style="color: black;">&#41;</span>
        <span style="color: #008000;">self</span>._f.<span style="color: black;">fromchild</span>.<span style="color: #dc143c;">readline</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span> <span style="color: #808080; font-style: italic;">#skip the credit line</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__call__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, words<span style="color: black;">&#41;</span>:
        words = words.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">' '</span><span style="color: black;">&#41;</span>
        output = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> word <span style="color: #ff7700;font-weight:bold;">in</span> words:
            <span style="color: #008000;">self</span>._f.<span style="color: black;">tochild</span>.<span style="color: black;">write</span><span style="color: black;">&#40;</span>word+<span style="color: #483d8b;">'<span style="color: #000099; font-weight: bold;">\n</span>'</span><span style="color: black;">&#41;</span>
            <span style="color: #008000;">self</span>._f.<span style="color: black;">tochild</span>.<span style="color: black;">flush</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
            s = <span style="color: #008000;">self</span>._f.<span style="color: black;">fromchild</span>.<span style="color: #dc143c;">readline</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>.<span style="color: black;">strip</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
            <span style="color: #008000;">self</span>._f.<span style="color: black;">fromchild</span>.<span style="color: #dc143c;">readline</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span> <span style="color: #808080; font-style: italic;">#skip the blank line</span>
            <span style="color: #ff7700;font-weight:bold;">if</span> s<span style="color: black;">&#91;</span>:<span style="color: #ff4500;">8</span><span style="color: black;">&#93;</span> == <span style="color: #483d8b;">&quot;word: ok&quot;</span>:
                output.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: #008000;">None</span><span style="color: black;">&#41;</span>
            <span style="color: #ff7700;font-weight:bold;">else</span>:
                output.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: black;">&#40;</span>s<span style="color: black;">&#91;</span><span style="color: #ff4500;">17</span>:-<span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>.<span style="color: black;">strip</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">', '</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> output</pre></div></div>

<p><a href="http://en.wikipedia.org/wiki/GNU_Aspell"><strong>Aspell</strong></a></p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">popen2</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> aspell:
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #008000;">self</span>._f = <span style="color: #dc143c;">popen2</span>.<span style="color: black;">Popen3</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;aspell -a&quot;</span><span style="color: black;">&#41;</span>
        <span style="color: #008000;">self</span>._f.<span style="color: black;">fromchild</span>.<span style="color: #dc143c;">readline</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span> <span style="color: #808080; font-style: italic;">#skip the credit line</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__call__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, words<span style="color: black;">&#41;</span>:
        words = words.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">' '</span><span style="color: black;">&#41;</span>
        output = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> word <span style="color: #ff7700;font-weight:bold;">in</span> words:
            <span style="color: #008000;">self</span>._f.<span style="color: black;">tochild</span>.<span style="color: black;">write</span><span style="color: black;">&#40;</span>word+<span style="color: #483d8b;">'<span style="color: #000099; font-weight: bold;">\n</span>'</span><span style="color: black;">&#41;</span>
            <span style="color: #008000;">self</span>._f.<span style="color: black;">tochild</span>.<span style="color: black;">flush</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
            s = <span style="color: #008000;">self</span>._f.<span style="color: black;">fromchild</span>.<span style="color: #dc143c;">readline</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>.<span style="color: black;">strip</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
            <span style="color: #008000;">self</span>._f.<span style="color: black;">fromchild</span>.<span style="color: #dc143c;">readline</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span> <span style="color: #808080; font-style: italic;">#skip the blank line</span>
            <span style="color: #ff7700;font-weight:bold;">if</span> s == <span style="color: #483d8b;">&quot;*&quot;</span>:
                output.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: #008000;">None</span><span style="color: black;">&#41;</span>
            <span style="color: #ff7700;font-weight:bold;">elif</span> s<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span> == <span style="color: #483d8b;">'#'</span>:
                output.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;No Suggestions&quot;</span><span style="color: black;">&#41;</span>
            <span style="color: #ff7700;font-weight:bold;">else</span>:
                output.<span style="color: black;">append</span><span style="color: black;">&#40;</span>s.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">':'</span><span style="color: black;">&#41;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>.<span style="color: black;">strip</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">', '</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> output</pre></div></div>

<p><a href="http://en.wikipedia.org/wiki/Hunspell"><strong>Hunspell</strong></a></p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">popen2</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> hunspell:
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #008000;">self</span>._f = <span style="color: #dc143c;">popen2</span>.<span style="color: black;">Popen3</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;hunspell&quot;</span><span style="color: black;">&#41;</span>
        <span style="color: #008000;">self</span>._f.<span style="color: black;">fromchild</span>.<span style="color: #dc143c;">readline</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span> <span style="color: #808080; font-style: italic;">#skip the credit line</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__call__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, words<span style="color: black;">&#41;</span>:
        words = words.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">' '</span><span style="color: black;">&#41;</span>
        output = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> word <span style="color: #ff7700;font-weight:bold;">in</span> words:
            <span style="color: #008000;">self</span>._f.<span style="color: black;">tochild</span>.<span style="color: black;">write</span><span style="color: black;">&#40;</span>word+<span style="color: #483d8b;">'<span style="color: #000099; font-weight: bold;">\n</span>'</span><span style="color: black;">&#41;</span>
            <span style="color: #008000;">self</span>._f.<span style="color: black;">tochild</span>.<span style="color: black;">flush</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
            s = <span style="color: #008000;">self</span>._f.<span style="color: black;">fromchild</span>.<span style="color: #dc143c;">readline</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>.<span style="color: black;">strip</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>.<span style="color: black;">lower</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
            <span style="color: #008000;">self</span>._f.<span style="color: black;">fromchild</span>.<span style="color: #dc143c;">readline</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span> <span style="color: #808080; font-style: italic;">#skip the blank line</span>
            <span style="color: #ff7700;font-weight:bold;">if</span> s == <span style="color: #483d8b;">&quot;*&quot;</span>:
                output.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: #008000;">None</span><span style="color: black;">&#41;</span>
            <span style="color: #ff7700;font-weight:bold;">elif</span> s<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span> == <span style="color: #483d8b;">'#'</span>:
                output.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;No Suggestions&quot;</span><span style="color: black;">&#41;</span>
            <span style="color: #ff7700;font-weight:bold;">elif</span> s<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span> == <span style="color: #483d8b;">'+'</span>:
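                output.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: #008000;">None</span><span style="color: black;">&#41;</span> <span style="color: #808080; font-style: italic;">#found via affix rules, so spelled correctly</span>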
            <span style="color: #ff7700;font-weight:bold;">else</span>:
                output.<span style="color: black;">append</span><span style="color: black;">&#40;</span>s.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">':'</span><span style="color: black;">&#41;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>.<span style="color: black;">strip</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">', '</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> output</pre></div></div>
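
<p>The snippets above rely on popen2, which only exists in Python 2. Below is a rough sketch of the same idea using subprocess, assuming aspell or hunspell is on the PATH and is started with -a (the Ispell-style pipe mode). The names parse_reply and PipeChecker are made up for this example and are not part of any library. Unlike the classes above, it returns an empty list rather than the string "No Suggestions" when the checker has nothing to offer.</p>

```python
import subprocess


def parse_reply(line):
    """Interpret one reply line from an Ispell-style pipe ("-a" mode).

    Returns None for a correctly spelled word, an empty list when the
    checker has no suggestions, and a list of suggestions otherwise.
    """
    line = line.strip()
    if not line:
        return None
    if line[0] in "*+-":
        # "*" exact match, "+" found via affix rules, "-" compound word
        return None
    if line[0] == "#":
        # misspelled, but the checker could not suggest anything
        return []
    # Remaining replies carry the suggestions after the first colon.
    head, sep, tail = line.partition(":")
    return [s.strip() for s in tail.split(",") if s.strip()]


class PipeChecker:
    """Hypothetical wrapper around a checker running in pipe mode."""

    def __init__(self, command=("hunspell", "-a")):
        # Assumption: the command exists and has a default dictionary.
        self._proc = subprocess.Popen(
            list(command),
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )
        self._proc.stdout.readline()  # skip the version banner

    def __call__(self, words):
        output = []
        for word in words.split():
            # A leading "^" makes the checker treat the line as plain text.
            self._proc.stdin.write("^" + word + "\n")
            self._proc.stdin.flush()
            replies = []
            while True:
                # Each answer ends with a blank line; stop there (or at EOF).
                line = self._proc.stdout.readline()
                if not line.strip():
                    break
                replies.append(line)
            output.append(parse_reply(replies[0]) if replies else None)
        return output

    def close(self):
        self._proc.stdin.close()
        self._proc.wait()
```

<p>Usage would look like checker = PipeChecker(("aspell", "-a")) followed by checker("some words here") and checker.close(). Reading until the blank line, instead of assuming exactly one reply line per word, keeps the output in step with the input when the checker prints nothing for a token.</p>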

<p>After doing this and seeing the suggestions, I decided a spell checker isn&#8217;t really what I was looking for. A spelling checker always tries to make a suggestion, whereas I wanted to filter entries out of a database. I started with the hope of converting misspellings into the correct words. In the end, I just removed words that were not spelled correctly, using <a href="http://wordnet.princeton.edu/">WordNet</a> through <a href="http://www.nltk.org/">NLTK</a>. WordNet had a bigger dictionary than most of the spell checkers, which also helped in the filtering task. NLTK has a simple <a href="http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html">how to</a> on getting started with WordNet.</p>
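<p>For reference, the filtering step might look roughly like the sketch below. This is not the exact code I ran; the function names are made up for the example, and it assumes NLTK is installed and the WordNet corpus has been fetched once with nltk.download("wordnet"). A word counts as known if WordNet returns at least one synset for it.</p>

```python
def wordnet_knows(word):
    """Return True if WordNet has at least one synset for the word."""
    # Imported lazily so the filter below can be used with another lookup
    # when NLTK or its WordNet corpus is not available.
    from nltk.corpus import wordnet
    return bool(wordnet.synsets(word))


def keep_known_words(words, is_known=wordnet_knows):
    """Drop every word the lookup does not recognise, preserving order."""
    return [w for w in words if is_known(w)]
```

<p>Calling keep_known_words(list_of_words) uses WordNet; passing a different is_known function swaps in any other dictionary, such as one of the spell checker classes above.</p>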
]]></content:encoded>
			<wfw:commentRss>http://blog.quibb.org/2009/04/spell-checking-in-python/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
