Crawling the Web With Lynx

Introduction: There are a few reasons you’d want to use a text-based browser to crawl the web. For example, it makes it easier to do natural language processing on web pages. I was doing this a year or two ago, and at the time I was unable to find a Python library that would […]

Posted on November 9, 2010 at 10:38 am by Joe · 6 Comments
In: Python

NLTK Regular Expression Parser (RegexpParser)

The Natural Language Toolkit (NLTK) provides a variety of tools for dealing with natural language. One such tool is the Regular Expression Parser. If you’re familiar with regular expressions, it can be a useful tool in natural language processing. Background Information: You must first be familiar with regular expressions to be able to fully utilize […]

Posted on January 27, 2010 at 9:53 am by Joe · 3 Comments
In: Python

Spell Checking in Python

I was looking into spell checking in Python. I found spell4py and downloaded the zip, but couldn’t get it to build on my system. I might have managed with more effort, but in the end my own solution worked out fine, and the library was overkill for my needs anyway. I found this article here: http://code.activestate.com/recipes/117221/ This […]

Posted on April 11, 2009 at 11:38 am by Joe · 5 Comments
In: Python

NLTK vs MontyLingua Part of Speech Taggers

This is a comparison of the part-of-speech taggers available in Python. As far as I know, these are the most prominent Python taggers. Let me know if you think another tagger should be added to the comparison. MontyLingua includes several natural language processing (NLP) tools. The ones that I used in this comparison […]

Posted on March 28, 2009 at 10:23 pm by Joe · 5 Comments
In: Python