Regular Expressions Review

A hobby of mine is learning new programming languages.  I try to learn at least one a year, and to use it for more than just a hello world app.  This year is the year of Python: whenever I need to write a script, Python is my go-to.  That said, I recently had a need for regular expressions, so Python is what I used.

Being most familiar with Java and Ruby, I find Python sits somewhere in between, but one feature that stood out was its string syntax for regular expressions.  Python lets you put an ‘r’ in front of the opening quote to mark a raw string, in which backslashes are not treated as escape characters.  This means you don’t have to escape backslashes twice (à la Java).
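To make the raw string idea concrete, here is a minimal sketch that writes one pattern both ways and checks that the two literals are identical. The pattern and the sample text are made up for illustration; they are not from any real script of mine.

```python
import re

# The same pattern written two ways: as a raw string and as a normal string.
raw_pattern = r"\d+\.\d+"
escaped_pattern = "\\d+\\.\\d+"

# Both literals produce exactly the same sequence of characters.
same_text = raw_pattern == escaped_pattern

# Hypothetical sample input, used only for illustration.
match = re.search(raw_pattern, "version 3.14 released")
found = match.group(0) if match else None
```

The raw form is the one the regular expression processor actually sees, so what you type is what gets compiled.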

For those not familiar with regular expressions, here is the same regular expression, one that matches either a backslash or a forward slash, written in a few languages:

Java:
Find Slashes: "[\\\\/]"

Python:
Find Slashes: r"[\\/]"

Ruby:
Find Slashes: /[\\\/]/

This is just a simple example to illustrate a point, but look at Java.  The slash pattern needs four backslashes to match a single one.  Even if regular expressions were typically this simple (they’re not), that seems unnecessary.  Any time a backslash needs to make it to the regular expression processor, there must be two in the Java string literal, because the compiler consumes one level of escaping first.  Another quick example: to find a period (.), the Java string would be “\\.”.  This compounds very quickly, and makes it painful to use regular expressions in Java.
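For comparison, here is a rough sketch of the slash and period patterns in use from Python. The helper name split_path and the sample paths are hypothetical, chosen only to show the patterns working.

```python
import re

# Character class matching either a backslash or a forward slash.
slash_pattern = re.compile(r"[\\/]")

# A literal period needs only one backslash in a raw string.
dot_pattern = re.compile(r"\.")


def split_path(path):
    """Split a path on either kind of slash, dropping empty pieces."""
    return [part for part in slash_pattern.split(path) if part]


# Hypothetical sample paths, used only for illustration.
windows_parts = split_path(r"C:\Users\joe\notes.txt")
unix_parts = split_path("/home/joe/notes.txt")
dot_count = len(dot_pattern.findall("a.b.c"))
```

Inside the character class the backslash still has to be doubled, since the regular expression processor has its own escaping, but that is the only level you have to think about.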

I can’t understand why Java wouldn’t adopt the Python syntax of putting an ‘r’ in front of a string to denote raw input.  At a quick glance, it doesn’t seem as though it would break any regular expressions currently in use, because the old syntax would continue working as expected.  It would simply make new ones easier to write.

Here are some Regular Expression resources that I find useful:

http://www.rubular.com/ A Ruby Regular Expression Tester

http://www.fileformat.info/tool/regex.htm A Java Regular Expression Tester

Posted on February 14, 2009 at 2:18 pm by Joe
In: Java, Python, Ruby