Setting up a VirtualBox LAMP Server

Introduction

I recently decided to play around with web development a little bit. Not being familiar with setting up a web server, I decided to set up a VirtualBox LAMP server. Since I couldn’t find a good guide that went through all the steps of setting up a VirtualBox LAMP server in one place, I decided to write about my experience. I wanted a LAMP server that I could access from any machine on my local network. In retrospect, it isn’t very hard to do, but having all the information in one place is nice.

Installing VirtualBox

Start by installing VirtualBox. The open source edition (OSE) should be good enough for the purposes of this guide. I installed the full edition on openSUSE 11.2 and ran into some issues, which I solved by making the VirtualBox binary executable with sudo chmod +x /usr/lib/virtualbox/VirtualBox and then removing the leftover /tmp/*vbox* files. Generally speaking, it’s fairly easy to install VirtualBox on Windows. When creating a new virtual machine, I allocated 512MB of RAM and 12GB of hard disk space.

Installing LAMP

I chose Ubuntu for my LAMP server largely because there are many documents on how to set up a LAMP server on top of Ubuntu. I’ll do a quick overview here, and provide a link to setting up a LAMP server on Ubuntu (Hardy Heron). I liked this guide more than the guide for Jaunty because this one tells you to install the OpenSSH server, and being able to administer the VM remotely is a good idea.

Start by downloading the Ubuntu Server Edition. I tried downloading and installing Hardy Heron, which is the latest Long Term Support (LTS) release, but I kept getting a Kernel Panic when trying to boot in VirtualBox 3.1.2. It may have been the combination of VirtualBox version and Ubuntu version. Eventually, I ended up going with Karmic Koala. The installation process is almost identical to the Hardy Heron installation, and it provides both the LAMP and OpenSSH options that the guide suggests.

Network Configuration

Click on your newly created virtual machine and open the settings dialog. Then click the Network settings area. For Adapter 1, make sure that the Enable Network Adapter checkbox is checked. Adapter 1 is attached to NAT by default; switch it to Bridged Adapter so it looks like a regular PC to the rest of your network. It will then acquire an IP address from your router, like a normal computer. If you only want your host computer to be able to access it, the Host-only Adapter option seems like an appropriate choice, but I did not use or test this option. After changing the setting, start up the virtual machine. To get the IP of the virtual machine, run the ifconfig command inside it. If you point your browser at that IP, you should see the Apache welcome page.
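
If the welcome page does not show up, it can help to check from another machine whether the web server port is reachable at all. Here is a minimal Python sketch for that; it is not part of the original setup, and the helper name is_port_open is just an illustrative choice.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling is_port_open with the IP reported by ifconfig and port 80 should return True once Apache is running in the bridged VM. The same check with port 21 can be used later for the FTP server.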

Static IP

Setting a static IP address for the virtual machine is a good idea so you can always access the same IP address. These are the instructions for setting a static IP in Ubuntu.

Edit the /etc/network/interfaces file using vim or nano:

sudo [your_editor] /etc/network/interfaces

Find this line:

iface eth0 inet dhcp

Change it to:

iface eth0 inet static
address 192.168.1.99
netmask 255.255.255.0
network 192.168.1.0
broadcast 192.168.1.255
gateway 192.168.1.1

These settings worked well for my Linksys router. By default, the DHCP service the router provides starts using IP addresses starting with 192.168.1.100. By using 192.168.1.99 it was outside of that range. The Linksys router defaults to using 192.168.1.1 for its own IP, which is why the gateway is set to that.
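
To double check how the values in the interfaces file relate to each other, here is a small Python sketch using the standard ipaddress module. The addresses are the ones from this guide; the DHCP start address is the Linksys default described above, so adjust it if your router differs.

```python
import ipaddress

# Address and netmask taken from the interfaces file above.
iface = ipaddress.ip_interface("192.168.1.99/255.255.255.0")
gateway = ipaddress.ip_address("192.168.1.1")
# Assumption: the router hands out DHCP leases starting at .100.
dhcp_start = ipaddress.ip_address("192.168.1.100")

network = iface.network
print("network:  ", network.network_address)    # 192.168.1.0
print("broadcast:", network.broadcast_address)  # 192.168.1.255
print("gateway on same subnet:", gateway in network)
print("static IP below DHCP range:", iface.ip < dhcp_start)
```

The network and broadcast lines in the interfaces file follow directly from the address and netmask, and the last two checks confirm that the gateway is reachable on the subnet and that the static address will not collide with a DHCP lease.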

Pure-FTPd FTP Server

The last step is setting up an FTP server on it so you can easily transfer files.  For this I chose Pure-FTPd because the project prides itself on being easy to configure.  It largely worked right out of the box without any configuration.

To install it:

sudo apt-get install pure-ftpd

Some Pure-FTPd configuration:

CD to the configuration directory located here:
/etc/pure-ftpd/conf

Set display dot files to on (so you can see your .htaccess file). The directory is owned by root, so write the file through sudo:
echo yes | sudo tee DisplayDotFiles

Restart Pure-FTPd:
sudo /etc/init.d/pure-ftpd restart

Get your user connected to the /var/www directory:
CD to your home folder and create a symbolic link to /var/www
ln -s /var/www www

Change ownership of /var/www to your user, so you can write to this directory:
sudo chown -R [your_user] /var/www

Change to 755 permissions
chmod -R 755 /var/www
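
If the 755 value is unfamiliar, this short Python sketch shows what it stands for by rendering the octal mode the way ls -l would. The helper name describe_mode is only for illustration.

```python
import stat


def describe_mode(octal_mode, is_dir=True):
    """Render an octal permission value the way `ls -l` would show it."""
    file_type = stat.S_IFDIR if is_dir else stat.S_IFREG
    return stat.filemode(file_type | octal_mode)


print(describe_mode(0o755))         # drwxr-xr-x
print(describe_mode(0o755, False))  # -rwxr-xr-x
```

In other words, the owner can read, write, and enter the directory, while everyone else (including the web server) can only read and enter it. Note that chmod -R applies the same bits to regular files too, which marks them as executable.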

You should now be able to connect to the FTP server from anywhere on your network by pointing your FTP client at: 192.168.1.99 (or any IP you may have chosen). It should have no problem running PHP files.

If any part of this short guide was confusing or didn’t work, leave a comment so I can look into it and update the guide.

Posted on February 16, 2010 at 11:20 pm by Joe
In: Uncategorized

NLTK Regular Expression Parser (RegexpParser)

The Natural Language Toolkit (NLTK) provides a variety of tools for dealing with natural language.  One such tool is the Regular Expression Parser.  If you’re familiar with regular expressions, it can be a useful tool in natural language processing.

Background Information

You must first be familiar with regular expressions to be able to fully utilize the RegexpParser/RegexpChunkParser.  If you need to learn about regular expressions, here is a site with an abundance of information to get you started: http://www.regular-expressions.info.  It is also necessary to know how to use a tagger, and what the tags mean.  A tagger is a tool that marks each word in a sentence with its part of speech.  Here is a small comparison I did of Python taggers: NLTK vs MontyLingua Part of Speech Taggers.  The NLTK RegexpParser works by running regular expressions on top of the part of speech tags added by a tagger.  The Penn Treebank tags (DT, JJ, NN, and so on) will be the tags used throughout the rest of this post, and are commonly used by taggers in general.  On a side note, the RegexpParser can be used with either the NLTK or MontyLingua tagger.

Basic RegexpParser Usage

Let me start by going over the “how to” provided in the NLTK documentation.  The source of this information is here: NLTK RegexpParser HowTo.  The documentation goes through how you could use the RegexpParser/RegexpChunkParser to do a traditional parse of a sentence.

The RegexpParser/RegexpChunkParser works by defining rules for grouping different words together.  A simple example would be: “NP: {<DT>? <JJ>* <NN>*}”.  This is a rule for grouping words into a noun phrase.  It will group an optional determiner (usually an article), then zero or more adjectives, followed by zero or more nouns.  In the how to, they go over prepositions and creating prepositional phrases from a preposition and a noun phrase.  It’s important to note that chunks defined by earlier rules can be used in later ones. Also, the regular expression syntax can occur within the tags or apply to the tags themselves.

Here is the example from the NLTK website:

from nltk import RegexpParser

parser = RegexpParser('''
    NP: {<DT>? <JJ>* <NN>*} # NP
    P: {<IN>}           # Preposition
    V: {<V.*>}          # Verb
    PP: {<P> <NP>}      # PP -> P NP
    VP: {<V> <NP|PP>*}  # VP -> V (NP|PP)*
    ''')
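
To get a feel for what a tag pattern does, here is a simplified sketch that uses only Python’s standard re module. It is not how you would use NLTK and not NLTK’s actual implementation; it just illustrates the idea of running a regular expression over the sequence of tags. The sentence is hand-tagged, the names NP_PATTERN and find_chunks are made up, and the rule requires at least one noun (<NN>+) so that it never matches an empty span.

```python
import re

# Regex over an encoded tag string, equivalent to the tag pattern <DT>?<JJ>*<NN>+
NP_PATTERN = re.compile(r"(?:<DT>)?(?:<JJ>)*(?:<NN>)+")


def find_chunks(pattern, tagged_words):
    """Return the word groups whose tag sequence matches pattern."""
    # Encode the tags as one string, e.g. "<DT><JJ><NN><VBD>".
    tag_string = "".join("<%s>" % tag for _, tag in tagged_words)
    chunks = []
    for match in pattern.finditer(tag_string):
        # Each "<" marks one tag, so counting them converts character
        # offsets back into word positions.
        start = tag_string.count("<", 0, match.start())
        end = start + match.group().count("<")
        chunks.append([word for word, _ in tagged_words[start:end]])
    return chunks


# Hand-tagged input; a real tagger would normally produce these pairs.
sentence = [("the", "DT"), ("little", "JJ"), ("dog", "NN"),
            ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
print(find_chunks(NP_PATTERN, sentence))
# [['the', 'little', 'dog'], ['the', 'cat']]
```

The verb and the preposition are left out because no part of the pattern covers their tags, which is the same reason the real parser leaves unmatched words outside of any chunk.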

Alternative RegexpParser Usage

I call this an alternate usage because it can be used to find patterns that aren’t necessarily related to grammatical phrases in English.  It can be used to find any pattern in a sentence.  Let me start by showing the regular expression grammar from my program.

grammar = """
	NP:   {<PRP>?<JJ.*>*<NN.*>+}
	CP:   {<JJR|JJS>}
	VERB: {<VB.*>}
	THAN: {<IN>}
	COMP: {<DT>?<NP><RB>?<VERB><DT>?<CP><THAN><DT>?<NP>}
	"""
self.chunker = RegexpParser(grammar)

I was using it to look for a specific pattern in a sentence.  The first part, NP, is looking for a noun phrase.  The <PRP>? is there because of a bug found in the tagger I was using.  It was marking An with a capital ‘A’ as a PRP (Pronoun) rather than a DT (Determiner/Article).  I found another workaround for the bug, but left the PRP in there to catch anything that might have slipped through.

Then it moves on to the CP, which is the comparison word.  JJR tagged words are comparative adjectives.  They include words such as bigger, smaller, and larger.  JJS tagged words are superlative adjectives, which signify the most or chief.  They include biggest, smallest, and largest.

The next two are simply the VERB and the word THAN.  The VERB could be a compound verb, so there would be one or more verbs present.  The IN tag denotes a preposition.  In this case, I was looking specifically for the word than.

The last line is COMP.  This is the regular expression that puts it all together.  This was looking for a size comparison of two objects.  It might be easier to look at the output of this part of the expression than trying to explain it piece by piece. The only tag not explained above is RB, which is an adverb.

Here is the parse for the sentence “Everyone knows an elephant is larger than a dog.”:

(S
  (NP everyone/NN)
  (VERB knows/VBZ)
  (COMP
    an/DT
    (NP elephant/NN)
    (VERB is/VBZ)
    (CP larger/JJR)
    (THAN than/IN)
    a/DT
    (NP dog/NN))
  ./.)

The output is a simple tree, which makes for easy data extraction. It’s easy to see there are many possibilities that open up when looking for patterns in English text.  May this help you in your data mining endeavors.
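
As a rough illustration of that extraction step, here is a sketch that walks the parse above and pulls out the two objects and the comparison word. To keep it free of dependencies, the tree is written as plain nested tuples rather than an NLTK tree object, and all the helper names are made up for this example.

```python
# Stand-in for the parse above: a node is (label, [children]),
# a leaf is (word, tag).
tree = ("S", [
    ("NP", [("everyone", "NN")]),
    ("VERB", [("knows", "VBZ")]),
    ("COMP", [
        ("an", "DT"),
        ("NP", [("elephant", "NN")]),
        ("VERB", [("is", "VBZ")]),
        ("CP", [("larger", "JJR")]),
        ("THAN", [("than", "IN")]),
        ("a", "DT"),
        ("NP", [("dog", "NN")]),
    ]),
    (".", "."),
])


def is_node(item):
    """Nodes carry a list of children; leaves carry a tag string."""
    return isinstance(item[1], list)


def leaf_words(item):
    """Collect the words under a node (or the single word of a leaf)."""
    if not is_node(item):
        return [item[0]]
    words = []
    for child in item[1]:
        words.extend(leaf_words(child))
    return words


def subtrees(item, label):
    """Yield every node with the given label, depth first."""
    if is_node(item):
        if item[0] == label:
            yield item
        for child in item[1]:
            yield from subtrees(child, label)


def extract_comparison(parsed):
    """Return (first object, comparison word, second object) or None."""
    for comp in subtrees(parsed, "COMP"):
        phrases = {"NP": [], "CP": []}
        for child in comp[1]:
            if is_node(child) and child[0] in phrases:
                phrases[child[0]].append(" ".join(leaf_words(child)))
        if len(phrases["NP"]) == 2 and phrases["CP"]:
            return phrases["NP"][0], phrases["CP"][0], phrases["NP"][1]
    return None


print(extract_comparison(tree))  # ('elephant', 'larger', 'dog')
```

Only the direct children of COMP are inspected, so the unrelated noun phrase at the start of the sentence (everyone) is ignored.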

Posted on January 27, 2010 at 9:53 am by Joe
In: Python

Sort Optimization (Part 2) with JDK 6 vs JDK 7

In part 1, I went over my first foray into the world of sorting algorithms.  Since then, I’ve had some other ideas on how to improve my quicksort implementation.  One idea that I had while originally working on the sorting algorithm was to rework the partition function to take duplicate elements into account.  I had a few different working implementations, but all of them came with a severe performance penalty.  I finally figured out a way to get performance close to the previous algorithm.

The partition function needs to perform the minimal number of swaps possible.  So moving towards the center from both ends and only swapping when both elements are out of order is the best approach I’ve found so far.  When grouping duplicate elements, they are swapped to the beginning of the partition area as they are found.  Then at the end, a pass is run to move them to their correct location in the final list.  Then instead of returning one number from the partition function, it returns two: the minimum and maximum indices of the range that holds the pivot value.
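
To make the idea of a partition that returns two indices concrete, here is a short Python sketch. It is not my Java implementation and it does not use the swap-to-the-front scheme described above; it uses the simpler, well-known Dutch national flag partition, which also returns the first and last index of the block of pivot-valued elements.

```python
def partition3(a, lo, hi):
    """Partition a[lo..hi] around the pivot a[lo].

    Returns (lt, gt) such that a[lo..lt-1] < pivot,
    a[lt..gt] == pivot, and a[gt+1..hi] > pivot.
    """
    pivot = a[lo]
    lt, i, gt = lo, lo + 1, hi
    while i <= gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivot:
            # The element swapped in from the right is unexamined,
            # so i is not advanced here.
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1
    return lt, gt


def quicksort3(a, lo=0, hi=None):
    """Sort a in place, skipping the whole block of pivot duplicates."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    lt, gt = partition3(a, lo, hi)
    quicksort3(a, lo, lt - 1)
    quicksort3(a, gt + 1, hi)
```

The benefit is in the two recursive calls: everything between the two returned indices is already in its final position, so a list with many duplicates shrinks much faster than with a single-index partition.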

Another area that I was able to get some performance gain out of was getting rid of the shell sort from the first algorithm.  While that was there to make sure the quicksort did not recurse too deeply, in practice the shell sort algorithm doesn’t run.

Results

Here are the results of JDK 6 MergeSort, Tim Sort, QSort, QSortv2, and Dual Pivot sort 2 benchmarked on the same set of files.  Overall, the new version doesn’t outperform the old version, but I thought it was worth posting my findings.  On most data sets with duplicates it does perform better.  I ran these benchmarks on OpenJDK 7 because I was curious as to how they would compare to one another.

It’s important to note that the tables show the speedup relative to the Java implementation on the given JDK.  The graphs are the average runtimes for each algorithm.  The reason for showing the average runtime is that it makes the performance difference between Sun’s JDK 6 and OpenJDK 7 build 73 visible.

Sun JDK 6 without Warmup

OpenJDK 7 without Warmup

Sun JDK 6 1000 Warmup Iterations

OpenJDK 7 1000 Warmup Iterations

Sun JDK 6 vs OpenJDK 7 without Warmup

Sun JDK 6 vs OpenJDK 7 1000 Warmup Iterations

Conclusions

Overall, the new version of the Qsort implementation doesn’t improve greatly over the previous implementation.  It didn’t work out to be the performance improvement I was looking for, but I think the last graph, with 1000 iterations of warmup for each algorithm, is the most interesting.  The Qsort v2 implementation apparently doesn’t get handled any better by OpenJDK 7.  The partition function is larger after my changes, so perhaps it didn’t JIT very well.  What is interesting is the boost that Tim Sort saw with the change of JDKs.  Running these benchmarks made me realize that upgrading my Java runtime could improve the performance of my other Java applications as well.  It will be interesting to see if the performance carries over to NetBeans and Eclipse; I expect it will.

Posted on December 23, 2009 at 11:00 am by Joe
In: Java

Sorting Algorithm Shootout

Since I did my Sort Optimization post, I’ve been keeping an eye on things that happen in the sorting world.  Recently an article popped up on Reddit about someone wanting to replace the JDK sorting algorithm with a Dual Pivot Quick Sort.  This led to the discovery that Tim Sort would be replacing Merge Sort in the JDK starting with version 7.  This probably got some attention because of the OpenJDK project.  It’s nice to see OpenJDK allowing more developers to work on different areas of the JDK.  First I’ll do a quick overview of the algorithms, then show some benchmarks.  All algorithms are written in Java.

JDK 6 Sort

JDK 6 implements a fairly standard Merge Sort.  It switches to an insertion sort once a sub-array gets small enough (fewer than 7 elements in the JDK 6 source).
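
Here is a rough Python sketch of that combination, just to show the shape of the algorithm. It is not the JDK code; the function names are made up, and the cutoff constant is an assumption meant to mirror the JDK’s small-array threshold.

```python
INSERTION_THRESHOLD = 7  # assumed cutoff for switching to insertion sort


def insertion_sort(a, lo, hi):
    """Sort the slice a[lo:hi] in place."""
    for i in range(lo + 1, hi):
        item = a[i]
        j = i
        while j > lo and a[j - 1] > item:
            a[j] = a[j - 1]
            j -= 1
        a[j] = item


def merge_sort(a, lo=0, hi=None):
    """Stable merge sort of a[lo:hi] that hands small slices to insertion sort."""
    if hi is None:
        hi = len(a)
    if hi - lo < INSERTION_THRESHOLD:
        insertion_sort(a, lo, hi)
        return
    mid = (lo + hi) // 2
    merge_sort(a, lo, mid)
    merge_sort(a, mid, hi)
    left, right = a[lo:mid], a[mid:hi]
    i = j = 0
    k = lo
    while i < len(left) and j < len(right):
        # Taking from the left half on ties keeps the sort stable.
        if right[j] < left[i]:
            a[k] = right[j]
            j += 1
        else:
            a[k] = left[i]
            i += 1
        k += 1
    # Exactly one of the two halves still has elements left over.
    a[k:hi] = left[i:] + right[j:]
```

The insertion sort avoids the overhead of splitting and merging for tiny slices, where a simple quadratic algorithm tends to be faster.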

QSort

This is the implementation of quicksort I outlined in the earlier blog post.  It performed admirably at the time, but how will it hold up against tougher competition?  It’s pretty much an iterative quicksort that short-circuits to a shell sort if it’s going too deep.

Original QSort Post:
Sort Optimization
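
For readers who haven’t seen the earlier post, here is a stripped-down Python sketch of the general idea: an iterative quicksort with an explicit stack that hands a range over to shell sort once it gets too deep. This is not my Java implementation; the pivot choice, the depth limit, and all the names are assumptions made for illustration.

```python
def shell_sort_range(a, lo, hi):
    """Shell sort a[lo..hi] in place using a simple halving gap sequence."""
    gap = (hi - lo + 1) // 2
    while gap > 0:
        for i in range(lo + gap, hi + 1):
            item = a[i]
            j = i
            while j - gap >= lo and a[j - gap] > item:
                a[j] = a[j - gap]
                j -= gap
            a[j] = item
        gap //= 2


def partition(a, lo, hi):
    """Lomuto partition of a[lo..hi] using the middle element as pivot."""
    mid = (lo + hi) // 2
    a[mid], a[hi] = a[hi], a[mid]
    pivot = a[hi]
    store = lo
    for i in range(lo, hi):
        if a[i] < pivot:
            a[i], a[store] = a[store], a[i]
            store += 1
    a[store], a[hi] = a[hi], a[store]
    return store


def quicksort_iterative(a, max_depth=None):
    """Sort a in place without recursion, falling back to shell sort."""
    if max_depth is None:
        # Assumed limit: roughly twice log2 of the list length.
        max_depth = 2 * max(1, len(a)).bit_length()
    stack = [(0, len(a) - 1, 0)]
    while stack:
        lo, hi, depth = stack.pop()
        if lo >= hi:
            continue
        if depth > max_depth:
            shell_sort_range(a, lo, hi)
            continue
        p = partition(a, lo, hi)
        stack.append((lo, p - 1, depth + 1))
        stack.append((p + 1, hi, depth + 1))
```

Passing max_depth=0 forces the fallback almost immediately, which is a handy way to exercise the shell sort path on its own.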

Tim Sort

This is an optimized, stable variation of a merge sort.  Tim Peters developed this sorting algorithm for the Python programming language.  It is in use by Python and will be used by Java starting with JDK 7.  It takes advantage of partially sorted parts of the list.

Available here:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6804124
http://hg.openjdk.java.net/jdk7/tl/jdk/rev/bfd7abda8f79
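
To illustrate what taking advantage of partially sorted parts means, here is a small Python sketch that splits a list into its already-sorted runs and then merges them. It is a big simplification and not the actual Tim Sort code, which among other things also reverses descending runs and extends short runs with insertion sort. The function names are made up.

```python
import heapq


def natural_runs(items):
    """Split items into maximal non-decreasing runs."""
    runs = []
    start = 0
    for i in range(1, len(items) + 1):
        # A run ends at the end of the list or where the order drops.
        if i == len(items) or items[i] < items[i - 1]:
            runs.append(items[start:i])
            start = i
    return runs


def run_merge_sort(items):
    """Sort by merging the natural runs instead of single elements."""
    return list(heapq.merge(*natural_runs(items)))


print(natural_runs([1, 2, 3, 2, 5, 4]))  # [[1, 2, 3], [2, 5], [4]]
```

The fewer runs a list contains, the less merging is needed; a list that is already sorted consists of a single run and needs no merging at all.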

Dual Pivot Quick sort

This is a newcomer to the sorting table.  It was developed by Vladimir Yaroslavskiy for inclusion in the Java language.  The premise is the same as quick sort, only it will choose two pivot points rather than one.  He did a full writeup detailing the algorithm and its benefits.  I did modify it to take the Comparable interface, and Vladimir explicitly said this was not the intended target of the algorithm.  He has stated it is designed to work directly with primitive types.  I don’t see how doing an int comparison vs an Integer.compareTo() would be different, as long as they are used uniformly between all algorithms.  Since my sorting algorithm works with Comparable, as does Tim Sort, I chose to convert this algorithm to use the Comparable interface also.

Available here:
http://article.gmane.org/gmane.comp.java.openjdk.core-libs.devel/2628
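
To show the two-pivot premise in a few lines, here is a simplified Python sketch. It is not Vladimir’s implementation, which contains a number of additional optimizations; it only demonstrates partitioning into three parts around two pivots, and the function name is made up.

```python
def dual_pivot_quicksort(a, lo=0, hi=None):
    """Sort a[lo..hi] in place using two pivots, p <= q."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    # Use the two ends as pivots and make sure p <= q.
    if a[lo] > a[hi]:
        a[lo], a[hi] = a[hi], a[lo]
    p, q = a[lo], a[hi]
    lt = lo + 1  # a[lo+1..lt-1] holds elements smaller than p
    gt = hi - 1  # a[gt+1..hi-1] holds elements larger than q
    i = lo + 1
    while i <= gt:
        if a[i] < p:
            a[i], a[lt] = a[lt], a[i]
            lt += 1
            i += 1
        elif a[i] > q:
            # The element swapped in is unexamined, so i stays put.
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1
    # Move the pivots into their final positions.
    lt -= 1
    gt += 1
    a[lo], a[lt] = a[lt], a[lo]
    a[hi], a[gt] = a[gt], a[hi]
    # Sort the three parts: < p, between p and q, > q.
    dual_pivot_quicksort(a, lo, lt - 1)
    dual_pivot_quicksort(a, lt + 1, gt - 1)
    dual_pivot_quicksort(a, gt + 1, hi)
```

The sketch relies only on the < and > comparisons, much like the Comparable-based conversion I benchmarked, rather than on primitive values.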

Results

These tables show the speedup relative to JDK 6 with and without warm up.

sorting algorithm speedup without warm up

sorting algorithm speedup with warm up

Here is the original text data if you’re interested in that.  These are in simple table format.  The columns store the runtime in seconds for each algorithm.  The number in parentheses is the speedup relative to JDK 6.

Without Warmup

With Warmup
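
For anyone reading the text data, here is a tiny Python sketch of how a cell is put together, assuming the usual definition of speedup as the baseline runtime divided by the algorithm’s runtime. The runtimes in the example are made up and are not taken from my benchmarks.

```python
def speedup(baseline_seconds, candidate_seconds):
    """Values above 1.0 mean the candidate is faster than the baseline."""
    return baseline_seconds / candidate_seconds


def format_cell(baseline_seconds, candidate_seconds):
    """Format a table cell as 'runtime (speedup)'."""
    return "%.3f (%.2f)" % (candidate_seconds,
                            speedup(baseline_seconds, candidate_seconds))


# Made-up runtimes: the baseline takes 3.0 s, the candidate 1.5 s.
print(format_cell(3.0, 1.5))  # 1.500 (2.00)
```

A value of 1.00 means the algorithm matches the JDK 6 sort on that file, and anything below 1.00 means it is slower.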

Tim Sort is definitely the way to go if you’re interested in a stable sorting algorithm.  I was pretty amazed when I first looked at the results at how well it actually did.  It really takes advantage of any presorted parts of the lists.  Overall, I’d say my optimized quicksort does fairly well, but maybe it could do better.  I may have to look into that again.

Posted on October 8, 2009 at 10:00 am by Joe
In: Java

Starting Python, Elixir, and SQLite

When I did the post about Storm, someone suggested that I look into Elixir. I didn’t have time then, so I made a note to look into it later.  That time is now. :)

Elixir and Storm are very similar: they’re both object relational mappers that provide an easy way to map your objects to database tables.  In a future post, I’ll do a more in-depth comparison between the two.

Starting out, Elixir uses SQLAlchemy as a backend.  While working with the tool, you will probably run into things you don’t understand if you’re not familiar with SQLAlchemy, because it does show through in certain instances.  Keeping a Firefox tab open on the SQLAlchemy documentation can be useful.

There are two main starting points for an ORM tool.  There is the case where you’re starting with an existing database, and the case where you’re setting up the database from scratch.  Mapping to a table that already exists with Elixir can be a little tricky depending on the relationships.

It’s as simple as this to connect to a database:

from elixir import *
metadata.bind = "sqlite:///../sizedb.sqlite"

Here is a simple example:

class Location(Entity):
    using_options(tablename='TABLE_LOC')
    loc_id = Field(Integer, primary_key=True)
    location = Field(UnicodeText)

And here is a more complex example of connecting to an existing database table:

class Comparison(Entity):
    using_options(tablename='TABLE_COMP')
    comp_id = Field(Integer, primary_key=True)
    date_added = Field(DateTime, default=datetime.datetime.now)
    hits = Field(Integer, default=1)
 
    smaller = ManyToOne('Phrase', colname='smaller_id')
    larger = ManyToOne('Phrase', colname='larger_id')
    sentences = ManyToMany('Sentence', tablename='TABLE_COMP_SENT',
                           local_side='comp_id', remote_side='sent_id', column_format="%(key)s")

I left out most of the class specific code to focus on Elixir.  One thing that took a while to figure out was how to set up a ManyToMany relationship with specific columns in my database.  The column_format parameter is the key to being able to specify your column names directly.  I really didn’t have to use any other options besides what you see above when connecting to an existing database.  Overall, I had about five database tables to connect.
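
If many-to-many mappings are new to you, this sketch shows the kind of association table that the ManyToMany relationship above points at. It uses Python’s standard sqlite3 module rather than Elixir, and it is not my actual schema: only TABLE_COMP, TABLE_COMP_SENT, comp_id, sent_id, and hits come from the mapping above, while TABLE_SENT and its text column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE_COMP (
        comp_id INTEGER PRIMARY KEY,
        hits    INTEGER DEFAULT 1
    );
    -- Hypothetical table and column, for illustration only.
    CREATE TABLE TABLE_SENT (
        sent_id INTEGER PRIMARY KEY,
        text    TEXT
    );
    -- Association table: one row per (comparison, sentence) pair.
    CREATE TABLE TABLE_COMP_SENT (
        comp_id INTEGER REFERENCES TABLE_COMP(comp_id),
        sent_id INTEGER REFERENCES TABLE_SENT(sent_id),
        PRIMARY KEY (comp_id, sent_id)
    );
""")

conn.execute("INSERT INTO TABLE_COMP (comp_id) VALUES (1)")
conn.executemany("INSERT INTO TABLE_SENT VALUES (?, ?)", [
    (1, "An elephant is larger than a dog."),
    (2, "A dog is smaller than an elephant."),
])
conn.executemany("INSERT INTO TABLE_COMP_SENT VALUES (?, ?)", [(1, 1), (1, 2)])

# Fetch every sentence linked to comparison 1 through the association table.
rows = conn.execute("""
    SELECT s.text
    FROM TABLE_SENT AS s
    JOIN TABLE_COMP_SENT AS cs ON cs.sent_id = s.sent_id
    WHERE cs.comp_id = ?
    ORDER BY s.sent_id
""", (1,)).fetchall()
sentences = [row[0] for row in rows]
print(sentences)
```

The ORM hides this join behind the sentences attribute, but with an existing database it still has to be told the names of the association table and of its two columns, which is what the extra ManyToMany parameters are for.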

If Elixir is not being set up against an existing database, many of the parameters in the different relationships are unnecessary.  For comparison, here is the same example if Elixir is used to create the database tables:

class Comparison(Entity):
    comp_id = Field(Integer, primary_key=True)
    date_added = Field(DateTime, default=datetime.datetime.now)
    hits = Field(Integer, default=1)
 
    smaller = ManyToOne('Phrase')
    larger = ManyToOne('Phrase')
    sentences = ManyToMany('Sentence')

As you can see, it gets quite a bit simpler.  The underlying table information is no longer needed.  It created tables that were very similar to the hand-created tables that I had used with Storm.  When it comes to queries on the database, SQLAlchemy shows through.

I found the documentation on the Elixir webpage to be a little bit lacking in terms of queries.  SQLAlchemy has a page that more fully describes the query functions.  The AND and OR operators are named and_ and or_, respectively, probably because and and or are reserved words in Python.  I thought this was worth mentioning because they are common SQL operators.

Posted on May 18, 2009 at 6:41 pm by Joe
In: Python