NetBeans Debugger Not Stopping at Breakpoints

I use NetBeans for doing development for JRuby.  When I say that, I don't mean I'm developing with JRuby; I mean developing JRuby itself.  I'm writing Java code, and I wanted to use the debugger on that Java code.  Every time I mentioned JRuby, people would assume I was using it and developing Ruby code; that's not the case at all.  By the way, an awesome way to learn the ins and outs of a language is to develop another language in it.  I've learned quite a bit about Java since I started working on this, and I'd have called myself proficient in Java before starting.

Anyway, when I was trying to use the NetBeans debugger on JRuby, it wasn't stopping at my breakpoints, and I couldn't figure out why.  I thought it might have something to do with my running Linux, and after people ran out of other ideas they seemed to think the same thing.  That turned out not to be the case.  It was a problem with the Ant script for running the debugger.

It was as simple as changing this:

    <nbjpdastart addressproperty="jpda.address" name="JRuby" transport="dt_socket">
        <pathelement path="${jruby.classes.dir}"/>
        <path refid="build.classpath"/>

to this:

    <nbjpdastart addressproperty="jpda.address" name="JRuby" transport="dt_socket">
        <pathelement path="${jruby.classes.dir}"/>
        <path refid="build.classpath"/>
        <sourcepath path="${src.dir}"/>

The sourcepath element makes all the difference. I'm posting this because nbjpdastart has very little documentation, and it took me quite a while to figure this out. I hope this saves you some time.

Posted on December 2, 2008 at 9:03 pm by Joe

Sort Optimization

This all started one night in the #JRuby IRC channel, where a user was complaining about JRuby's sorting algorithm being slow.  I thought to myself that I should be able to speed it up.  At the time, I figured that since sorting is so well researched, it would be easy to find documentation about it.  That didn't turn out to be entirely the case.

I started by looking around on Wikipedia to see what it had to offer.  Introsort caught my eye.  I thought it was interesting that after recursing to a certain depth it switches from quicksort to heapsort.  I don't think that particular optimization turned out to be all that necessary in the end, though.  Other than the switch to heapsort, it was pretty much a median-of-three quicksort with an insertion sort added for small ranges.

It was a good starting point, though.  I swapped the heapsort for a shell sort, and that dropped the number of comparisons needed by a good amount, although only my "Median of 3 Killer" test case was affected by the change.  I searched around the Internet often during this, and came across this page: QuickSort.  It had some interesting ideas, like grouping equal elements together when running the partition function.  I tried implementing that several times, and it always ended up increasing both the number of comparisons and the runtime.  I'm not 100% sure why.  If someone who knows more about sorting than I do has any input on that, I'd be happy to hear it.

Anyway, after working on that for a while, I took a look at the competition: Ruby's C code.  It does some interesting things that I hadn't thought of up to that point.  First, it takes the median of 7 sampled elements when the list has more than 200 elements.  Second, it doesn't sample the end elements, which helps me out later in my optimizations.  I didn't copy exactly what it does, but I used a similar idea.  It also looks at the order of the 3 values that were compared last (I'll refer to these as v1, v2, and v3): v1 comes before v2 and v3 in the list's current state, v2 sits between the other two, and so on.
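The pivot sampling described above can be sketched roughly like this (median-of-3 shown for brevity; the real code samples 7 elements for lists longer than 200, and the names here are my own invention, not the actual implementation):

```java
// Sketch of pivot sampling that avoids the array's end elements, so the
// separate first/last-element checks stay meaningful.
final class PivotSample {
    // Returns the index of the median of three interior samples.
    static int medianOfThreeIndex(int[] a, int lo, int hi) {
        int i1 = lo + 1;              // skip a[lo]
        int i2 = lo + (hi - lo) / 2;  // middle element
        int i3 = hi - 1;              // skip a[hi]
        // Three-element sort network on the sampled indices, ordered by value.
        if (a[i1] > a[i2]) { int t = i1; i1 = i2; i2 = t; }
        if (a[i2] > a[i3]) { int t = i2; i2 = i3; i3 = t; }
        if (a[i1] > a[i2]) { int t = i1; i1 = i2; i2 = t; }
        return i2; // index holding the median of the three sampled values
    }
}
```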

Ruby's code has a lot more checks than what I ended up using, but I check whether v1 <= v2 <= v3.  If that holds, I run the sequential test.  I also check whether v1 >= v2 >= v3 to see if the list is in reversed order, and if it is, I reverse the list before continuing.  While running these tests, I don't check the first or last element of the list, because I have a separate test for those.  If the list passes the sequential or reversed test, that means the entire list is sorted except for, potentially, the first and/or last element.  I then test those, and if they're out of sequence I do bubble-sort-style swaps until they're in the correct locations.
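As a rough sketch of those two steps (my own illustration with invented names, not the code from this post), the interior-sorted test plus the end-element fix-up might look like this; the reversed-order test is the same loop with the comparison flipped:

```java
final class PresortCheck {
    // True if the interior of the array (everything but the first and last
    // element) is already in non-decreasing order.
    static boolean interiorSorted(int[] a) {
        for (int i = 2; i < a.length - 1; i++) {
            if (a[i - 1] > a[i]) return false;
        }
        return true;
    }

    // Assuming the interior is sorted, bubble the last element down through
    // the interior, then bubble the first element up into place.
    static void fixEnds(int[] a) {
        int n = a.length;
        for (int i = n - 1; i > 1 && a[i - 1] > a[i]; i--) {
            int t = a[i]; a[i] = a[i - 1]; a[i - 1] = t;
        }
        for (int i = 0; i + 1 < n && a[i] > a[i + 1]; i++) {
            int t = a[i]; a[i] = a[i + 1]; a[i + 1] = t;
        }
    }
}
```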

Checking the ends was one of the last optimizations I performed.  The main reason I added it is that this case can be slow with the normal sorting algorithm, and I don't think it's that uncommon a case: I've seen an element get appended to a sorted list and the list then sorted again.  Overall, this catches a potentially slow case in a fairly cheap manner.  These cases weren't weeded out by the v1 <= v2 <= v3 style checks because the median of 7 that I use doesn't sample the end elements.

Another of the last optimizations was converting the sort to use an explicit stack rather than operating recursively.  To be honest, this provided more speedup than I was expecting.  I guess all the function calls were taking a toll on performance that I just didn't realize at the time.
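The recursion-to-stack conversion can be sketched like this: a generic explicit-stack quicksort with a simple Lomuto partition, purely illustrative and not the optimized sort described in this post:

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class StackQuicksort {
    static void sort(int[] a) {
        // Stack of {lo, hi} ranges replaces the call stack.
        Deque<int[]> stack = new ArrayDeque<>();
        stack.push(new int[]{0, a.length - 1});
        while (!stack.isEmpty()) {
            int[] range = stack.pop();
            int lo = range[0], hi = range[1];
            if (lo >= hi) continue;
            int p = partition(a, lo, hi);
            // Push both halves instead of recursing into them.
            stack.push(new int[]{lo, p - 1});
            stack.push(new int[]{p + 1, hi});
        }
    }

    // Lomuto partition around the last element of the range.
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi], i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {
                int t = a[i]; a[i] = a[j]; a[j] = t;
                i++;
            }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;
        return i;
    }
}
```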

Another implementation decision I benchmarked was whether to run insertion sort once at the end over the entire list, or to run it each time I find a section smaller than the threshold value.  After running benchmarks, it turned out that putting it at the end was better.  I suspect the extra function calls required to run it inside the quicksort loop were more overhead than the potential gain in cache locality.
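The "insertion sort at the end" variant relies on quicksort leaving every element within the threshold distance of its final position, so one insertion-sort pass finishes the job in roughly O(n &times; threshold) time.  A minimal sketch, with an assumed cutoff value that is not necessarily the one used in the post:

```java
final class FinalInsertionSort {
    static final int THRESHOLD = 7; // assumed cutoff, not the post's value

    // One pass over the whole array; cheap because quicksort already
    // stopped partitioning once every range was below THRESHOLD.
    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int v = a[i], j = i - 1;
            while (j >= 0 && a[j] > v) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = v;
        }
    }
}
```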

On with the benchmarks.  (The numbers are times in seconds; in parentheses is the speedup over Java's built-in sort.)

                          Java                 My Qsort
1245.repeat.1000.txt      1.6552300e-05 (1.00) 8.1874300e-06   (2.02 )
1245.repeat.10000.txt     0.00026284956 (1.00) 0.00012648017   (2.08 )
end.0.1000.txt            5.8904900e-06 (1.00) 1.7295600e-06   (3.41 )
end.0.10000.txt           8.3629030e-05 (1.00) 2.3148840e-05   (3.61 )
identical.1000.txt        3.5435500e-06 (1.00) 5.9168000e-07   (5.99 )
identical.10000.txt       3.7097050e-05 (1.00) 6.5003600e-06   (5.71 )
med.3.killer.1000.txt     1.0182050e-05 (1.00) 7.2132600e-06   (1.41 )
med.3.killer.10000.txt    0.00013968449 (1.00) 9.4941470e-05   (1.47 )
rand.dups.100.txt         1.2044400e-06 (1.00) 6.3194000e-07   (1.91 )
rand.dups.1000.txt        1.9847760e-05 (1.00) 1.0417630e-05   (1.91 )
rand.dups.10000.txt       0.00031415178 (1.00) 0.00020365385   (1.54 )
(filename lost)           1.2328200e-06 (1.00) 8.1612000e-07   (1.51 )
(filename lost)           1.9309830e-05 (1.00) 1.1890280e-05   (1.62 )
(filename lost)           0.00027436851 (1.00) 0.00017424722   (1.57 )
rand.steps.1000.txt       1.6057600e-05 (1.00) 1.0023700e-05   (1.60 )
rand.steps.10000.txt      0.00019955369 (1.00) 0.00017004971   (1.17 )
rev.ends.1000.txt         1.2306600e-05 (1.00) 2.8749300e-06   (4.28 )
rev.ends.10000.txt        9.4499880e-05 (1.00) 2.7255840e-05   (3.47 )
rev.partial.1000.txt      1.7107210e-05 (1.00) 8.7564000e-06   (1.95 )
rev.partial.10000.txt     0.00024198949 (1.00) 0.00013045623   (1.85 )
rev.saw.1000.txt          1.6793840e-05 (1.00) 9.4294600e-06   (1.78 )
rev.saw.10000.txt         0.00025133096 (1.00) 0.00014296088   (1.76 )
reverse.1000.txt          1.4600270e-05 (1.00) 1.1565800e-06   (12.6 )
reverse.10000.txt         0.00020535965 (1.00) 1.3798890e-05   (14.9 )
(filename lost)           6.5672500e-06 (1.00) 1.4837800e-06   (4.43 )
(filename lost)           4.7335130e-05 (1.00) 7.2842900e-06   (6.50 )
seq.partial.1000.txt      1.5497830e-05 (1.00) 6.0515300e-06   (2.56 )
seq.partial.10000.txt     0.00022936435 (1.00) 9.1791120e-05   (2.50 )
seq.saw.1000.txt          1.1645670e-05 (1.00) 4.6621150e-05   (0.250)
seq.saw.10000.txt         0.00019771144 (1.00) 0.00011184647   (1.77 )
sequential.1000.txt       3.4216700e-06 (1.00) 5.8689000e-07   (5.83 )
sequential.10000.txt      3.7812430e-05 (1.00) 6.1648500e-06   (6.13 )

These benchmarks were taken by running each sorting algorithm on each dataset 10,000 times (to warm up the JVM), and then running it on the data 10 more times to time it; I took the average of those 10 runs.  The only case where My Qsort was slower than the built-in Java Arrays.sort() was seq.saw.1000.txt.  I attribute this to noise.  I ran it again and got the following:

seq.saw.1000.txt          0.00013980571 (1.00) 0.00012667049   (1.10 )

Hopefully this makes JRuby's sorting comparable to Ruby's.  One note: while Java's sort is stable, the quicksort I wrote is not.  All that means is that elements that compare as equal may not keep their original relative order.

You can do whatever you’d like with the source code.  If you do find it useful some credit and/or a link to here would be nice.

Here are the test files: Test Files

Here is the sort itself: Sorting Algorithm

Posted on November 15, 2008 at 1:31 pm by Joe
In: Java, Ruby

Ruby Shootout: Fasta

Lately, I've been looking at the shootout Ruby benchmarks. I'd gotten into a habit of checking them every few months, rooting for my favorite languages and not really understanding why some of them didn't have a great showing. When the people running the shootout upgraded their hardware, Ruby seemed to fall off the list. While Ruby is a slow language, it deserves its spot (even if that spot is the bottom). I looked at the submissions and saw a post saying that if more Ruby benchmarks were updated, it would be included again.

I've since updated 3 of the benchmarks; for example, I sped up reverse-complement to be 2x faster than the previous version. I'm not a Ruby expert by any means, but I was able to get a speedup. I don't know if the other languages' entries are in the same state, but they may be…

Anyway, the Fasta benchmark is the one I most recently updated. It took me a while to get a decent speedup. It turned out that Array#find is slower than Array#each with a break statement; making this switch is what got the biggest speedup in the program.

I wrote a benchmark showing the difference in runtime:

require 'benchmark'

N = 300
table = (0..300).to_a

Benchmark.bmbm(8) do |x|"Array.find") {
    N.times do
      table.each do |find_num|
        table.find do |num|
          num > find_num
        end
      end
    end
  }"Array.each") {
    N.times do
      table.each do |find_num|
        output = nil
        table.each do |num|
          if num > find_num
            output = num
            break
          end
        end
      end
    end
  }
end
The results:

Ruby 1.8.6       user     system      total        real
Array.find  12.120000   0.020000  12.140000 ( 14.647705)
Array.each   8.150000   0.030000   8.180000 (  9.734463)

JRuby 1.1.4      user     system      total        real
Array.find   6.738000   0.000000   6.738000 (  6.738407)
Array.each   5.365000   0.000000   5.365000 (  5.364320)

Now, I'm running this on an admittedly dated machine. Again, I'm nowhere near a Ruby expert, so if someone sees a way to make the benchmark more fair, let me know and I'll update it. I also can't say how this will run on YARV when that is released.

Here is a link to the Ruby Fasta benchmark on the Language Shootout:

Ruby Shootout: Fasta

Posted on November 2, 2008 at 8:39 am by Joe
In: Ruby