<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Blog::Quibb</title>
	<atom:link href="http://blog.quibb.org/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.quibb.org</link>
	<description>Software development and more.</description>
	<lastBuildDate>Tue, 10 Aug 2010 14:11:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Fast Bulk Inserts into SQLite</title>
		<link>http://blog.quibb.org/2010/08/fast-bulk-inserts-into-sqlite/</link>
		<comments>http://blog.quibb.org/2010/08/fast-bulk-inserts-into-sqlite/#comments</comments>
		<pubDate>Tue, 10 Aug 2010 14:11:56 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[c++]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[sqlite]]></category>

		<guid isPermaLink="false">http://blog.quibb.org/?p=219</guid>
		<description><![CDATA[Background Sometimes it’s necessary to get information into a database quickly. SQLite is a lightweight database engine that can be easily embedded in applications. This article covers the process of optimizing bulk inserts into an SQLite database. While this article focuses on SQLite, some of the techniques shown here will apply to other databases. [...]]]></description>
			<content:encoded><![CDATA[<h2>Background</h2>
<p>Sometimes it’s necessary to get information into a database quickly. <a title="SQLite" href="http://sqlite.org/">SQLite</a> is a lightweight database engine that can be easily embedded in applications.  This article covers the process of optimizing bulk inserts into an SQLite database.  While it focuses on SQLite, some of the techniques shown here also apply to other databases.</p>
<p>All of the following examples insert data into the same table.  The first column is an ID, followed by three FLOAT values and then three INTEGER values.  You&#8217;ll notice the getDouble() and getInt() functions.  They return doubles and ints in a predictable manner.  I didn&#8217;t use random data because different values could add variability to the benchmarks at the end.</p>
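<p>The helper functions themselves are not listed in this post.  As a rough sketch only, deterministic generators along these lines would behave the way the examples expect.  The class name, the counter-based bodies and the ID format are all assumptions for illustration, not the code used in the benchmarks.</p>

```cpp
#include <string>

// Hypothetical stand-ins for getID(), getDouble() and getInt().
// Every call advances a counter, so each run yields the same sequence.
class ValueSource
{
public:
    // Returns "id0", "id1", "id2", ...
    std::string getID() { return "id" + std::to_string(mNextId++); }

    // Returns 0.5, 1.0, 1.5, ...
    double getDouble() { mLastDouble += 0.5; return mLastDouble; }

    // Returns 0, 1, 2, ...
    int getInt() { return mNextInt++; }

private:
    unsigned mNextId = 0;
    double mLastDouble = 0.0;
    int mNextInt = 0;
};
```

<p>Because nothing depends on a clock or a random seed, two runs of a benchmark would insert exactly the same rows.</p>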
<h2>Naive Inserts</h2>
<p>This is the most basic way to insert information into SQLite.  It simply calls <a title="sqlite3_exec" href="http://www.sqlite.org/c3ref/exec.html">sqlite3_exec</a> for each insert in the database.</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">char</span> buffer<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">300</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
<span style="color: #0000ff;">for</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">unsigned</span> i <span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span> i <span style="color: #000080;">&lt;</span> mVal<span style="color: #008080;">;</span> i<span style="color: #000040;">++</span><span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
    <span style="color: #0000dd;">sprintf</span><span style="color: #008000;">&#40;</span>buffer, <span style="color: #FF0000;">&quot;INSERT INTO example VALUES ('%s', %lf, %lf, %lf, %d, %d, %d)&quot;</span>,
            getID<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>.<span style="color: #007788;">c_str</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>,
            getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, buffer, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></div></div>

<h2>Inserts within a Transaction</h2>
<p>A transaction is a way to group SQL statements together.  Without an explicit transaction, SQLite wraps every statement in its own transaction, so each of the naive inserts pays the full cost of a commit.  If an error is encountered, an ON CONFLICT clause can be used to handle it to your liking.  Nothing will be written to the SQLite database until either END or COMMIT signals that the transaction should be closed and written.</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">char</span><span style="color: #000040;">*</span> errorMessage<span style="color: #008080;">;</span>
sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, <span style="color: #FF0000;">&quot;BEGIN TRANSACTION&quot;</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #000040;">&amp;</span>errorMessage<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">char</span> buffer<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">300</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
<span style="color: #0000ff;">for</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">unsigned</span> i <span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span> i <span style="color: #000080;">&lt;</span> mVal<span style="color: #008080;">;</span> i<span style="color: #000040;">++</span><span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
    <span style="color: #0000dd;">sprintf</span><span style="color: #008000;">&#40;</span>buffer, <span style="color: #FF0000;">&quot;INSERT INTO example VALUES ('%s', %lf, %lf, %lf, %d, %d, %d)&quot;</span>,
            getID<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>.<span style="color: #007788;">c_str</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>,
            getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, buffer, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span>
&nbsp;
sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, <span style="color: #FF0000;">&quot;COMMIT TRANSACTION&quot;</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #000040;">&amp;</span>errorMessage<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></div></div>

<h2>PRAGMA Statements</h2>
<p><a title="PRAGMA" href="http://sqlite.org/pragma.html">PRAGMA</a> statements control the behavior of SQLite as a whole.  They can be used to tweak options such as how often data is flushed to disk or the size of the cache.  The ones below are commonly used for performance.  The SQLite documentation fully explains what they do and the implications of using them.  For example, setting synchronous to OFF means SQLite no longer stops and waits for the data to be written to the hard drive.  In the event of a crash or power failure, the database is then more likely to be corrupted.</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;">sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, <span style="color: #FF0000;">&quot;PRAGMA synchronous=OFF&quot;</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #000040;">&amp;</span>errorMessage<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, <span style="color: #FF0000;">&quot;PRAGMA count_changes=OFF&quot;</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #000040;">&amp;</span>errorMessage<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, <span style="color: #FF0000;">&quot;PRAGMA journal_mode=MEMORY&quot;</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #000040;">&amp;</span>errorMessage<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, <span style="color: #FF0000;">&quot;PRAGMA temp_store=MEMORY&quot;</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #000040;">&amp;</span>errorMessage<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></div></div>

<h2>Prepared Statements</h2>
<p><a title="Prepared Statements" href="http://sqlite.org/c3ref/prepare.html">Prepared statements</a> are the recommended way of sending queries to SQLite.  Rather than parsing the statement over and over again, the parser only needs to run once per statement.  According to the documentation, sqlite3_exec() is a convenience function that calls sqlite3_prepare_v2(), sqlite3_step(), and then sqlite3_finalize().  In my opinion, the documentation should say more explicitly that prepared statements are the preferred query method.  sqlite3_exec() should only be used for one-off queries.</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">char</span><span style="color: #000040;">*</span> errorMessage<span style="color: #008080;">;</span>
sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, <span style="color: #FF0000;">&quot;BEGIN TRANSACTION&quot;</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #000040;">&amp;</span>errorMessage<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">char</span> buffer<span style="color: #008000;">&#91;</span><span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> <span style="color: #FF0000;">&quot;INSERT INTO example VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7)&quot;</span><span style="color: #008080;">;</span>
sqlite3_stmt<span style="color: #000040;">*</span> stmt<span style="color: #008080;">;</span>
sqlite3_prepare_v2<span style="color: #008000;">&#40;</span>mDb, buffer, <span style="color: #0000dd;">strlen</span><span style="color: #008000;">&#40;</span>buffer<span style="color: #008000;">&#41;</span>, <span style="color: #000040;">&amp;</span>stmt, <span style="color: #0000ff;">NULL</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">for</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">unsigned</span> i <span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span> i <span style="color: #000080;">&lt;</span> mVal<span style="color: #008080;">;</span> i<span style="color: #000040;">++</span><span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
    std<span style="color: #008080;">::</span><span style="color: #007788;">string</span> id <span style="color: #000080;">=</span> getID<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_text<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">1</span>, id.<span style="color: #007788;">c_str</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, id.<span style="color: #007788;">size</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, SQLITE_STATIC<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_double<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">2</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_double<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">3</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_double<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">4</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_int<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">5</span>, getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_int<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">6</span>, getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_int<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">7</span>, getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
    <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>sqlite3_step<span style="color: #008000;">&#40;</span>stmt<span style="color: #008000;">&#41;</span> <span style="color: #000040;">!</span><span style="color: #000080;">=</span> SQLITE_DONE<span style="color: #008000;">&#41;</span>
    <span style="color: #008000;">&#123;</span>
        <span style="color: #0000dd;">printf</span><span style="color: #008000;">&#40;</span><span style="color: #FF0000;">&quot;Insert failed!<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #008000;">&#125;</span>
&nbsp;
    sqlite3_reset<span style="color: #008000;">&#40;</span>stmt<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span>
&nbsp;
sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, <span style="color: #FF0000;">&quot;COMMIT TRANSACTION&quot;</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #000040;">&amp;</span>errorMessage<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
sqlite3_finalize<span style="color: #008000;">&#40;</span>stmt<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></div></div>

<h2>Storing Data as Binary Blob</h2>
<p>Up until now, the optimizations have been pretty much the standard advice you get when looking into bulk insert optimization.  If you’re not running queries on some of the data, it’s possible to convert it to binary and store it as a blob.  While it’s not advisable to throw everything into a blob and put it in the database, packing data that is always read and used together into a single blob can make sense in some situations.</p>
<p>For example, if you have a point class (x, y, z) with REAL values, it might make sense to store them in one blob rather than in three separate fields in a row.  That only holds if you don’t need to query on the individual values, though.  The benefit of this technique grows as more fields are folded into larger blobs.</p>
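<p>To make the point example concrete, here is a small sketch of packing such a class into a byte buffer and reading it back.  The Point struct and the packPoint()/unpackPoint() names are assumptions for illustration.  The bytes are copied in the machine's native order, so this is only suitable when the same kind of machine writes and reads the data.</p>

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical point type; the field names are assumptions.
struct Point
{
    double x;
    double y;
    double z;
};

// Copies the three coordinates into one contiguous buffer that could be
// passed to sqlite3_bind_blob().
std::vector<char> packPoint(const Point& p)
{
    const double values[3] = {p.x, p.y, p.z};
    std::vector<char> blob(sizeof(values));
    std::memcpy(blob.data(), values, sizeof(values));
    return blob;
}

// Reverses packPoint().  Returns false if the blob has the wrong size.
bool unpackPoint(const char* blob, std::size_t size, Point& out)
{
    double values[3];
    if (size != sizeof(values))
    {
        return false;
    }
    std::memcpy(values, blob, sizeof(values));
    out.x = values[0];
    out.y = values[1];
    out.z = values[2];
    return true;
}
```

<p>The size check in unpackPoint() is a cheap guard against reading a blob that was written with a different layout.</p>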

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">char</span><span style="color: #000040;">*</span> errorMessage<span style="color: #008080;">;</span>
sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, <span style="color: #FF0000;">&quot;BEGIN TRANSACTION&quot;</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #000040;">&amp;</span>errorMessage<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">char</span> buffer<span style="color: #008000;">&#91;</span><span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> <span style="color: #FF0000;">&quot;INSERT INTO example VALUES (?1, ?2, ?3, ?4, ?5)&quot;</span><span style="color: #008080;">;</span>
sqlite3_stmt<span style="color: #000040;">*</span> stmt<span style="color: #008080;">;</span>
sqlite3_prepare_v2<span style="color: #008000;">&#40;</span>mDb, buffer, <span style="color: #0000dd;">strlen</span><span style="color: #008000;">&#40;</span>buffer<span style="color: #008000;">&#41;</span>, <span style="color: #000040;">&amp;</span>stmt, <span style="color: #0000ff;">NULL</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">for</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">unsigned</span> i <span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span> i <span style="color: #000080;">&lt;</span> mVal<span style="color: #008080;">;</span> i<span style="color: #000040;">++</span><span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
    std<span style="color: #008080;">::</span><span style="color: #007788;">string</span> id <span style="color: #000080;">=</span> getID<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_text<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">1</span>, id.<span style="color: #007788;">c_str</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, id.<span style="color: #007788;">size</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, SQLITE_STATIC<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
    <span style="color: #0000ff;">char</span> dblBuffer<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">24</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
    <span style="color: #0000ff;">double</span> d<span style="color: #008000;">&#91;</span><span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> <span style="color: #008000;">&#123;</span>getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
    <span style="color: #0000dd;">memcpy</span><span style="color: #008000;">&#40;</span>dblBuffer, <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">char</span><span style="color: #000040;">*</span><span style="color: #008000;">&#41;</span><span style="color: #000040;">&amp;</span>d, <span style="color: #0000dd;">sizeof</span><span style="color: #008000;">&#40;</span>d<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_blob<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">2</span>, dblBuffer, <span style="color: #0000dd;">24</span>, SQLITE_STATIC<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_int<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">3</span>, getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_int<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">4</span>, getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_int<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">5</span>, getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
    <span style="color: #0000ff;">int</span> retVal <span style="color: #000080;">=</span> sqlite3_step<span style="color: #008000;">&#40;</span>stmt<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>retVal <span style="color: #000040;">!</span><span style="color: #000080;">=</span> SQLITE_DONE<span style="color: #008000;">&#41;</span>
    <span style="color: #008000;">&#123;</span>
        <span style="color: #0000dd;">printf</span><span style="color: #008000;">&#40;</span><span style="color: #FF0000;">&quot;Insert failed! %d<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span>, retVal<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #008000;">&#125;</span>
&nbsp;
    sqlite3_reset<span style="color: #008000;">&#40;</span>stmt<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span>
&nbsp;
sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, <span style="color: #FF0000;">&quot;COMMIT TRANSACTION&quot;</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #000040;">&amp;</span>errorMessage<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
sqlite3_finalize<span style="color: #008000;">&#40;</span>stmt<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></div></div>

<p>Note: I just used memcpy here, which stores the doubles in the byte order of the machine that wrote them, so the data would not be portable between big- and little-endian systems.  If that matters, it would be a good idea to serialize the data using a serialization library (e.g., <a title="protocol buffers" href="http://code.google.com/apis/protocolbuffers/docs/overview.html">protocol buffers</a> or <a title="MsgPack" href="http://msgpack.org/">msgpack</a>).</p>
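<p>For a handful of doubles, a hand-rolled fixed byte order is also an option.  The following is only a sketch of that idea, not a replacement for a real serialization library.  The function names are hypothetical, and it assumes that double is a 64-bit IEEE 754 value stored in the same byte order as 64-bit integers on the host.</p>

```cpp
#include <cstdint>
#include <cstring>

// Writes a double as 8 bytes, least significant byte first, regardless of
// the byte order of the host.
void writeDoubleLE(double value, unsigned char* out)
{
    static_assert(sizeof(double) == sizeof(std::uint64_t), "64-bit double expected");
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    for (int i = 0; i < 8; i++)
    {
        out[i] = static_cast<unsigned char>((bits >> (8 * i)) & 0xFF);
    }
}

// Reads back a double written by writeDoubleLE().
double readDoubleLE(const unsigned char* in)
{
    std::uint64_t bits = 0;
    for (int i = 0; i < 8; i++)
    {
        bits |= static_cast<std::uint64_t>(in[i]) << (8 * i);
    }
    double value;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}
```

<p>Shifting the integer instead of copying raw memory is what makes the byte layout independent of the host.</p>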
<h2>Performance</h2>
<p>I ran benchmarks to test the performance of each method of inserting data.  Note that the x-axis does not scale linearly; it most closely matches a logarithmic scale.  The inserts-per-second graph was obtained by dividing the number of inserts by the total runtime.</p>
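<p>The post does not say how the runs were timed.  As one possible sketch, assuming a C++11 compiler, the calculation could be wrapped up like this.  Both function names and the runInserts callable are hypothetical.</p>

```cpp
#include <chrono>

// Inserts per second: number of inserts divided by total runtime.
double insertsPerSecond(unsigned inserts, double seconds)
{
    return seconds > 0.0 ? inserts / seconds : 0.0;
}

// Times one run.  "runInserts" is a hypothetical callable that performs
// "count" inserts using whichever method is being measured.
template <typename Func>
double timedInsertsPerSecond(unsigned count, Func runInserts)
{
    const std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    runInserts(count);
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return insertsPerSecond(count, elapsed.count());
}
```

<p>For instance, 100,000 inserts that take 2.5 seconds in total work out to 40,000 inserts per second.</p>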
<div id="attachment_238" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.quibb.org/wp-content/uploads/2010/07/bulk_insert_runtime.png"><img class="size-medium wp-image-238 " title="SQLite Bulk Insert Runtime" src="http://blog.quibb.org/wp-content/uploads/2010/07/bulk_insert_runtime-300x182.png" alt="SQLite Bulk Insert Runtime" width="300" height="182" /></a><p class="wp-caption-text">SQLite Bulk Insert Runtime in Seconds</p></div>
<p style="text-align: center;">
<div id="attachment_239" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.quibb.org/wp-content/uploads/2010/07/inserts_per_second.png"><img class="size-medium wp-image-239" title="Inserts Per Second" src="http://blog.quibb.org/wp-content/uploads/2010/07/inserts_per_second-300x182.png" alt="Inserts Per Second" width="300" height="182" /></a><p class="wp-caption-text">SQLite Inserts Per Second</p></div>
<p style="text-align: left;">After running the first benchmark, I wanted to show how much of a difference storing data in binary can make.  I ran it again, but instead of storing only three doubles, I stored 24 doubles.  I assumed order mattered, so for the version that does not use a binary blob, I made a separate table with ID and order columns.  This way both versions captured the same information.</p>
<p style="text-align: left;">
<div id="attachment_242" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.quibb.org/wp-content/uploads/2010/07/big_insert_runtime.png"><img class="size-medium wp-image-242" title="Big Insert Runtime" src="http://blog.quibb.org/wp-content/uploads/2010/07/big_insert_runtime-300x182.png" alt="Big Insert Runtime" width="300" height="182" /></a><p class="wp-caption-text">Big Insert Runtime in Seconds</p></div>
<div id="attachment_244" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.quibb.org/wp-content/uploads/2010/07/big_insert_per_second.png"><img class="size-medium wp-image-244" title="Big Inserts Per Second" src="http://blog.quibb.org/wp-content/uploads/2010/07/big_insert_per_second-300x181.png" alt="Big Inserts Per Second" width="300" height="181" /></a><p class="wp-caption-text">Big Inserts Per Second</p></div>
<p>Good luck with your database inserts.</p>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow: hidden;">
<h1 id="internal-source-marker_0.4793873936321398"><span style="font-size: 24pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: bold; font-style: normal; text-decoration: none; vertical-align: baseline;">Fast Bulk Inserts into SQLite</span></h1>
<h2><span style="font-size: 18pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: bold; font-style: normal; text-decoration: none; vertical-align: baseline;">Background</span></h2>
<p><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">Sometimes it’s necessary to get information into a database quickly.  SQLite[</span><a href="http://sqlite.org/"><span style="font-size: 11pt; font-family: Arial; color: #000099; background-color: transparent; font-weight: normal; font-style: normal; vertical-align: baseline; text-decoration: underline;">http://sqlite.org/</span></a><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">]  is a light weight database engine that can be easily embedded in  applications.  This will cover the process of optimizing bulk inserts  into an SQLite database.  While this article focuses on SQLite some of  the techniques shown here will apply to other databases.</span></p>
<h2><span style="font-size: 18pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: bold; font-style: normal; text-decoration: none; vertical-align: baseline;">Naive Inserts</span></h2>
<p><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">This is the most basic way to insert information into SQLite.  It simply calls sqlite3_exec[</span><a href="http://www.sqlite.org/c3ref/exec.html"><span style="font-size: 11pt; font-family: Arial; color: #000099; background-color: transparent; font-weight: normal; font-style: normal; vertical-align: baseline; text-decoration: underline;">http://www.sqlite.org/c3ref/exec.html</span></a><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">] for each insert in the database.</span><br />
<span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">[insert code here]</span></p>
<p><span style="font-size: 18pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: bold; font-style: normal; text-decoration: none; vertical-align: baseline;">Inserts within a Transaction</span><br />
<span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">A  transaction is a way to group SQL statements together.  If an error is  encountered the ON CONFLICT statement can be used to handle that to your  liking.  Nothing will be written to the SQLite database until either  END or COMMIT is encountered to signify the transaction should be  written and closed.</span><br />
<span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">[insert code here]</span></p>
<h2>PRAGMA Statements</h2>
<p><a title="PRAGMA statements" href="http://sqlite.org/pragma.html">PRAGMA statements</a> control the behavior of SQLite as a whole.  They can be used to tweak options such as how often the data is flushed to disk or the size of the cache.</p>
<p>[insert code here]</p>
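<p>As a rough illustration of the two options mentioned above, here is a sketch using Python&#8217;s built-in sqlite3 module instead of the C API.  The choice of <code>synchronous</code> and <code>cache_size</code>, the values given to them, and the <code>open_for_bulk_load</code> helper are assumptions for the example, not necessarily the settings used for the numbers in this article.</p>

```python
import os
import sqlite3
import tempfile


def open_for_bulk_load(path):
    """Open a database with settings that favor insert speed over durability."""
    conn = sqlite3.connect(path)
    # Do not wait for data to reach the disk; a crash or power loss mid-load
    # can corrupt the database, so this suits data that can be re-imported.
    conn.execute("PRAGMA synchronous = OFF")
    # Ask for a larger page cache; a positive value is a number of pages.
    conn.execute("PRAGMA cache_size = 10000")
    return conn


with tempfile.TemporaryDirectory() as tmp:
    conn = open_for_bulk_load(os.path.join(tmp, "bulk.db"))
    # Reading a PRAGMA back is a cheap way to confirm the setting took effect.
    synchronous = conn.execute("PRAGMA synchronous").fetchone()[0]
    cache_size = conn.execute("PRAGMA cache_size").fetchone()[0]
    conn.close()

print(synchronous, cache_size)
```

<p>Both settings apply to the connection that issued them, so they need to be set again each time the database is opened.</p>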
<h2>Prepared Statements</h2>
<p><a title="Prepared statements" href="http://sqlite.org/c3ref/prepare.html">Prepared statements</a> are the recommended way of sending queries to SQLite.  Rather than parsing the statement over and over again, the parser only needs to be run once on the statement.  In all honesty, the documentation for sqlite3_exec should say not to use it at all in favor of prepared statements.  They are faster not only for inserts, but for any SQL statement that is run more than once.</p>
<p>[insert code here]</p>
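<p>In the C API this pattern is built from sqlite3_prepare_v2, the sqlite3_bind_* functions, sqlite3_step and sqlite3_reset.  Python&#8217;s sqlite3 module hides those calls, so the sketch below uses <code>executemany</code> as a stand-in: one SQL string with <code>?</code> placeholders that, to my understanding, is compiled once and then bound to each row in turn.  The table and the <code>insert_points</code> helper are hypothetical.</p>

```python
import sqlite3

# One statement with placeholders; only the bound values change per row.
INSERT_SQL = "INSERT INTO points (x, y, z) VALUES (?, ?, ?)"


def insert_points(conn, points):
    """Bind every point to the same parameterized INSERT statement."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(INSERT_SQL, points)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x REAL, y REAL, z REAL)")  # hypothetical schema

insert_points(conn, [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)])
stored = conn.execute("SELECT x, y, z FROM points ORDER BY x").fetchall()
print(stored)
```

<p>Binding values instead of building the SQL text by hand also means the values never have to be escaped or quoted.</p>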
<h2>Storing Data as Binary Blob</h2>
<p>Up until now, most of the optimizations have been pretty much the standard advice that you get when looking into bulk insert optimization.  If you’re not running queries on some of the data, it’s possible to convert it to binary and store it as a blob.  While it’s not advised to just throw everything into a blob and put it in the database, putting data that would be pulled and used together into a binary blob can make sense in some situations.</p>
<p>For example, if you have a point class (x, y, z) with REAL values, it might make sense to store them in a blob rather than in three separate fields in a row.  That’s only if you don’t need to make queries on the data though.  The benefits of this technique increase as more fields are converted into larger blobs.</p>
<p>[insert code here]</p>
<p>Note: I just do a memcpy here, but this would have issues going between big and little endian systems.  If that’s necessary, it would be a good idea to serialize the data using a serialization library (e.g. <a title="Protocol Buffers" href="http://code.google.com/apis/protocolbuffers/docs/overview.html">protocol buffers</a>, <a title="msgpack" href="http://msgpack.org/">msgpack</a>, <a title="Thrift" href="http://incubator.apache.org/thrift/">thrift</a>).</p>
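<p>To make the point example concrete, here is a small sketch in Python rather than the C/C++ this article uses.  Instead of a raw memcpy it packs the three values with the struct module using an explicit little-endian layout, which is one simple way around the endianness issue from the note above.  The <code>shapes</code> table and the helper names are assumptions for illustration.</p>

```python
import sqlite3
import struct

# "<3d" means three 8-byte doubles in little-endian order, so the bytes are
# identical no matter which machine wrote them.
POINT_FORMAT = "<3d"


def pack_point(x, y, z):
    """Serialize one point into a 24-byte blob."""
    return struct.pack(POINT_FORMAT, x, y, z)


def unpack_point(blob):
    """Turn a blob written by pack_point back into an (x, y, z) tuple."""
    return struct.unpack(POINT_FORMAT, blob)


conn = sqlite3.connect(":memory:")
# Hypothetical schema: one BLOB column instead of three REAL columns.
conn.execute("CREATE TABLE shapes (id INTEGER PRIMARY KEY, point BLOB)")
with conn:
    conn.execute("INSERT INTO shapes (point) VALUES (?)", (pack_point(1.5, -2.0, 3.25),))

blob = conn.execute("SELECT point FROM shapes").fetchone()[0]
restored = unpack_point(blob)
print(len(blob), restored)
```

<p>The trade-off is the one described above: SQLite can no longer filter or sort on x, y or z, because all it sees is an opaque run of bytes.</p>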
<h2>Performance</h2>
</div>
<p><span id="more-219"></span></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.quibb.org/2010/08/fast-bulk-inserts-into-sqlite/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Nightly Benchmarks: Tracking Results with Codespeed</title>
		<link>http://blog.quibb.org/2010/07/nightly-benchmarks-tracking-results-with-codespeed/</link>
		<comments>http://blog.quibb.org/2010/07/nightly-benchmarks-tracking-results-with-codespeed/#comments</comments>
		<pubDate>Mon, 19 Jul 2010 14:16:54 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[continuous integration]]></category>
		<category><![CDATA[jruby]]></category>

		<guid isPermaLink="false">http://blog.quibb.org/?p=204</guid>
		<description><![CDATA[Background Codespeed is a project for tracking performance. I discovered it when the PyPy project started using Codespeed to track performance. Since then development has been done to make its setup easier and provide more display options. Anyway, two posts ago I talked about running nightly benchmarks with Hudson. Then in the previous post I [...]]]></description>
			<content:encoded><![CDATA[<h2>Background</h2>
<p><a title="Codespeed" href="http://github.com/tobami/codespeed">Codespeed</a> is a project for tracking performance.  I discovered it when the <a title="PyPy" href="http://pypy.org/">PyPy</a> project started using Codespeed to track performance.  Since then development has been done to make its setup easier and provide more display options.</p>
<p>Anyway, two posts ago I talked about <a title="running nightly benchmarks with Hudson" href="http://blog.quibb.org/2010/04/nightly-benchmarks-setting-up-hudson/">running nightly benchmarks with Hudson</a>.  Then in the previous post I discussed <a title="passing parameters between builds in Hudson" href="http://blog.quibb.org/2010/04/passing-parameters-between-builds-in-hudson/">passing parameters between builds in Hudson</a>.  Both of these posts are worth reading before trying to set up Hudson with Codespeed.</p>
<h2>Codespeed Installation/Configuration</h2>
<h3>Django Quickstart</h3>
<p>Codespeed is built on Python and <a title="Django" href="http://www.djangoproject.com/">Django</a>.  Some basic knowledge of Django is needed in order to get everything up and running.  Don&#8217;t worry, it&#8217;s not that hard to learn the bit that is needed.  <a title="manage.py" href="http://docs.djangoproject.com/en/1.2/ref/django-admin/#ref-django-admin">manage.py</a> is all you need to know about to set up and view Codespeed.  There is information about <a title="deploying Django to a real web server" href="http://docs.djangoproject.com/en/1.2/howto/deployment/#howto-deployment-index">deploying Django to a real web server</a>, but I won&#8217;t be covering that here.</p>
<p>Here are the commands to get Django running:</p>
<p><a title="syncdb" href="http://docs.djangoproject.com/en/1.2/ref/django-admin/#syncdb"><strong>syncdb</strong></a></p>
<p>syncdb is used to initialize the database with the necessary tables.  It will also set up an admin account.  With the sqlite3 database selected, it will create the database file when this command is run.</p>
<p>The command is:</p>
<pre>python manage.py syncdb</pre>
<p><a title="runserver" href="http://docs.djangoproject.com/en/1.2/ref/django-admin/#runserver-port-or-ipaddr-port"><strong>runserver</strong></a></p>
<p>The next command is runserver.  This runs Django&#8217;s built-in development server.  The Django documentation states that it should not be used in a production environment, so deploy to a real web server if you plan to host Codespeed on the Internet or on a high-traffic network.</p>
<p>The command is:</p>
<pre>python manage.py runserver 0.0.0.0:9000</pre>
<p>By default the server will run on 127.0.0.1:8000.  Setting the IP to 0.0.0.0 allows connections from any computer.  This works well if you&#8217;re on a local area network and want to set it up on a VM over SSH, but still be able to access the web interface from your computer.  The number after the colon is the port the server listens on.  To view Codespeed, point your browser at 127.0.0.1:9000, or at the IP of the machine it&#8217;s running on followed by :9000.</p>
<p>Django has many <a title="Django settings" href="http://docs.djangoproject.com/en/1.2/ref/settings/#ref-settings">settings</a> that may or may not need to be tweaked for your environment.  They can be set through the <a title="speedcenter/settings.py" href="http://github.com/tobami/codespeed/blob/0.6.1/speedcenter/settings.py">speedcenter/settings.py</a> file.</p>
<h3>Codespeed Setup/Settings</h3>
<p>Now for setting up the actual Codespeed server.  First check it out using git.  The clone command is:</p>
<pre>git clone http://github.com/tobami/codespeed.git</pre>
<p>The settings file is <a title="speedcenter/codespeed/settings.py" href="http://github.com/tobami/codespeed/blob/0.6.1/speedcenter/codespeed/settings.py">speedcenter/codespeed/settings.py</a>.</p>
<p>Most of the default values will work fine.  They&#8217;re mostly for setting default values for various things in the interface.</p>
<p>One thing that does need to be configured is the environment.  Start by running the syncdb command and then run the server using runserver.  Now that the server is running, browse to the admin interface.  If you ran the server on port 9000, point your browser at http://127.0.0.1:9000/admin.  Log in using the username and password you created during the syncdb call.  A Codespeed environment must be created manually.  The environment is the machine you&#8217;re running the benchmarks on.  After logging in, click Add next to the Environment label.  Fill in the various fields and remember the name you give it.  Save it when you&#8217;re done.  The name will be used later when submitting benchmark data to Codespeed.</p>
<h2>Submitting Benchmarks</h2>
<p>This will pick up where my last <a title="Nightly Benchmarks: Setting up Hudson" href="http://blog.quibb.org/2010/04/nightly-benchmarks-setting-up-hudson/">tutorial</a> left off.  The benchmarks were running as a nightly job in Hudson.  Sending benchmark data to Codespeed will take a bit of programming.  I&#8217;m going to continue the example with <a title="JRuby" href="http://jruby.org/">JRuby</a>, so the benchmarks and submission process are written in Ruby.</p>
<p>In order to submit benchmarks, information must be transferred from the JRuby build job to the Ruby Benchmarks job.  My last post discussed how to transfer parameters between jobs.  Using the <a title="Parameterized Trigger Plugin" href="http://wiki.hudson-ci.org/display/HUDSON/Parameterized+Trigger+Plugin">Parameterized Trigger Plugin</a> and passing extra parameters using a properties file will allow you to get all the necessary parameters to the benchmarks job.</p>
<p>The required information for submitting a benchmark result to Codespeed includes:</p>
<ul>
<li>commitid &#8211; The id of the commit, which could be either a Git/Mercurial commit hash or an SVN revision number.</li>
<li>project &#8211; The name of the project to save.</li>
<li>executable &#8211; The name of the executable.</li>
<li>benchmark &#8211; The name of the benchmark.</li>
<li>environment &#8211; This is the name of the environment you created earlier.  It must be the name of an existing environment.</li>
<li>result_value &#8211; The runtime of the benchmark. You can configure what units a benchmark has through the admin interface. Default is seconds.</li>
</ul>
<p>This information can be included but is optional:</p>
<ul>
<li>std_dev &#8211; The standard deviation of the results of the benchmarks.</li>
<li>min</li>
<li>max</li>
<li>branch &#8211; The branch corresponding to this benchmark in the SCM repository.</li>
<li>result_date &#8211; The timestamp of the commit in the form &#8220;%Y-%m-%d %H:%M&#8221;</li>
</ul>
<p>The above information is sent to Codespeed as URL-encoded form data in an HTTP POST.  Point the request at http://127.0.0.1:9000/result/add/ and encode the parameters for sending.  For the JRuby benchmarks, the following parameters are sent from the JRuby job to the Ruby benchmarks job.</p>
<pre>COMMIT_ID=$(git rev-parse HEAD)
COMMIT_TIME=$(git log -1 --pretty=format:%ad)
RUBY_PATH=$WORKSPACE/bin/jruby
REPO_URL=git://github.com/jruby/jruby.git</pre>
<p>The other fields are derived from the benchmarks job itself.</p>
<p>Here is the source code for <a title="submission through Ruby rake file" href="http://github.com/qbproger/ruby-benchmark-suite/blob/master/rakelib/bench.rake">submission through Ruby</a>:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;">output = <span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#006600; font-weight:bold;">&#125;</span>
canonical_name = doc<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">&quot;name&quot;</span><span style="color:#006600; font-weight:bold;">&#93;</span>.<span style="color:#CC0066; font-weight:bold;">gsub</span> <span style="color:#996600;">'//'</span>, <span style="color:#996600;">'/'</span>
output<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">'commitid'</span><span style="color:#006600; font-weight:bold;">&#93;</span> = commitid
output<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">'project'</span><span style="color:#006600; font-weight:bold;">&#93;</span> = BASE_VM
output<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">'branch'</span><span style="color:#006600; font-weight:bold;">&#93;</span> = branch
output<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">'executable'</span><span style="color:#006600; font-weight:bold;">&#93;</span> = BASE_VM
output<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">'benchmark'</span><span style="color:#006600; font-weight:bold;">&#93;</span> = <span style="color:#CC00FF; font-weight:bold;">File</span>.<span style="color:#9900CC;">basename</span><span style="color:#006600; font-weight:bold;">&#40;</span>canonical_name<span style="color:#006600; font-weight:bold;">&#41;</span>
output<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">'environment'</span><span style="color:#006600; font-weight:bold;">&#93;</span> = environment
output<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">'result_value'</span><span style="color:#006600; font-weight:bold;">&#93;</span> = doc<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">&quot;mean&quot;</span><span style="color:#006600; font-weight:bold;">&#93;</span>
output<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">'std_dev'</span><span style="color:#006600; font-weight:bold;">&#93;</span> = doc<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">&quot;standard_deviation&quot;</span><span style="color:#006600; font-weight:bold;">&#93;</span>
output<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">'result_date'</span><span style="color:#006600; font-weight:bold;">&#93;</span> = commit_time
&nbsp;
res = <span style="color:#6666ff; font-weight:bold;">Net::HTTP</span>.<span style="color:#9900CC;">post_form</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#CC00FF; font-weight:bold;">URI</span>.<span style="color:#9900CC;">parse</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">&quot;#{server}/result/add/&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span>, output<span style="color:#006600; font-weight:bold;">&#41;</span>
<span style="color:#CC0066; font-weight:bold;">puts</span> res.<span style="color:#9900CC;">body</span></pre></div></div>

<p>It&#8217;s a good idea to always print out the response as it will contain debug information.  There is an example of <a title="save_single_result.py" href="http://github.com/tobami/codespeed/blob/0.6.1/tools/save_single_result.py">how to submit benchmarks to Codespeed using Python</a> in the Codespeed repository in the tools directory.</p>
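<p>As a rough Python counterpart to the Ruby snippet above, the sketch below builds the same kind of form and prepares the POST request without sending it.  The field names and the <code>/result/add/</code> path are taken from this post; every value in the dictionary and the <code>build_result_request</code> helper are made-up placeholders.</p>

```python
from datetime import datetime
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def build_result_request(server, result):
    """Encode one benchmark result as form data for a POST to Codespeed."""
    data = urlencode(result).encode("ascii")
    # Supplying a data payload makes urllib issue a POST instead of a GET.
    return Request(server.rstrip("/") + "/result/add/", data=data)


result = {
    "commitid": "0123abc",          # placeholder commit id
    "project": "jruby",             # placeholder values from here on
    "branch": "master",
    "executable": "jruby",
    "benchmark": "bm_example",
    "environment": "benchmark-vm",  # must match an environment created in the admin
    "result_value": 1.234,
    "std_dev": 0.05,
    "result_date": datetime(2010, 7, 19, 14, 16).strftime("%Y-%m-%d %H:%M"),
}

request = build_result_request("http://127.0.0.1:9000", result)
decoded = parse_qs(request.data.decode("ascii"))
print(request.get_method(), request.full_url)

# To actually submit, pass the request to urllib.request.urlopen() and print
# the response body, since it carries the debug information mentioned above.
```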
<h2>Viewing Results</h2>
<p>After results are in the Codespeed database, you can view the data through the web interface.  Direct a browser at http://127.0.0.1:9000.  The changes view shows the trend over the last revisions.  The timeline view allows you to see a graph of recent revisions, and the newly added comparison view will compare different executables running the same benchmark.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.quibb.org/2010/07/nightly-benchmarks-tracking-results-with-codespeed/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Passing Parameters Between Builds in Hudson</title>
		<link>http://blog.quibb.org/2010/04/passing-parameters-between-builds-in-hudson/</link>
		<comments>http://blog.quibb.org/2010/04/passing-parameters-between-builds-in-hudson/#comments</comments>
		<pubDate>Wed, 21 Apr 2010 03:06:23 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[continuous integration]]></category>
		<category><![CDATA[git]]></category>

		<guid isPermaLink="false">http://blog.quibb.org/?p=192</guid>
		<description><![CDATA[In my last post, I talked about setting up Hudson to run nightly benchmarks.  While trying to take that to the next step, and get nightly benchmarks recorded in a graph, I discovered that passing parameters between builds may not be as easy as it originally seemed.  If you&#8217;re using the Hudson Parameterized Trigger plugin, [...]]]></description>
			<content:encoded><![CDATA[<p>In my last post, I talked about setting up <a title="Hudson" href="http://hudson-ci.org/">Hudson</a> to run nightly benchmarks.  While trying to take that to the next step, and get nightly benchmarks recorded in a graph, I discovered that passing parameters between builds may not be as easy as it originally seemed.  If you&#8217;re using the Hudson <a title="Parameterized Trigger Plugin" href="http://wiki.hudson-ci.org/display/HUDSON/Parameterized+Trigger+Plugin">Parameterized Trigger plugin</a>, that gets you part of the way to passing parameters between builds, but I was left wanting more flexibility than it offered.</p>
<p>I wanted to set environment variables with an Execute Shell step, and then be able to pass them as parameters to the benchmarks build.  I wanted to pass the git commit id and timestamp to the benchmarks build for recording.  The <a title="Git SCM Plugin" href="http://wiki.hudson-ci.org/display/HUDSON/Git+Plugin">Git SCM Plugin</a> doesn&#8217;t provide that information to Hudson.  The Parameterized Trigger plugin is able to handle environment variables that are set by Hudson itself.  However, when trying to set them in the Execute Shell step, it didn&#8217;t pick up the newly set environment variables.  At this point I looked through the available options.  I saw that I could set the Parameterized Trigger to read from a parameters file.  I tried writing out a parameters file from the Execute Shell section, and reading it in using the Parameterized Trigger plugin.  Success!</p>
<p>Here are the commands I used to write out the properties file:</p>
<pre>echo "COMMIT_ID=$(git rev-parse HEAD)" &gt; params.properties
echo "COMMIT_TIME=$(git log -1 --pretty=format:%ad)" &gt;&gt; params.properties</pre>
<p>In the end, it worked out pretty well.  After these commands are run, a params.properties file is created.  The Parameterized Trigger plugin is set up to read params.properties, and the information moves on to the next build.</p>
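<p>The properties file is nothing more than one KEY=VALUE pair per line.  For anyone whose build step is a script rather than a couple of echo commands, here is a small Python sketch that writes a file in that shape and reads it back as a sanity check.  The helper names and the sample values are hypothetical, and the plugin does its own parsing, so the reader below is only a check and not a copy of what the plugin does.</p>

```python
import os
import tempfile


def write_params(path, params):
    """Write one KEY=VALUE pair per line, the shape used for params.properties."""
    with open(path, "w") as f:
        for key, value in params.items():
            f.write(f"{key}={value}\n")


def read_params(path):
    """Read the file back into a dict, skipping blank lines and # comments."""
    params = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            # Split on the first "=" only, so values may contain "=" themselves.
            key, _, value = line.partition("=")
            params[key.strip()] = value.strip()
    return params


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "params.properties")
    write_params(path, {
        "COMMIT_ID": "0123abc",                           # placeholder commit id
        "COMMIT_TIME": "Wed Apr 21 03:06:23 2010 +0000",  # placeholder timestamp
    })
    loaded = read_params(path)

print(loaded)
```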
]]></content:encoded>
			<wfw:commentRss>http://blog.quibb.org/2010/04/passing-parameters-between-builds-in-hudson/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Nightly Benchmarks: Setting up Hudson</title>
		<link>http://blog.quibb.org/2010/04/nightly-benchmarks-setting-up-hudson/</link>
		<comments>http://blog.quibb.org/2010/04/nightly-benchmarks-setting-up-hudson/#comments</comments>
		<pubDate>Thu, 08 Apr 2010 13:36:27 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[continuous integration]]></category>
		<category><![CDATA[jruby]]></category>

		<guid isPermaLink="false">http://blog.quibb.org/?p=169</guid>
		<description><![CDATA[For some projects, finding out about performance regressions is important.  I&#8217;m going to write a two part series about setting up a nightly build machine and displaying the generated data.  This part is going to cover installation of Hudson, and getting the benchmarks running nightly. I decided to give Hudson a try because I had [...]]]></description>
			<content:encoded><![CDATA[<p>For some projects, finding out about performance regressions is important.  I&#8217;m going to write a two part series about setting up a nightly build machine and displaying the generated data.  This part is going to cover installation of <a title="Hudson" href="http://hudson-ci.org/">Hudson</a>, and getting the benchmarks running nightly.</p>
<p>I decided to give Hudson a try because I had heard good things about it.  Also, after hearing coworkers complain about CruiseControl and CDash, I thought I&#8217;d try something new.  Since Hudson has pretty extensive documentation, I&#8217;ll walk you through setting up the JRuby project to build with Hudson and getting benchmarks running on it.</p>
<h2>Hudson Installation</h2>
<p>On Ubuntu it&#8217;s as simple as:</p>
<pre>sudo apt-get install hudson
</pre>
<p>While I didn&#8217;t install it on Windows, the installation should require little more than installing <a title="Tomcat" href="http://tomcat.apache.org/">Tomcat</a>, downloading the Hudson war file, and putting it in Tomcat&#8217;s webapps directory.</p>
<p>After installation browsing to http://127.0.0.1:8080 should show the Hudson Dashboard.</p>
<h3>Hudson Configuration</h3>
<p>After Hudson installation is complete, it requires very little configuration before setting up your first project.  One thing that may be necessary is going to the plugins page and making sure your version control system is covered.  For setting up a continuous integration machine to build JRuby, the git plugin is necessary.</p>
<p>To install the Hudson Git Plugin, click <strong>Manage Hudson</strong> on the left hand side.  Then click <strong>Manage Plugins</strong> from the list in the middle of the screen.  Click the <strong>Available</strong> tab, and find the <span style="text-decoration: underline;">Hudson GIT plugin</span> in the list.  After it&#8217;s installed it will show up in the Installed tab.</p>
<p>After installing all the necessary plugins for your project go back to the Hudson Dashboard by clicking the Hudson logo, or the Back to Dashboard link.</p>
<h2>Setting up a Project to Build</h2>
<p>A good first step is to make sure the project will build on the given machine without being built through Hudson.  There may be some dependencies that got overlooked, and this is a good way to make sure everything is set up to build your project.</p>
<p>Now, click on the <strong>New Job</strong> link on the left hand side.  For the JRuby project, <strong>Build a free-style software project</strong> is the type of project to set up.  I imagine that is the correct type of project to set up for most projects.</p>
<p>Unless you plan on keeping all the builds produced on the server, the <strong>Discard Old Builds</strong> is a good option to check, and set how long you want the builds to remain on the server.  Choose the source code management tool that you use for your project, which is Git for JRuby, and set the appropriate settings.</p>
<p>JRuby settings:</p>
<pre>URL of Repository: git://github.com/jruby/jruby.git
Branch Specifier (blank for default): master
Repository browser (Auto)
</pre>
<p>There are several types of Build Triggers by default.  More Build Triggers can be added through plugins, if you&#8217;re looking for another way to trigger a build.  For a nightly build at midnight select the <strong>Build periodically</strong> option, and put <em>@midnight</em> in the field.</p>
<p>For the build step, if you&#8217;re building a Java project select <strong>Invoke Ant</strong>.  Otherwise, <strong>Execute shell</strong> may be a good option for you.  For JRuby, select Invoke Ant and set the target to jar to build it.</p>
<p>At this point you can click the <strong>Save</strong> button at the bottom of the page and click <strong>Build Now</strong> on the next page to build your project.  It&#8217;s a good idea to make sure your project builds correctly before trying to add in nightly benchmarks.  It&#8217;s easier to debug problems before you have too much going on.  By clicking on the build from the active builds list the console output can be seen from the browser.</p>
<h2>Running the Benchmarks</h2>
<p>If your benchmarks are in the same repository, you&#8217;re mostly done.  Add another build step, and set it up to run your benchmarks.  While JRuby does have benchmarks in its repository, the benchmarks I plan on running are in a different repository.  With this goal in mind, I created another Job in Hudson to checkout and run the benchmarks.</p>
<p>Its setup is very similar to that of JRuby: it checks out the source and runs the benchmarks.  The main difference is that a parameter is passed to the project to tell it which Ruby VM to use.  The <span style="text-decoration: underline;">Parameterized Trigger Plugin</span> is necessary to pass a parameter from one project to another.  The way it works is that you declare the parameter near the top of the configuration page of the project receiving it.  In my case, I added a RUBY_PATH parameter.  Then you set up the build job to send that parameter to the benchmarks job.</p>
<p>To do this, I went back to the JRuby job and turned on the <strong>Trigger parameterized build on other projects</strong> option.  It should be the last option down at the bottom of the page.  I set the JRuby job to trigger with the benchmarks job name, and in the predefined parameters field I put the following:</p>
<pre> RUBY_PATH=$WORKSPACE/bin/jruby</pre>
<p>After this is in place, when a JRuby build finishes it will start a benchmarks run.  Now that your benchmarks are up and running, the next part to this series will go over how to display the information in a way that makes it easy to spot regressions.</p>
<p>If you have any questions or if I went over something too quickly, post a comment and/or ask a question.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.quibb.org/2010/04/nightly-benchmarks-setting-up-hudson/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>JSR-166: The Java fork/join Framework</title>
		<link>http://blog.quibb.org/2010/03/jsr-166-the-java-forkjoin-framework/</link>
		<comments>http://blog.quibb.org/2010/03/jsr-166-the-java-forkjoin-framework/#comments</comments>
		<pubDate>Wed, 10 Mar 2010 02:53:22 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[sorting]]></category>
		<category><![CDATA[threading]]></category>

		<guid isPermaLink="false">http://blog.quibb.org/?p=153</guid>
		<description><![CDATA[The JSR-166 are concurrent utilities that were included in Java 5.  The fork/join framework was a piece of it that didn&#8217;t make it into Java 5.  After all this time the fork/join framework is finally making it into JDK 7.  What surprised me about the framework is that it is so easy to use. The [...]]]></description>
			<content:encoded><![CDATA[<p>The <a title="JSR-166" href="http://jcp.org/en/jsr/detail?id=166">JSR-166</a> are concurrent utilities that were  included in Java 5.  The fork/join framework was a piece of it that  didn&#8217;t make it into Java 5.  After all this time the fork/join framework  is finally making it into JDK 7.  What surprised me about the framework  is that it is so easy to use.</p>
<p>The fork/join framework is designed to make divide-and-conquer algorithms easy to parallelize.   More specifically, recursive algorithms where the control path branches  out over a few paths and they each process an equal part of the data  set.  The typical setup is a new class is created that extends either  the <a title="RecursiveAction" href="http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166ydocs/jsr166y/RecursiveAction.html">RecursiveAction</a> or <a title="RecursiveTask" href="http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166ydocs/jsr166y/RecursiveTask.html">RecursiveTask</a> class.  The parameters that were sent into the recursive function  become member variables in the newly defined class.  Then the recursive  calls are replaced by invokeAll(&#8230;) rather than the calls to the  function itself.</p>
<p>In writing this post, I kept going back and forth on whether to use Fibonacci numbers as an example or something with more meat to it. The computation done by each recursive call of a Fibonacci algorithm is too small to matter, and there are much better non-parallel algorithms for Fibonacci numbers anyway. In the end, I decided to show a merge sort. It is also the example used in the fork/join documentation, but this will be a more complete example, showing both the sequential algorithm and the changes made for the parallel version. You&#8217;ll see that it&#8217;s not that hard.</p>
<p>First, here is the source code for a typical MergeSort:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">class</span> MergeSort <span style="color: #009900;">&#123;</span>
&nbsp;
    <span style="color: #000000; font-weight: bold;">private</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000000; font-weight: bold;">final</span> <span style="color: #000066; font-weight: bold;">int</span> SIZE_THRESHOLD <span style="color: #339933;">=</span> <span style="color: #cc66cc;">16</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000066; font-weight: bold;">void</span> sort<span style="color: #009900;">&#40;</span><span style="color: #003399;">Comparable</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> a<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        sort<span style="color: #009900;">&#40;</span>a, <span style="color: #cc66cc;">0</span>, a.<span style="color: #006633;">length</span><span style="color: #339933;">-</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000066; font-weight: bold;">void</span> sort<span style="color: #009900;">&#40;</span><span style="color: #003399;">Comparable</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> a, <span style="color: #000066; font-weight: bold;">int</span> lo, <span style="color: #000066; font-weight: bold;">int</span> hi<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #009900;">&#40;</span>hi <span style="color: #339933;">-</span> lo <span style="color: #339933;">&lt;</span> SIZE_THRESHOLD<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
            insertionsort<span style="color: #009900;">&#40;</span>a, lo, hi<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
            <span style="color: #000000; font-weight: bold;">return</span><span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
&nbsp;
        <span style="color: #003399;">Comparable</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> tmp <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">Comparable</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span>hi <span style="color: #339933;">-</span> lo<span style="color: #009900;">&#41;</span> <span style="color: #339933;">/</span> <span style="color: #cc66cc;">2</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">+</span> <span style="color: #cc66cc;">1</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
        mergeSort<span style="color: #009900;">&#40;</span>a, tmp, lo, hi<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #000000; font-weight: bold;">private</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000066; font-weight: bold;">void</span> mergeSort<span style="color: #009900;">&#40;</span><span style="color: #003399;">Comparable</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> a, <span style="color: #003399;">Comparable</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> tmp, <span style="color: #000066; font-weight: bold;">int</span> lo, <span style="color: #000066; font-weight: bold;">int</span> hi<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #009900;">&#40;</span>hi <span style="color: #339933;">-</span> lo <span style="color: #339933;">&lt;</span> SIZE_THRESHOLD<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
            insertionsort<span style="color: #009900;">&#40;</span>a, lo, hi<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
            <span style="color: #000000; font-weight: bold;">return</span><span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
&nbsp;
        <span style="color: #000066; font-weight: bold;">int</span> m <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span>lo <span style="color: #339933;">+</span> hi<span style="color: #009900;">&#41;</span> <span style="color: #339933;">/</span> <span style="color: #cc66cc;">2</span><span style="color: #339933;">;</span>
        mergeSort<span style="color: #009900;">&#40;</span>a, tmp, lo, m<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        mergeSort<span style="color: #009900;">&#40;</span>a, tmp, m <span style="color: #339933;">+</span> <span style="color: #cc66cc;">1</span>, hi<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        merge<span style="color: #009900;">&#40;</span>a, tmp, lo, m, hi<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #000000; font-weight: bold;">private</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000066; font-weight: bold;">void</span> merge<span style="color: #009900;">&#40;</span><span style="color: #003399;">Comparable</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> a, <span style="color: #003399;">Comparable</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> b, <span style="color: #000066; font-weight: bold;">int</span> lo, <span style="color: #000066; font-weight: bold;">int</span> m, <span style="color: #000066; font-weight: bold;">int</span> hi<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #009900;">&#40;</span>a<span style="color: #009900;">&#91;</span>m<span style="color: #009900;">&#93;</span>.<span style="color: #006633;">compareTo</span><span style="color: #009900;">&#40;</span>a<span style="color: #009900;">&#91;</span>m<span style="color: #339933;">+</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">&lt;=</span> <span style="color: #cc66cc;">0</span><span style="color: #009900;">&#41;</span>
            <span style="color: #000000; font-weight: bold;">return</span><span style="color: #339933;">;</span>
&nbsp;
        <span style="color: #003399;">System</span>.<span style="color: #006633;">arraycopy</span><span style="color: #009900;">&#40;</span>a, lo, b, <span style="color: #cc66cc;">0</span>, m<span style="color: #339933;">-</span>lo<span style="color: #339933;">+</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
        <span style="color: #000066; font-weight: bold;">int</span> i <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span>
        <span style="color: #000066; font-weight: bold;">int</span> j <span style="color: #339933;">=</span> m<span style="color: #339933;">+</span><span style="color: #cc66cc;">1</span><span style="color: #339933;">;</span>
        <span style="color: #000066; font-weight: bold;">int</span> k <span style="color: #339933;">=</span> lo<span style="color: #339933;">;</span>
&nbsp;
        <span style="color: #666666; font-style: italic;">// copy back the smaller of the two front elements at each step</span>
        <span style="color: #000000; font-weight: bold;">while</span> <span style="color: #009900;">&#40;</span>k <span style="color: #339933;">&lt;</span> j <span style="color: #339933;">&amp;&amp;</span> j <span style="color: #339933;">&lt;=</span> hi<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
            <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #009900;">&#40;</span>b<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span>.<span style="color: #006633;">compareTo</span><span style="color: #009900;">&#40;</span>a<span style="color: #009900;">&#91;</span>j<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">&lt;=</span> <span style="color: #cc66cc;">0</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
                a<span style="color: #009900;">&#91;</span>k<span style="color: #339933;">++</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> b<span style="color: #009900;">&#91;</span>i<span style="color: #339933;">++</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
            <span style="color: #009900;">&#125;</span> <span style="color: #000000; font-weight: bold;">else</span> <span style="color: #009900;">&#123;</span>
                a<span style="color: #009900;">&#91;</span>k<span style="color: #339933;">++</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> a<span style="color: #009900;">&#91;</span>j<span style="color: #339933;">++</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
            <span style="color: #009900;">&#125;</span>
        <span style="color: #009900;">&#125;</span>
&nbsp;
        <span style="color: #666666; font-style: italic;">// copy back remaining elements of first half (if any)</span>
        <span style="color: #003399;">System</span>.<span style="color: #006633;">arraycopy</span><span style="color: #009900;">&#40;</span>b, i, a, k, j<span style="color: #339933;">-</span>k<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #000000; font-weight: bold;">private</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000066; font-weight: bold;">void</span> insertionsort<span style="color: #009900;">&#40;</span><span style="color: #003399;">Comparable</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> a, <span style="color: #000066; font-weight: bold;">int</span> lo, <span style="color: #000066; font-weight: bold;">int</span> hi<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #000000; font-weight: bold;">for</span> <span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">int</span> i <span style="color: #339933;">=</span> lo<span style="color: #339933;">+</span><span style="color: #cc66cc;">1</span><span style="color: #339933;">;</span> i <span style="color: #339933;">&lt;=</span> hi<span style="color: #339933;">;</span> i<span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
            <span style="color: #000066; font-weight: bold;">int</span> j <span style="color: #339933;">=</span> i<span style="color: #339933;">;</span>
            <span style="color: #003399;">Comparable</span> t <span style="color: #339933;">=</span> a<span style="color: #009900;">&#91;</span>j<span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
            <span style="color: #000000; font-weight: bold;">while</span> <span style="color: #009900;">&#40;</span>j <span style="color: #339933;">&gt;</span> lo <span style="color: #339933;">&amp;&amp;</span> t.<span style="color: #006633;">compareTo</span><span style="color: #009900;">&#40;</span>a<span style="color: #009900;">&#91;</span>j <span style="color: #339933;">-</span> <span style="color: #cc66cc;">1</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">&lt;</span> <span style="color: #cc66cc;">0</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
                a<span style="color: #009900;">&#91;</span>j<span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> a<span style="color: #009900;">&#91;</span>j <span style="color: #339933;">-</span> <span style="color: #cc66cc;">1</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
                <span style="color: #339933;">--</span>j<span style="color: #339933;">;</span>
            <span style="color: #009900;">&#125;</span>
            a<span style="color: #009900;">&#91;</span>j<span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> t<span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
    <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>Now here is the code for the parallel version of MergeSort:</p>

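<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">// JDK 7 and later; with the jsr166y package on Java 6, import jsr166y.ForkJoinPool and jsr166y.RecursiveAction instead</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.util.concurrent.ForkJoinPool</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.util.concurrent.RecursiveAction</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">class</span> ParallelMergeSort <span style="color: #009900;">&#123;</span>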
&nbsp;
    <span style="color: #000000; font-weight: bold;">private</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000000; font-weight: bold;">final</span> ForkJoinPool threadPool <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> ForkJoinPool<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #000000; font-weight: bold;">private</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000000; font-weight: bold;">final</span> <span style="color: #000066; font-weight: bold;">int</span> SIZE_THRESHOLD <span style="color: #339933;">=</span> <span style="color: #cc66cc;">16</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000066; font-weight: bold;">void</span> sort<span style="color: #009900;">&#40;</span><span style="color: #003399;">Comparable</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> a<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        sort<span style="color: #009900;">&#40;</span>a, <span style="color: #cc66cc;">0</span>, a.<span style="color: #006633;">length</span><span style="color: #339933;">-</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000066; font-weight: bold;">void</span> sort<span style="color: #009900;">&#40;</span><span style="color: #003399;">Comparable</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> a, <span style="color: #000066; font-weight: bold;">int</span> lo, <span style="color: #000066; font-weight: bold;">int</span> hi<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #009900;">&#40;</span>hi <span style="color: #339933;">-</span> lo <span style="color: #339933;">&lt;</span> SIZE_THRESHOLD<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
            insertionsort<span style="color: #009900;">&#40;</span>a, lo, hi<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
            <span style="color: #000000; font-weight: bold;">return</span><span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
&nbsp;
        <span style="color: #003399;">Comparable</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> tmp <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">Comparable</span><span style="color: #009900;">&#91;</span>a.<span style="color: #006633;">length</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
        threadPool.<span style="color: #006633;">invoke</span><span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> SortTask<span style="color: #009900;">&#40;</span>a, tmp, lo, hi<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #008000; font-style: italic; font-weight: bold;">/**
     * This class replaces the recursive function that was
     * previously here.
     */</span>
    <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000000; font-weight: bold;">class</span> SortTask <span style="color: #000000; font-weight: bold;">extends</span> RecursiveAction <span style="color: #009900;">&#123;</span>
        <span style="color: #003399;">Comparable</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> a<span style="color: #339933;">;</span>
        <span style="color: #003399;">Comparable</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> tmp<span style="color: #339933;">;</span>
        <span style="color: #000066; font-weight: bold;">int</span> lo, hi<span style="color: #339933;">;</span>
        <span style="color: #000000; font-weight: bold;">public</span> SortTask<span style="color: #009900;">&#40;</span><span style="color: #003399;">Comparable</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> a, <span style="color: #003399;">Comparable</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> tmp, <span style="color: #000066; font-weight: bold;">int</span> lo, <span style="color: #000066; font-weight: bold;">int</span> hi<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
            <span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">a</span> <span style="color: #339933;">=</span> a<span style="color: #339933;">;</span>
            <span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">lo</span> <span style="color: #339933;">=</span> lo<span style="color: #339933;">;</span>
            <span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">hi</span> <span style="color: #339933;">=</span> hi<span style="color: #339933;">;</span>
            <span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">tmp</span> <span style="color: #339933;">=</span> tmp<span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
&nbsp;
        @Override
        <span style="color: #000000; font-weight: bold;">protected</span> <span style="color: #000066; font-weight: bold;">void</span> compute<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
            <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #009900;">&#40;</span>hi <span style="color: #339933;">-</span> lo <span style="color: #339933;">&lt;</span> SIZE_THRESHOLD<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
                insertionsort<span style="color: #009900;">&#40;</span>a, lo, hi<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
                <span style="color: #000000; font-weight: bold;">return</span><span style="color: #339933;">;</span>
            <span style="color: #009900;">&#125;</span>
&nbsp;
            <span style="color: #000066; font-weight: bold;">int</span> m <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span>lo <span style="color: #339933;">+</span> hi<span style="color: #009900;">&#41;</span> <span style="color: #339933;">/</span> <span style="color: #cc66cc;">2</span><span style="color: #339933;">;</span>
            <span style="color: #666666; font-style: italic;">// the two recursive calls are replaced by a call to invokeAll</span>
            invokeAll<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> SortTask<span style="color: #009900;">&#40;</span>a, tmp, lo, m<span style="color: #009900;">&#41;</span>, <span style="color: #000000; font-weight: bold;">new</span> SortTask<span style="color: #009900;">&#40;</span>a, tmp, m<span style="color: #339933;">+</span><span style="color: #cc66cc;">1</span>, hi<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
            merge<span style="color: #009900;">&#40;</span>a, tmp, lo, m, hi<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #000000; font-weight: bold;">private</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000066; font-weight: bold;">void</span> merge<span style="color: #009900;">&#40;</span><span style="color: #003399;">Comparable</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> a, <span style="color: #003399;">Comparable</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> b, <span style="color: #000066; font-weight: bold;">int</span> lo, <span style="color: #000066; font-weight: bold;">int</span> m, <span style="color: #000066; font-weight: bold;">int</span> hi<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #009900;">&#40;</span>a<span style="color: #009900;">&#91;</span>m<span style="color: #009900;">&#93;</span>.<span style="color: #006633;">compareTo</span><span style="color: #009900;">&#40;</span>a<span style="color: #009900;">&#91;</span>m<span style="color: #339933;">+</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">&lt;=</span> <span style="color: #cc66cc;">0</span><span style="color: #009900;">&#41;</span>
            <span style="color: #000000; font-weight: bold;">return</span><span style="color: #339933;">;</span>
&nbsp;
        <span style="color: #003399;">System</span>.<span style="color: #006633;">arraycopy</span><span style="color: #009900;">&#40;</span>a, lo, b, lo, m<span style="color: #339933;">-</span>lo<span style="color: #339933;">+</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
        <span style="color: #000066; font-weight: bold;">int</span> i <span style="color: #339933;">=</span> lo<span style="color: #339933;">;</span>
        <span style="color: #000066; font-weight: bold;">int</span> j <span style="color: #339933;">=</span> m<span style="color: #339933;">+</span><span style="color: #cc66cc;">1</span><span style="color: #339933;">;</span>
        <span style="color: #000066; font-weight: bold;">int</span> k <span style="color: #339933;">=</span> lo<span style="color: #339933;">;</span>
&nbsp;
        <span style="color: #666666; font-style: italic;">// copy back the smaller of the two front elements at each step</span>
        <span style="color: #000000; font-weight: bold;">while</span> <span style="color: #009900;">&#40;</span>k <span style="color: #339933;">&lt;</span> j <span style="color: #339933;">&amp;&amp;</span> j <span style="color: #339933;">&lt;=</span> hi<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
            <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #009900;">&#40;</span>b<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span>.<span style="color: #006633;">compareTo</span><span style="color: #009900;">&#40;</span>a<span style="color: #009900;">&#91;</span>j<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">&lt;=</span> <span style="color: #cc66cc;">0</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
                a<span style="color: #009900;">&#91;</span>k<span style="color: #339933;">++</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> b<span style="color: #009900;">&#91;</span>i<span style="color: #339933;">++</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
            <span style="color: #009900;">&#125;</span> <span style="color: #000000; font-weight: bold;">else</span> <span style="color: #009900;">&#123;</span>
                a<span style="color: #009900;">&#91;</span>k<span style="color: #339933;">++</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> a<span style="color: #009900;">&#91;</span>j<span style="color: #339933;">++</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
            <span style="color: #009900;">&#125;</span>
        <span style="color: #009900;">&#125;</span>
&nbsp;
        <span style="color: #666666; font-style: italic;">// copy back remaining elements of first half (if any)</span>
        <span style="color: #003399;">System</span>.<span style="color: #006633;">arraycopy</span><span style="color: #009900;">&#40;</span>b, i, a, k, j<span style="color: #339933;">-</span>k<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #000000; font-weight: bold;">private</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000066; font-weight: bold;">void</span> insertionsort<span style="color: #009900;">&#40;</span><span style="color: #003399;">Comparable</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> a, <span style="color: #000066; font-weight: bold;">int</span> lo, <span style="color: #000066; font-weight: bold;">int</span> hi<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #000000; font-weight: bold;">for</span> <span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">int</span> i <span style="color: #339933;">=</span> lo<span style="color: #339933;">+</span><span style="color: #cc66cc;">1</span><span style="color: #339933;">;</span> i <span style="color: #339933;">&lt;=</span> hi<span style="color: #339933;">;</span> i<span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
            <span style="color: #000066; font-weight: bold;">int</span> j <span style="color: #339933;">=</span> i<span style="color: #339933;">;</span>
            <span style="color: #003399;">Comparable</span> t <span style="color: #339933;">=</span> a<span style="color: #009900;">&#91;</span>j<span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
            <span style="color: #000000; font-weight: bold;">while</span> <span style="color: #009900;">&#40;</span>j <span style="color: #339933;">&gt;</span> lo <span style="color: #339933;">&amp;&amp;</span> t.<span style="color: #006633;">compareTo</span><span style="color: #009900;">&#40;</span>a<span style="color: #009900;">&#91;</span>j <span style="color: #339933;">-</span> <span style="color: #cc66cc;">1</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">&lt;</span> <span style="color: #cc66cc;">0</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
                a<span style="color: #009900;">&#91;</span>j<span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> a<span style="color: #009900;">&#91;</span>j <span style="color: #339933;">-</span> <span style="color: #cc66cc;">1</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
                <span style="color: #339933;">--</span>j<span style="color: #339933;">;</span>
            <span style="color: #009900;">&#125;</span>
            a<span style="color: #009900;">&#91;</span>j<span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> t<span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
    <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>As you can see, the majority of the algorithm has remained intact. As stated above, a new class is created that extends RecursiveAction, and the parameters of the recursive function are passed to that class&#8217;s constructor. One thing to note is that the sequential version allocated temporary storage only half the size of the range being sorted, while the parallel version allocates temporary storage as long as the entire array. Each task then works only in the slice of the temporary array that matches its own range, which keeps different threads from needing the same area of the array at the same time.</p>
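<p>The point about the full-length temporary array can be seen in isolation in a condensed variant. This is only a sketch under a few assumptions: it sorts an int[] instead of Comparable[], it hands small ranges to Arrays.sort instead of a hand-written insertion sort, and the names ScratchDemo and THRESHOLD are invented for the example. It also assumes Java 8 or later.</p>

```java
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical condensed variant for illustration; not the class shown above.
class ScratchDemo {

    static final int THRESHOLD = 16; // same cutoff value as SIZE_THRESHOLD

    static class SortTask extends RecursiveAction {
        final int[] a, tmp;
        final int lo, hi; // inclusive bounds, as in the code above

        SortTask(int[] a, int[] tmp, int lo, int hi) {
            this.a = a;
            this.tmp = tmp;
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected void compute() {
            if (hi - lo < THRESHOLD) {
                Arrays.sort(a, lo, hi + 1); // small range: sort sequentially
                return;
            }
            int m = (lo + hi) >>> 1;
            invokeAll(new SortTask(a, tmp, lo, m), new SortTask(a, tmp, m + 1, hi));
            merge(m);
        }

        private void merge(int m) {
            // Scratch use stays inside tmp[lo..m]; a task running on a
            // different range never reads or writes these slots.
            System.arraycopy(a, lo, tmp, lo, m - lo + 1);
            int i = lo, j = m + 1, k = lo;
            while (i <= m && j <= hi) {
                a[k++] = (tmp[i] <= a[j]) ? tmp[i++] : a[j++];
            }
            // Copy back whatever is left of the first half (possibly nothing).
            System.arraycopy(tmp, i, a, k, m - i + 1);
        }
    }

    static void sort(int[] a) {
        if (a.length < 2) return;
        int[] tmp = new int[a.length]; // one full-length scratch array shared by all tasks
        new ForkJoinPool().invoke(new SortTask(a, tmp, 0, a.length - 1));
    }

    public static void main(String[] args) {
        int[] data = new Random(42).ints(100_000, 0, 1_000_000).toArray();
        int[] expected = data.clone();
        Arrays.sort(expected);
        sort(data);
        System.out.println(Arrays.equals(data, expected));
    }
}
```

<p>Because tmp is indexed with the same lo and m as the main array, the two subtasks created by invokeAll write to disjoint parts of it, and the parent only touches its slice after both children have completed.</p>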
<p>Changes to the algorithm may be needed, but the framework definitely makes it easier to move to parallel processing. One other thing to note is the presence of the <a title="ForkJoinPool" href="http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166ydocs/jsr166y/ForkJoinPool.html">ForkJoinPool</a>. Its default constructor sets the level of parallelism to the number of processors available on the machine.</p>
<p>I have a quad-core CPU, so the ForkJoinPool will spawn four threads if necessary. That said, I&#8217;ve seen cases where only two threads were spawned because more were not necessary for the given task. The ForkJoinPool spawns more threads as it deems necessary rather than starting right at the maximum.</p>
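<p>If you want to check what the pool picked on your own machine, something along these lines should work. It is a small sketch, assuming the JDK 7+ java.util.concurrent.ForkJoinPool; the class name PoolInfo and its helper methods are invented for the example.</p>

```java
import java.util.concurrent.ForkJoinPool;

// Illustrative snippet; PoolInfo and its helper methods are hypothetical names.
class PoolInfo {

    // Parallelism chosen by the no-argument constructor.
    static int defaultParallelism() {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.getParallelism();
        } finally {
            pool.shutdown();
        }
    }

    // Parallelism when the level is passed in explicitly.
    static int explicitParallelism(int level) {
        ForkJoinPool pool = new ForkJoinPool(level);
        try {
            return pool.getParallelism();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("available processors: " + cores);
        System.out.println("default parallelism:  " + defaultParallelism());
        System.out.println("explicit parallelism: " + explicitParallelism(2));
    }
}
```

<p>Passing an explicit level to the constructor is handy when benchmarking, since it lets you compare the same sort with one, two, or four workers.</p>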
<p>The complete API documentation for the fork/join framework can be found at the Concurrency <a title="JSR-166 Interest Site" href="http://gee.cs.oswego.edu/dl/concurrency-interest/">JSR-166 Interest Site</a>. To use it with Java 6, all that is needed is the jsr166y package.</p>
<p>Some other algorithms suited to parallelism that I&#8217;ve been thinking about are graph searching algorithms such as depth-first and breadth-first search. Whether they run on a tree or a general graph determines how much the underlying data structure will need to change to support the parallelism. I also plan to look at making a parallel version of the quicksort algorithm using this framework. Most divide-and-conquer algorithms can be adapted fairly easily to be multi-threaded using this method, but remember that the task must be sufficiently large for a performance benefit to be seen.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.quibb.org/2010/03/jsr-166-the-java-forkjoin-framework/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
