<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Blog::Quibb &#187; benchmarks</title>
	<atom:link href="http://blog.quibb.org/tag/benchmarks/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.quibb.org</link>
	<description>Software development and more.</description>
	<lastBuildDate>Tue, 10 Aug 2010 14:11:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Fast Bulk Inserts into SQLite</title>
		<link>http://blog.quibb.org/2010/08/fast-bulk-inserts-into-sqlite/</link>
		<comments>http://blog.quibb.org/2010/08/fast-bulk-inserts-into-sqlite/#comments</comments>
		<pubDate>Tue, 10 Aug 2010 14:11:56 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[c++]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[sqlite]]></category>

		<guid isPermaLink="false">http://blog.quibb.org/?p=219</guid>
		<description><![CDATA[Background Sometimes it’s necessary to get information into a database quickly. SQLite is a lightweight database engine that can be easily embedded in applications. This article covers the process of optimizing bulk inserts into an SQLite database. While it focuses on SQLite, some of the techniques shown here will apply to other databases. [...]]]></description>
			<content:encoded><![CDATA[<h2>Background</h2>
<p>Sometimes it’s necessary to get information into a database quickly. <a title="SQLite" href="http://sqlite.org/">SQLite</a> is a lightweight database engine that can be easily embedded in applications.  This article covers the process of optimizing bulk inserts into an SQLite database.  While it focuses on SQLite, some of the techniques shown here will apply to other databases.</p>
<p>All of the following examples insert data into the same table.  Each row holds an ID, followed by three FLOAT values, followed by three INTEGER values.  You&#8217;ll notice the getDouble() and getInt() functions.  They return doubles and ints in a predictable manner.  I didn&#8217;t use random data because different values could potentially add variability to the benchmarks at the end.</p>
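<p>The post does not show how getID(), getDouble() and getInt() are implemented.  As a rough sketch of what &#8220;predictable&#8221; could mean, the hypothetical ValueSource class below simply advances counters, so every benchmark run sees the same sequence of values.  The class name, the starting values and the step sizes are all assumptions made for illustration.</p>

```cpp
#include <string>

// Hypothetical stand-ins for the article's getID(), getDouble() and getInt().
// Each call advances a simple counter, so two runs of a benchmark
// produce exactly the same sequence of values.
class ValueSource {
public:
    // Returns "id0", "id1", "id2", ...
    std::string getID() { return "id" + std::to_string(mNextId++); }

    // Returns 0.5, 1.0, 1.5, ...
    double getDouble() { mDouble += 0.5; return mDouble; }

    // Returns 0, 1, 2, ...
    int getInt() { return mInt++; }

private:
    unsigned mNextId = 0;
    double mDouble = 0.0;
    int mInt = 0;
};
```

<p>Creating a fresh ValueSource for each benchmark run restarts the sequence, which keeps the inserted data identical from one method to the next.</p>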
<h2>Naive Inserts</h2>
<p>This is the most basic way to insert information into SQLite.  It simply calls <a title="sqlite3_exec" href="http://www.sqlite.org/c3ref/exec.html">sqlite3_exec</a> for each insert in the database.</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">char</span> buffer<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">300</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
<span style="color: #0000ff;">for</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">unsigned</span> i <span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span> i <span style="color: #000080;">&lt;</span> mVal<span style="color: #008080;">;</span> i<span style="color: #000040;">++</span><span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
    <span style="color: #0000dd;">sprintf</span><span style="color: #008000;">&#40;</span>buffer, <span style="color: #FF0000;">&quot;INSERT INTO example VALUES ('%s', %lf, %lf, %lf, %d, %d, %d)&quot;</span>,
            getID<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>.<span style="color: #007788;">c_str</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>,
            getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, buffer, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></div></div>
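<p>The loop above formats each statement with sprintf into a fixed 300 byte buffer, which would overflow if an ID were unexpectedly long.  A minimal sketch of a bounded alternative, assuming the same column layout, is shown here.  The formatInsert() helper is hypothetical and not part of the original benchmark, and it makes no attempt to escape quotes inside the ID.</p>

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Formats one INSERT statement for the article's `example` table.
// Returns an empty string if the statement would not fit in the buffer,
// instead of writing past its end as an unchecked sprintf could.
std::string formatInsert(const std::string& id, double a, double b, double c,
                         int x, int y, int z)
{
    char buffer[300];
    const int written = std::snprintf(buffer, sizeof(buffer),
        "INSERT INTO example VALUES ('%s', %f, %f, %f, %d, %d, %d)",
        id.c_str(), a, b, c, x, y, z);

    // snprintf reports the length it wanted to write; a value at or beyond
    // the buffer size means the output was truncated.
    if (written < 0 || static_cast<std::size_t>(written) >= sizeof(buffer)) {
        return std::string();
    }
    return std::string(buffer, static_cast<std::size_t>(written));
}
```

<p>The returned string can be passed to sqlite3_exec through c_str(), after checking that it is not empty.</p>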

<h2>Inserts within a Transaction</h2>
<p>A transaction is a way to group SQL statements together.  Without one, SQLite runs every INSERT in its own implicit transaction, so each row pays the full cost of a commit.  If an error is encountered, the ON CONFLICT clause can be used to handle it to your liking.  Nothing is committed to the SQLite database until either END or COMMIT is encountered to signify that the transaction should be written and closed.</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">char</span><span style="color: #000040;">*</span> errorMessage<span style="color: #008080;">;</span>
sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, <span style="color: #FF0000;">&quot;BEGIN TRANSACTION&quot;</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #000040;">&amp;</span>errorMessage<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">char</span> buffer<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">300</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
<span style="color: #0000ff;">for</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">unsigned</span> i <span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span> i <span style="color: #000080;">&lt;</span> mVal<span style="color: #008080;">;</span> i<span style="color: #000040;">++</span><span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
    <span style="color: #0000dd;">sprintf</span><span style="color: #008000;">&#40;</span>buffer, <span style="color: #FF0000;">&quot;INSERT INTO example VALUES ('%s', %lf, %lf, %lf, %d, %d, %d)&quot;</span>,
            getID<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>.<span style="color: #007788;">c_str</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>,
            getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, buffer, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span>
&nbsp;
sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, <span style="color: #FF0000;">&quot;COMMIT TRANSACTION&quot;</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #000040;">&amp;</span>errorMessage<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></div></div>

<h2>PRAGMA Statements</h2>
<p><a title="PRAGMA" href="http://sqlite.org/pragma.html">PRAGMA</a> statements control the behavior of SQLite as a whole.  They can be used to tweak options such as how often the data is flushed to disk or the size of the cache.  These are some that are commonly used for performance.  The SQLite documentation fully explains what they do and the implications of using them.  For example, synchronous=OFF causes SQLite to continue without waiting for the data to be written to the hard drive.  In the event of a crash or power failure, the database is more likely to be corrupted.</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;">sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, <span style="color: #FF0000;">&quot;PRAGMA synchronous=OFF&quot;</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #000040;">&amp;</span>errorMessage<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, <span style="color: #FF0000;">&quot;PRAGMA count_changes=OFF&quot;</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #000040;">&amp;</span>errorMessage<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, <span style="color: #FF0000;">&quot;PRAGMA journal_mode=MEMORY&quot;</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #000040;">&amp;</span>errorMessage<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, <span style="color: #FF0000;">&quot;PRAGMA temp_store=MEMORY&quot;</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #000040;">&amp;</span>errorMessage<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></div></div>

<h2>Prepared Statements</h2>
<p><a title="Prepared Statements" href="http://sqlite.org/c3ref/prepare.html">Prepared statements</a> are the recommended way of sending queries to SQLite.  Rather than parsing the statement over and over again, the parser only needs to be run once on the statement.  According to the documentation, sqlite3_exec() is a convenience function that calls sqlite3_prepare_v2(), sqlite3_step(), and then sqlite3_finalize().  In my opinion, the documentation should say more explicitly that prepared statements are the preferred query method.  sqlite3_exec() should only be used for one-off queries.</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">char</span><span style="color: #000040;">*</span> errorMessage<span style="color: #008080;">;</span>
sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, <span style="color: #FF0000;">&quot;BEGIN TRANSACTION&quot;</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #000040;">&amp;</span>errorMessage<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">char</span> buffer<span style="color: #008000;">&#91;</span><span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> <span style="color: #FF0000;">&quot;INSERT INTO example VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7)&quot;</span><span style="color: #008080;">;</span>
sqlite3_stmt<span style="color: #000040;">*</span> stmt<span style="color: #008080;">;</span>
sqlite3_prepare_v2<span style="color: #008000;">&#40;</span>mDb, buffer, <span style="color: #0000dd;">strlen</span><span style="color: #008000;">&#40;</span>buffer<span style="color: #008000;">&#41;</span>, <span style="color: #000040;">&amp;</span>stmt, <span style="color: #0000ff;">NULL</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">for</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">unsigned</span> i <span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span> i <span style="color: #000080;">&lt;</span> mVal<span style="color: #008080;">;</span> i<span style="color: #000040;">++</span><span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
    std<span style="color: #008080;">::</span><span style="color: #007788;">string</span> id <span style="color: #000080;">=</span> getID<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_text<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">1</span>, id.<span style="color: #007788;">c_str</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, id.<span style="color: #007788;">size</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, SQLITE_STATIC<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_double<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">2</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_double<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">3</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_double<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">4</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_int<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">5</span>, getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_int<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">6</span>, getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_int<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">7</span>, getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
    <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>sqlite3_step<span style="color: #008000;">&#40;</span>stmt<span style="color: #008000;">&#41;</span> <span style="color: #000040;">!</span><span style="color: #000080;">=</span> SQLITE_DONE<span style="color: #008000;">&#41;</span>
    <span style="color: #008000;">&#123;</span>
        <span style="color: #0000dd;">printf</span><span style="color: #008000;">&#40;</span><span style="color: #FF0000;">&quot;Insert Failed!<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #008000;">&#125;</span>
&nbsp;
    sqlite3_reset<span style="color: #008000;">&#40;</span>stmt<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span>
&nbsp;
sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, <span style="color: #FF0000;">&quot;COMMIT TRANSACTION&quot;</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #000040;">&amp;</span>errorMessage<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
sqlite3_finalize<span style="color: #008000;">&#40;</span>stmt<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></div></div>

<h2>Storing Data as Binary Blob</h2>
<p>Up until now, most of the optimizations have been pretty much the standard advice that you get when looking into bulk insert optimization.  If you’re not running queries on some of the data, it’s possible to convert it to binary and store it as a blob.  While it’s not advised to just throw everything into a blob and put it in the database, putting data that would be pulled and used together into a binary blob can make sense in some situations.</p>
<p>For example, if you have a point class (x, y, z) with REAL values, it might make sense to store them in a blob rather than in three separate fields in a row.  That’s only if you don’t need to make queries on the data though.  The benefit of this technique increases as more fields are converted into larger blobs.</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">char</span><span style="color: #000040;">*</span> errorMessage<span style="color: #008080;">;</span>
sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, <span style="color: #FF0000;">&quot;BEGIN TRANSACTION&quot;</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #000040;">&amp;</span>errorMessage<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">char</span> buffer<span style="color: #008000;">&#91;</span><span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> <span style="color: #FF0000;">&quot;INSERT INTO example VALUES (?1, ?2, ?3, ?4, ?5)&quot;</span><span style="color: #008080;">;</span>
sqlite3_stmt<span style="color: #000040;">*</span> stmt<span style="color: #008080;">;</span>
sqlite3_prepare_v2<span style="color: #008000;">&#40;</span>mDb, buffer, <span style="color: #0000dd;">strlen</span><span style="color: #008000;">&#40;</span>buffer<span style="color: #008000;">&#41;</span>, <span style="color: #000040;">&amp;</span>stmt, <span style="color: #0000ff;">NULL</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">for</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">unsigned</span> i <span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span> i <span style="color: #000080;">&lt;</span> mVal<span style="color: #008080;">;</span> i<span style="color: #000040;">++</span><span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
    std<span style="color: #008080;">::</span><span style="color: #007788;">string</span> id <span style="color: #000080;">=</span> getID<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_text<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">1</span>, id.<span style="color: #007788;">c_str</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, id.<span style="color: #007788;">size</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, SQLITE_STATIC<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
    <span style="color: #0000ff;">char</span> dblBuffer<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">24</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
    <span style="color: #0000ff;">double</span> d<span style="color: #008000;">&#91;</span><span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> <span style="color: #008000;">&#123;</span>getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
    <span style="color: #0000dd;">memcpy</span><span style="color: #008000;">&#40;</span>dblBuffer, <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">char</span><span style="color: #000040;">*</span><span style="color: #008000;">&#41;</span><span style="color: #000040;">&amp;</span>d, <span style="color: #0000dd;">sizeof</span><span style="color: #008000;">&#40;</span>d<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_blob<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">2</span>, dblBuffer, <span style="color: #0000dd;">24</span>, SQLITE_STATIC<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_int<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">3</span>, getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_int<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">4</span>, getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_int<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">5</span>, getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
    <span style="color: #0000ff;">int</span> retVal <span style="color: #000080;">=</span> sqlite3_step<span style="color: #008000;">&#40;</span>stmt<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>retVal <span style="color: #000040;">!</span><span style="color: #000080;">=</span> SQLITE_DONE<span style="color: #008000;">&#41;</span>
    <span style="color: #008000;">&#123;</span>
        <span style="color: #0000dd;">printf</span><span style="color: #008000;">&#40;</span><span style="color: #FF0000;">&quot;Insert Failed! %d<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span>, retVal<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #008000;">&#125;</span>
&nbsp;
    sqlite3_reset<span style="color: #008000;">&#40;</span>stmt<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span>
&nbsp;
sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, <span style="color: #FF0000;">&quot;COMMIT TRANSACTION&quot;</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #000040;">&amp;</span>errorMessage<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
sqlite3_finalize<span style="color: #008000;">&#40;</span>stmt<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></div></div>

<p>Note: I just used memcpy here, but this would cause problems when moving data between big- and little-endian systems.  If that’s necessary, it would be a good idea to serialize the data using a serialization library (e.g., <a title="protocol buffers" href="http://code.google.com/apis/protocolbuffers/docs/overview.html">protocol buffers</a> or <a title="MsgPack" href="http://msgpack.org/">msgpack</a>).</p>
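<p>If pulling in a serialization library is more than you need, one hand-rolled alternative is to write the bytes in a fixed order yourself.  The sketch below is not what the benchmark used.  It packs three doubles into 24 bytes, low byte first, and assumes that double is a 64-bit IEEE 754 value on every machine involved.  The packPoint() and unpackPoint() names are made up for this example.</p>

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Packs three doubles into 24 bytes with a fixed little-endian layout,
// so the blob reads back the same on big- and little-endian hosts.
// Assumption: double is a 64-bit IEEE 754 value everywhere the blob is used.
std::array<unsigned char, 24> packPoint(double x, double y, double z)
{
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects 64-bit double");
    const double values[3] = {x, y, z};
    std::array<unsigned char, 24> blob{};
    for (int i = 0; i < 3; ++i) {
        std::uint64_t bits;
        std::memcpy(&bits, &values[i], sizeof(bits));  // copy the raw bit pattern
        for (int b = 0; b < 8; ++b) {
            // Shifting works on the value, not on memory, so the low byte
            // always lands first regardless of the host byte order.
            blob[i * 8 + b] = static_cast<unsigned char>(bits >> (8 * b));
        }
    }
    return blob;
}

// Reverses packPoint(): rebuilds the three doubles from the byte layout.
void unpackPoint(const std::array<unsigned char, 24>& blob, double out[3])
{
    for (int i = 0; i < 3; ++i) {
        std::uint64_t bits = 0;
        for (int b = 0; b < 8; ++b) {
            bits |= static_cast<std::uint64_t>(blob[i * 8 + b]) << (8 * b);
        }
        std::memcpy(&out[i], &bits, sizeof(bits));
    }
}
```

<p>The packed array can be handed to sqlite3_bind_blob using data() and size() in place of dblBuffer and the hard-coded 24.</p>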
<h2>Performance</h2>
<p>I ran benchmarks to test the performance of each method of inserting data.  Take note that the x axis does not scale linearly; it most closely matches a logarithmic scale.  The inserts per second graph was obtained by taking the number of inserts and dividing it by the total runtime.</p>
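<p>The benchmark harness itself is not shown in this post.  As a rough illustration of the inserts per second calculation, the sketch below times an arbitrary piece of work with std::chrono and divides the insert count by the elapsed seconds.  The rate() and insertsPerSecond() helpers are hypothetical names, and returning zero for a zero-length run is a choice made only for this example.</p>

```cpp
#include <chrono>

// Inserts per second = number of inserts / total runtime in seconds.
// Returns 0 when no measurable time has elapsed, to avoid dividing by zero.
double rate(unsigned long insertCount, double seconds)
{
    if (seconds <= 0.0) {
        return 0.0;
    }
    return static_cast<double>(insertCount) / seconds;
}

// Runs `work` once (for example, one of the insert loops from this article)
// and reports the resulting inserts per second using wall-clock time.
template <typename Work>
double insertsPerSecond(unsigned long insertCount, Work work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return rate(insertCount, elapsed.count());
}
```

<p>Wrapping each insert method in a lambda and passing it to insertsPerSecond() along with the row count gives one comparable number per method.</p>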
<div id="attachment_238" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.quibb.org/wp-content/uploads/2010/07/bulk_insert_runtime.png"><img class="size-medium wp-image-238 " title="SQLite Bulk Insert Runtime" src="http://blog.quibb.org/wp-content/uploads/2010/07/bulk_insert_runtime-300x182.png" alt="SQLite Bulk Insert Runtime" width="300" height="182" /></a><p class="wp-caption-text">SQLite Bulk Insert Runtime in Seconds</p></div>
<p style="text-align: center;">
<div id="attachment_239" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.quibb.org/wp-content/uploads/2010/07/inserts_per_second.png"><img class="size-medium wp-image-239" title="Inserts Per Second" src="http://blog.quibb.org/wp-content/uploads/2010/07/inserts_per_second-300x182.png" alt="Inserts Per Second" width="300" height="182" /></a><p class="wp-caption-text">SQLite Inserts Per Second</p></div>
<p style="text-align: left;">After running the first benchmark, I wanted to show how storing data in binary can make a difference.  I ran it again, but instead of storing only three doubles, I stored 24 doubles.  I assumed order mattered, so for the version that does not use a binary blob, I made a separate table with ID and order columns.  This way both versions captured the same information.</p>
<p style="text-align: left;">
<div id="attachment_242" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.quibb.org/wp-content/uploads/2010/07/big_insert_runtime.png"><img class="size-medium wp-image-242" title="Big Insert Runtime" src="http://blog.quibb.org/wp-content/uploads/2010/07/big_insert_runtime-300x182.png" alt="Big Insert Runtime" width="300" height="182" /></a><p class="wp-caption-text">Big Insert Runtime in Seconds</p></div>
<div id="attachment_244" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.quibb.org/wp-content/uploads/2010/07/big_insert_per_second.png"><img class="size-medium wp-image-244" title="Big Inserts Per Second" src="http://blog.quibb.org/wp-content/uploads/2010/07/big_insert_per_second-300x181.png" alt="Big Inserts Per Second" width="300" height="181" /></a><p class="wp-caption-text">Big Inserts Per Second</p></div>
<p>Good luck with your database inserts.</p>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow: hidden;">
<h1 id="internal-source-marker_0.4793873936321398"><span style="font-size: 24pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: bold; font-style: normal; text-decoration: none; vertical-align: baseline;">Fast Bulk Inserts into SQLite</span></h1>
<h2><span style="font-size: 18pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: bold; font-style: normal; text-decoration: none; vertical-align: baseline;">Background</span></h2>
<p><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">Sometimes it’s necessary to get information into a database quickly.  SQLite[</span><a href="http://sqlite.org/"><span style="font-size: 11pt; font-family: Arial; color: #000099; background-color: transparent; font-weight: normal; font-style: normal; vertical-align: baseline; text-decoration: underline;">http://sqlite.org/</span></a><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">]  is a light weight database engine that can be easily embedded in  applications.  This will cover the process of optimizing bulk inserts  into an SQLite database.  While this article focuses on SQLite some of  the techniques shown here will apply to other databases.</span></p>
<h2><span style="font-size: 18pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: bold; font-style: normal; text-decoration: none; vertical-align: baseline;">Naive Inserts</span></h2>
<p><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">This is the most basic way to insert information into SQLite.  It simply calls sqlite3_exec[</span><a href="http://www.sqlite.org/c3ref/exec.html"><span style="font-size: 11pt; font-family: Arial; color: #000099; background-color: transparent; font-weight: normal; font-style: normal; vertical-align: baseline; text-decoration: underline;">http://www.sqlite.org/c3ref/exec.html</span></a><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">] for each insert in the database.</span><br />
<span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">[insert code here]</span></p>
<p><span style="font-size: 18pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: bold; font-style: normal; text-decoration: none; vertical-align: baseline;">Inserts within a Transaction</span><br />
<span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">A  transaction is a way to group SQL statements together.  If an error is  encountered the ON CONFLICT statement can be used to handle that to your  liking.  Nothing will be written to the SQLite database until either  END or COMMIT is encountered to signify the transaction should be  written and closed.</span><br />
<span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">[insert code here]</span></p>
<h2><span style="font-size: 18pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: bold; font-style: normal; text-decoration: none; vertical-align: baseline;">PRAGMA Statements</span></h2>
<p><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">PRAGMA statements[</span><a href="http://sqlite.org/pragma.html"><span style="font-size: 11pt; font-family: Arial; color: #000099; background-color: transparent; font-weight: normal; font-style: normal; vertical-align: baseline; text-decoration: underline;">http://sqlite.org/pragma.html</span></a><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">]  control the behavior of SQLite as a whole.  They can be used to tweak  options such as how often the data is flushed to disk of the size of the  cache.</span><br />
<span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">[insert code here]</span></p>
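<p>A minimal sketch of setting PRAGMAs from Python's built-in sqlite3 module follows.  The two PRAGMAs shown (synchronous and cache_size) are my assumption of what fits the description above, and the file name bulk.db and the values are hypothetical.  Turning synchronous off trades durability for speed, so it only makes sense when the data can be regenerated after a crash.</p>

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical database file, created in a throwaway directory.
    conn = sqlite3.connect(os.path.join(tmp, "bulk.db"))

    # Assumed settings: stop waiting for the OS to flush each commit,
    # and let SQLite cache up to 10000 pages in memory.
    conn.execute("PRAGMA synchronous = OFF")
    conn.execute("PRAGMA cache_size = 10000")

    # Reading a PRAGMA back returns its current value.
    synchronous = conn.execute("PRAGMA synchronous").fetchone()[0]
    cache_size = conn.execute("PRAGMA cache_size").fetchone()[0]
    print(synchronous, cache_size)

    conn.close()
```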
<h2><span style="font-size: 18pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: bold; font-style: normal; text-decoration: none; vertical-align: baseline;">Prepared Statements</span></h2>
<p><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">Prepared statements[</span><a href="http://sqlite.org/c3ref/prepare.html"><span style="font-size: 11pt; font-family: Arial; color: #000099; background-color: transparent; font-weight: normal; font-style: normal; vertical-align: baseline; text-decoration: underline;">http://sqlite.org/c3ref/prepare.html</span></a><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">]  are the recommended way of sending queries to SQLite.  Rather than  parsing the statement over and over again, the parser only needs to be  run once on the statement.  In all honesty, the documentation for  sqlite3_exec should say not to use it at all in favor of prepared  statements.  They are not only faster on inserts, but across the board  for all SQL statements.</span><br />
<span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">[insert code here]</span></p>
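<p>Python's built-in sqlite3 module does not expose sqlite3_prepare directly, so the sketch below is only an analogue of the C approach: executemany with ? placeholders compiles the INSERT once and re-binds it for every row.  The table and rows are hypothetical.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table and data, for illustration only.
conn.execute("CREATE TABLE points (x REAL, y REAL, z REAL)")
rows = [(float(i), float(i) + 0.5, float(i) + 1.0) for i in range(1000)]

# "with conn" wraps the batch in a transaction and commits on success.
with conn:
    # One statement, parsed once, then bound and stepped once per row.
    conn.executemany("INSERT INTO points (x, y, z) VALUES (?, ?, ?)", rows)

total = conn.execute("SELECT COUNT(*), SUM(x) FROM points").fetchone()
print(total)
```

<p>Binding values through placeholders also avoids building SQL strings by hand, which sidesteps quoting bugs.</p>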
<h2><span style="font-size: 18pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: bold; font-style: normal; text-decoration: none; vertical-align: baseline;">Storing Data as Binary Blob</span></h2>
<p><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">Up  until now, most of the optimizations have been pretty much the standard  advice that you get when looking into bulk insert optimization.  If  you’re not running queries on some of the data, it’s possible to convert  it to binary and store it as a blob.  While it’s not advised to just  throw everything into a blob and put it in the database, putting data  that would be pulled and used together into a binary blob can make sense  in some situations.</span><br />
<span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">For example, if you have a point class (x, y, z) with REAL values, it might make sense to store them in a blob rather than three separate fields in a row.  That’s only if you don’t need to make queries on the data though.  The benefits of this technique increase as more fields are converted into larger blobs.</span><br />
<span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">[insert code here]</span><br />
<span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">Note: I just do a memcpy here, but this would have issues going between big and little endian systems.  If portability between them is necessary, it would be a good idea to serialize the data using a serialization library (e.g. protocol buffers[http://code.google.com/apis/protocolbuffers/docs/overview.html], msgpack[http://msgpack.org/], thrift[http://incubator.apache.org/thrift/]).</span></p>
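<p>Here is a small sketch of the blob idea using Python's built-in sqlite3 and struct modules instead of memcpy.  The table layout and helper names (pack_point, unpack_point) are hypothetical.  Packing with an explicit little-endian format is one simple way around the byte-order issue noted above, assuming every reader agrees on that format.</p>

```python
import sqlite3
import struct

# "<3d" means three 64-bit floats in little-endian order, so the
# stored bytes are the same regardless of the host's byte order.
POINT_FORMAT = "<3d"


def pack_point(x, y, z):
    """Serialize one (x, y, z) point into a 24-byte blob."""
    return struct.pack(POINT_FORMAT, x, y, z)


def unpack_point(blob):
    """Turn a stored blob back into an (x, y, z) tuple."""
    return struct.unpack(POINT_FORMAT, blob)


conn = sqlite3.connect(":memory:")
# Hypothetical schema: one blob column instead of three REAL columns.
conn.execute("CREATE TABLE points (id INTEGER PRIMARY KEY, xyz BLOB)")

with conn:
    conn.execute("INSERT INTO points (xyz) VALUES (?)",
                 (pack_point(1.0, 2.5, -3.0),))

blob = conn.execute("SELECT xyz FROM points").fetchone()[0]
point = unpack_point(blob)
print(len(blob), point)
```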
<h2><span style="font-size: 18pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: bold; font-style: normal; text-decoration: none; vertical-align: baseline;">Performance</span></h2>
</div>
<p><span id="more-219"></span></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.quibb.org/2010/08/fast-bulk-inserts-into-sqlite/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Nightly Benchmarks: Tracking Results with Codespeed</title>
		<link>http://blog.quibb.org/2010/07/nightly-benchmarks-tracking-results-with-codespeed/</link>
		<comments>http://blog.quibb.org/2010/07/nightly-benchmarks-tracking-results-with-codespeed/#comments</comments>
		<pubDate>Mon, 19 Jul 2010 14:16:54 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[continuous integration]]></category>
		<category><![CDATA[jruby]]></category>

		<guid isPermaLink="false">http://blog.quibb.org/?p=204</guid>
		<description><![CDATA[Background Codespeed is a project for tracking performance. I discovered it when the PyPy project started using Codespeed to track performance. Since then development has been done to make its setup easier and provide more display options. Anyway, two posts ago I talked about running nightly benchmarks with Hudson. Then in the previous post I [...]]]></description>
			<content:encoded><![CDATA[<h2>Background</h2>
<p><a title="Codespeed" href="http://github.com/tobami/codespeed">Codespeed</a> is a project for tracking performance.  I discovered it when the <a title="PyPy" href="http://pypy.org/">PyPy</a> project started using Codespeed to track performance.  Since then development has been done to make its setup easier and provide more display options.</p>
<p>Anyway, two posts ago I talked about <a title="running nightly benchmarks with Hudson" href="http://blog.quibb.org/2010/04/nightly-benchmarks-setting-up-hudson/">running nightly benchmarks with Hudson</a>.  Then in the previous post I discussed <a title="passing parameters between builds in Hudson" href="http://blog.quibb.org/2010/04/passing-parameters-between-builds-in-hudson/">passing parameters between builds in Hudson</a>.  Both of these posts are worth reading before trying to set up Hudson with Codespeed.</p>
<h2>Codespeed Installation/Configuration</h2>
<h3>Django Quickstart</h3>
<p>Codespeed is built on Python and <a title="Django" href="http://www.djangoproject.com/">Django</a>.  Some basic knowledge of Django is needed in order to get everything up and running.  Don&#8217;t worry, it&#8217;s not that hard to learn the bit that is needed.  <a title="manage.py" href="http://docs.djangoproject.com/en/1.2/ref/django-admin/#ref-django-admin">manage.py</a> is all you need to know about to set up and view Codespeed.  There is information about <a title="deploying Django to a real web server" href="http://docs.djangoproject.com/en/1.2/howto/deployment/#howto-deployment-index">deploying Django to a real web server</a>, but I won&#8217;t be covering that here.</p>
<p>Here are the commands to get Django running:</p>
<p><a title="syncdb" href="http://docs.djangoproject.com/en/1.2/ref/django-admin/#syncdb"><strong>syncdb</strong></a></p>
<p>syncdb is used to initialize the database with the necessary tables.  It will also set up an admin account.  With the sqlite3 database selected, it will create the database file when this command is run.</p>
<p>The command is:</p>
<pre>python manage.py syncdb</pre>
<p><a title="runserver" href="http://docs.djangoproject.com/en/1.2/ref/django-admin/#runserver-port-or-ipaddr-port"><strong>runserver</strong></a></p>
<p>The next command is the runserver command.  This runs the built-in Django server.  The Django documentation says not to use it in a production environment, so make sure to deploy to a real web server if you plan to host it on the Internet or a high-traffic network.</p>
<p>The command is:</p>
<pre>python manage.py runserver 0.0.0.0:9000</pre>
<p>By default the server will run on 127.0.0.1:8000.  Setting the IP to 0.0.0.0 allows connections from any computer.  This works well if you&#8217;re on a local area network and want to set it up on a VM over SSH, but still be able to access the web interface from your computer.  The number after the colon is the port for the server to run on.  To view Codespeed, point your browser at 127.0.0.1:9000, or at the IP of the machine it&#8217;s on followed by :9000.</p>
<p>Django has many <a title="Django settings" href="http://docs.djangoproject.com/en/1.2/ref/settings/#ref-settings">settings</a> that may or may not need to be tweaked for your environment.  They can be set through the <a title="speedcenter/settings.py" href="http://github.com/tobami/codespeed/blob/0.6.1/speedcenter/settings.py">speedcenter/settings.py</a> file.</p>
<h3>Codespeed Setup/Settings</h3>
<p>Now for setting up the actual Codespeed server.  First check it out using git.  The clone command is:</p>
<pre>git clone http://github.com/tobami/codespeed.git</pre>
<p>The settings file is <a title="speedcenter/codespeed/settings.py" href="http://github.com/tobami/codespeed/blob/0.6.1/speedcenter/codespeed/settings.py">speedcenter/codespeed/settings.py</a>.</p>
<p>Most of the default values will work fine.  They&#8217;re mostly for setting default values for various things in the interface.</p>
<p>One thing that does need to be configured is the environment.  Start by running the syncdb command and then run the server using runserver.  Now that the server is running, browse to the admin interface.  If you ran the server on port 9000, point your browser at http://127.0.0.1:9000/admin.  Log in using the username and password you created during the syncdb call.  A Codespeed environment must be created manually.  The environment is the machine you&#8217;re running the benchmarks on.  After logging in, click Add next to the Environment label.  Fill in the various fields and remember the name you give it.  Save it when you&#8217;re done.  The name will be used later when submitting benchmark data to Codespeed.</p>
<h2>Submitting Benchmarks</h2>
<p>This will pick up where my last <a title="Nightly Benchmarks: Setting up Hudson" href="http://blog.quibb.org/2010/04/nightly-benchmarks-setting-up-hudson/">tutorial</a> left off.  The benchmarks were running as a nightly job in Hudson.  Sending benchmark data to Codespeed will take a bit of programming.  I&#8217;m going to continue the example with <a title="JRuby" href="http://jruby.org/">JRuby</a>, so the benchmarks and submission process are written in Ruby.</p>
<p>In order to submit benchmarks, information must be transferred from the JRuby build job to the Ruby Benchmarks job.  My last post discussed how to transfer parameters between jobs.  Using the <a title="Parameterized Trigger Plugin" href="http://wiki.hudson-ci.org/display/HUDSON/Parameterized+Trigger+Plugin">Parameterized Trigger Plugin</a> and passing extra parameters using a properties file will allow you to get all the necessary parameters to the benchmarks job.</p>
<p>The required information for submitting a benchmark result to Codespeed includes:</p>
<ul>
<li>commitid &#8211; The id of the commit, which could either be a git/mercurial hashcode or an svn revision number.</li>
<li>project &#8211; The name of the project to save.</li>
<li>executable &#8211; The name of the executable.</li>
<li>benchmark &#8211; The name of the benchmark.</li>
<li>environment &#8211; This is the name of the environment you created earlier.  It must be the name of an existing environment.</li>
<li>result_value &#8211; The runtime of the benchmark. You can configure what units a benchmark has through the admin interface. Default is seconds.</li>
</ul>
<p>This information can be included but is optional:</p>
<ul>
<li>std_dev &#8211; The standard deviation of the results of the benchmarks.</li>
<li>min</li>
<li>max</li>
<li>branch &#8211; The branch corresponding to this benchmark in the SCM repository.</li>
<li>result_date &#8211; The timestamp of the commit in the form &#8220;%Y-%m-%d %H:%M&#8221;</li>
</ul>
<p>The above information is passed to Codespeed as URL-encoded form parameters in an HTTP POST.  Point the request at http://127.0.0.1:9000/result/add/ and encode the parameters for sending.  For the JRuby benchmarks, the following parameters are sent from the JRuby job to the Ruby benchmarks job.</p>
<pre>COMMIT_ID=$(git rev-parse HEAD)
COMMIT_TIME=$(git log -1 --pretty=\"format:%ad\")
RUBY_PATH=$WORKSPACE/bin/jruby
REPO_URL=git://github.com/jruby/jruby.git</pre>
<p>The other fields are derived from the benchmarks job itself.</p>
<p>Here is the source code for <a title="submission through Ruby rake file" href="http://github.com/qbproger/ruby-benchmark-suite/blob/master/rakelib/bench.rake">submission through Ruby</a>:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;">output = <span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#006600; font-weight:bold;">&#125;</span>
canonical_name = doc<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">&quot;name&quot;</span><span style="color:#006600; font-weight:bold;">&#93;</span>.<span style="color:#CC0066; font-weight:bold;">gsub</span> <span style="color:#996600;">'//'</span>, <span style="color:#996600;">'/'</span>
output<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">'commitid'</span><span style="color:#006600; font-weight:bold;">&#93;</span> = commitid
output<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">'project'</span><span style="color:#006600; font-weight:bold;">&#93;</span> = BASE_VM
output<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">'branch'</span><span style="color:#006600; font-weight:bold;">&#93;</span> = branch
output<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">'executable'</span><span style="color:#006600; font-weight:bold;">&#93;</span> = BASE_VM
output<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">'benchmark'</span><span style="color:#006600; font-weight:bold;">&#93;</span> = <span style="color:#CC00FF; font-weight:bold;">File</span>.<span style="color:#9900CC;">basename</span><span style="color:#006600; font-weight:bold;">&#40;</span>canonical_name<span style="color:#006600; font-weight:bold;">&#41;</span>
output<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">'environment'</span><span style="color:#006600; font-weight:bold;">&#93;</span> = environment
output<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">'result_value'</span><span style="color:#006600; font-weight:bold;">&#93;</span> = doc<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">&quot;mean&quot;</span><span style="color:#006600; font-weight:bold;">&#93;</span>
output<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">'std_dev'</span><span style="color:#006600; font-weight:bold;">&#93;</span> = doc<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">&quot;standard_deviation&quot;</span><span style="color:#006600; font-weight:bold;">&#93;</span>
output<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">'result_date'</span><span style="color:#006600; font-weight:bold;">&#93;</span> = commit_time
&nbsp;
res = <span style="color:#6666ff; font-weight:bold;">Net::HTTP</span>.<span style="color:#9900CC;">post_form</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#CC00FF; font-weight:bold;">URI</span>.<span style="color:#9900CC;">parse</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">&quot;#{server}/result/add/&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span>, output<span style="color:#006600; font-weight:bold;">&#41;</span>
<span style="color:#CC0066; font-weight:bold;">puts</span> res.<span style="color:#9900CC;">body</span></pre></div></div>

<p>It&#8217;s a good idea to always print out the response as it will contain debug information.  There is an example of <a title="save_single_result.py" href="http://github.com/tobami/codespeed/blob/0.6.1/tools/save_single_result.py">how to submit benchmarks to Codespeed using Python</a> in the Codespeed repository in the tools directory.</p>
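<p>For readers who prefer Python, here is a rough sketch of the same submission using only the standard library.  It is not the script from the Codespeed repository; the helper names (build_result, submit) and all example values are hypothetical, and the /result/add/ path is taken from the Ruby code above.</p>

```python
from datetime import datetime
from urllib.parse import urlencode
from urllib.request import urlopen


def build_result(commitid, project, executable, benchmark, environment,
                 result_value, commit_time=None, std_dev=None):
    """Collect the required fields plus the optional ones that were given."""
    data = {
        "commitid": commitid,
        "project": project,
        "executable": executable,
        "benchmark": benchmark,
        "environment": environment,
        "result_value": result_value,
    }
    if std_dev is not None:
        data["std_dev"] = std_dev
    if commit_time is not None:
        # Codespeed expects result_date as "%Y-%m-%d %H:%M".
        data["result_date"] = commit_time.strftime("%Y-%m-%d %H:%M")
    return data


def submit(server, data):
    """POST the form-encoded result and return the response body."""
    body = urlencode(data).encode("ascii")
    with urlopen(server + "/result/add/", body) as response:
        return response.read().decode("utf-8")


# Hypothetical values, for illustration only.
payload = build_result("abc123", "jruby", "jruby", "bm_example",
                       "build-box", 1.25,
                       commit_time=datetime(2010, 7, 19, 14, 16),
                       std_dev=0.05)
encoded = urlencode(payload)
print(encoded)
# With a server running: print(submit("http://127.0.0.1:9000", payload))
```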
<h2>Viewing Results</h2>
<p>After results are in the Codespeed database, you can view the data through the web interface.  Direct a browser at http://127.0.0.1:9000.  The changes view shows the trend over the last revisions.  The timeline view allows you to see a graph of recent revisions, and the newly added comparison view will compare different executables running the same benchmark.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.quibb.org/2010/07/nightly-benchmarks-tracking-results-with-codespeed/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Nightly Benchmarks: Setting up Hudson</title>
		<link>http://blog.quibb.org/2010/04/nightly-benchmarks-setting-up-hudson/</link>
		<comments>http://blog.quibb.org/2010/04/nightly-benchmarks-setting-up-hudson/#comments</comments>
		<pubDate>Thu, 08 Apr 2010 13:36:27 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[continuous integration]]></category>
		<category><![CDATA[jruby]]></category>

		<guid isPermaLink="false">http://blog.quibb.org/?p=169</guid>
		<description><![CDATA[For some projects, finding out about performance regressions is important.  I&#8217;m going to write a two part series about setting up a nightly build machine and displaying the generated data.  This part is going to cover installation of Hudson, and getting the benchmarks running nightly. I decided to give Hudson a try because I had [...]]]></description>
			<content:encoded><![CDATA[<p>For some projects, finding out about performance regressions is important.  I&#8217;m going to write a two-part series about setting up a nightly build machine and displaying the generated data.  This part is going to cover installation of <a title="Hudson" href="http://hudson-ci.org/">Hudson</a>, and getting the benchmarks running nightly.</p>
<p>I decided to give Hudson a try because I had heard good things about it.  Also, after hearing coworkers complain about CruiseControl and CDash, I thought I&#8217;d try something new.  Since Hudson has pretty extensive documentation, I&#8217;ll walk you through setting up the JRuby project to build with Hudson and getting benchmarks running on it.</p>
<h2>Hudson Installation</h2>
<p>On Ubuntu it&#8217;s as simple as:</p>
<pre>sudo apt-get install hudson
</pre>
<p>While I didn&#8217;t install it on Windows, the installation should require little more than installing <a title="Tomcat" href="http://tomcat.apache.org/">Tomcat</a>, downloading the Hudson war file, and putting it in the webapps directory.</p>
<p>After installation, browsing to http://127.0.0.1:8080 should show the Hudson Dashboard.</p>
<h3>Hudson Configuration</h3>
<p>After Hudson installation is complete, it requires very little configuration before setting up your first project.  One thing that may be necessary is going to the plugins page and making sure your version control system is covered.  For setting up a continuous integration machine to build JRuby, the git plugin is necessary.</p>
<p>To install the Hudson Git Plugin, click <strong>Manage Hudson</strong> on the left hand side.  Then click <strong>Manage Plugins</strong> from the list in the middle of the screen.  Click the <strong>Available</strong> tab, and find the <span style="text-decoration: underline;">Hudson GIT plugin</span> in the list.  After it&#8217;s installed it will show up in the Installed tab.</p>
<p>After installing all the necessary plugins for your project go back to the Hudson Dashboard by clicking the Hudson logo, or the Back to Dashboard link.</p>
<h2>Setting up a Project to Build</h2>
<p>A good first step is to make sure the project will build on the given machine without being built through Hudson.  There may be some dependencies that got overlooked, and this is a good way to make sure everything is set up to build your project.</p>
<p>Now, click on the <strong>New Job</strong> link on the left hand side.  For the JRuby project, the <strong>Build a free-style software project</strong> is the type of project to set up.  I imagine that is the correct type of project to set up for most projects.</p>
<p>Unless you plan on keeping all the builds produced on the server, the <strong>Discard Old Builds</strong> is a good option to check, and set how long you want the builds to remain on the server.  Choose the source code management tool that you use for your project, which is Git for JRuby, and set the appropriate settings.</p>
<p>JRuby settings:</p>
<pre>URL of Repository: git://github.com/jruby/jruby.git
Branch Specifier (blank for default): master
Repository browser (Auto)
</pre>
<p>There are several types of Build Triggers by default.  More Build Triggers can be added through plugins, if you&#8217;re looking for another way to trigger a build.  For a nightly build at midnight select the <strong>Build periodically</strong> option, and put <em>@midnight</em> in the field.</p>
<p>For the build step, if you&#8217;re building a Java project select <strong>Invoke Ant</strong>.  Otherwise, <strong>Execute shell</strong> may be a good option for you.  For JRuby, select Invoke Ant and set the target to jar to build it.</p>
<p>At this point you can click the <strong>Save</strong> button at the bottom of the page and click <strong>Build Now</strong> on the next page to build your project.  It&#8217;s a good idea to make sure your project builds correctly before trying to add in nightly benchmarks.  It&#8217;s easier to debug problems before you have too much going on.  Clicking on the build in the active builds list shows its console output in the browser.</p>
<h2>Running the Benchmarks</h2>
<p>If your benchmarks are in the same repository, you&#8217;re mostly done.  Add another build step, and set it up to run your benchmarks.  While JRuby does have benchmarks in its repository, the benchmarks I plan on running are in a different repository.  With this goal in mind, I created another Job in Hudson to checkout and run the benchmarks.</p>
<p>Its setup is very similar to that of JRuby: it checks out the source and runs the benchmarks.  The main difference is that a parameter is passed to the project to tell it which Ruby VM to use.  The <span style="text-decoration: underline;">Parameterized Trigger Plugin</span> is necessary to pass a parameter from one project to another.  The way it works is that you define the parameter near the top of the configuration page of the project receiving it.  In my case, I added a RUBY_PATH parameter.  Then you set up the build job to send that parameter to the benchmarks job.</p>
<p>To do this, I went back to the JRuby job and turned on the <strong>Trigger parameterized build on other projects</strong> option.  It should be the last option down at the bottom of the page.  I set the JRuby job to trigger with the benchmarks job name, and in the predefined parameters field I put the following:</p>
<pre> RUBY_PATH=$WORKSPACE/bin/jruby</pre>
<p>After this is in place, when a JRuby build finishes it will start a benchmarks run.  Now that your benchmarks are up and running, the next part to this series will go over how to display the information in a way that makes it easy to spot regressions.</p>
<p>If you have any questions or if I went over something too quickly, post a comment and/or ask a question.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.quibb.org/2010/04/nightly-benchmarks-setting-up-hudson/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sort Optimization (Part 2) with JDK 6 vs JDK 7</title>
		<link>http://blog.quibb.org/2009/12/sort-optimization-part-2-with-jdk-6-vs-jdk-7/</link>
		<comments>http://blog.quibb.org/2009/12/sort-optimization-part-2-with-jdk-6-vs-jdk-7/#comments</comments>
		<pubDate>Wed, 23 Dec 2009 15:00:28 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[shootout]]></category>
		<category><![CDATA[sorting]]></category>

		<guid isPermaLink="false">http://blog.quibb.org/?p=94</guid>
		<description><![CDATA[In part 1, I went over my first foray into the world of sorting algorithms.  Since then, I&#8217;ve had some other ideas on how to improve my quicksort implementation.  One idea that I had while originally working on the sorting algorithm, was to rework the partition function to take into account duplicate elements.  I had [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://blog.quibb.org/2008/11/sort-optimization/">part 1</a>, I went over my first foray into the world of sorting algorithms.  Since then, I&#8217;ve had some other ideas on how to improve my quicksort implementation.  One idea that I had while originally working on the sorting algorithm was to rework the partition function to take into account duplicate elements.  I had a few different working implementations, but all of them came with a severe performance penalty.  I finally figured out a way to get performance close to the previous algorithm.</p>
<p>The partition function needs to perform the minimal number of swaps possible.  So moving towards the center from both ends and only swapping when both are out of order is the best approach I&#8217;ve found so far.  When grouping duplicate elements, they are swapped to the beginning of the partition area as they are found.  Then at the end, a pass is run to move them to their correct location in the final list.  Then instead of returning one number from the partition function, it returns two.  It returns the minimum and maximum indices on the range that has the pivot value.</p>
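<p>To make the two-index return value concrete, here is a short sketch in Python.  Note that it is not my implementation: it uses the simpler single-pass three-way (Dutch national flag) partition rather than the swap-duplicates-to-the-front approach described above, and the function names are hypothetical.  What it shares is the interface, where the partition returns the first and last index of the run equal to the pivot and the recursion skips that whole run.</p>

```python
def partition3(a, lo, hi):
    """Partition a[lo..hi] around the pivot a[lo].

    Returns (first, last): the inclusive index range that holds
    every element equal to the pivot once partitioning is done.
    """
    pivot = a[lo]
    lt, i, gt = lo, lo, hi
    while i <= gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivot:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1
    return lt, gt


def quicksort(a, lo=0, hi=None):
    """Sort a in place, never recursing into the run of pivot duplicates."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    first, last = partition3(a, lo, hi)
    quicksort(a, lo, first - 1)
    quicksort(a, last + 1, hi)


data = [3, 1, 3, 2, 3, 5, 3, 4]
pivot_range = partition3(list(data), 0, len(data) - 1)
quicksort(data)
print(pivot_range, data)
```

<p>The more duplicates a data set has, the larger the pivot run and the less work is left for the recursive calls.</p>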
<p>Another area that I was able to get some performance gain out of was getting rid of the shell sort from the first algorithm.  While that was there to make sure the quicksort did not recurse too deeply, in practice the shell sort algorithm doesn&#8217;t run.</p>
<p><strong>Results</strong></p>
<p>Here are the results of JDK 6 MergeSort, <a href="http://en.wikipedia.org/wiki/Timsort">Tim Sort</a>, <a href="http://blog.quibb.org/2008/11/sort-optimization/">QSort</a>, QSortv2, and Dual Pivot sort 2 benchmarked on the same set of files.  Overall, the new version doesn&#8217;t outperform the old version, but I thought it was worth posting my findings.  On most data sets with duplicates it does perform better.  I ran these benchmarks on OpenJDK 7 because I was curious as to how they would compare to one another.</p>
<p>It&#8217;s important to note that the tables are speedup relative to the Java implementation on the given JDK.  The graphs are the average runtimes for each algorithm.  I graphed the average runtime because it shows the performance difference between Sun&#8217;s JDK 6 and OpenJDK 7 build 73.<br />
<center></p>
<table>
<tbody>
<tr>
<td>
<div id="attachment_105" class="wp-caption alignnone" style="width: 122px"><a href="http://blog.quibb.org/wp-content/uploads/2009/12/JDK6nowarm.png"><img class="size-medium wp-image-105" title="Sun JDK 6 without Warmup" src="http://blog.quibb.org/wp-content/uploads/2009/12/JDK6nowarm-112x300.png" alt="Sun JDK 6 without Warmup" width="112" height="300" /></a><p class="wp-caption-text">Sun JDK 6 without Warmup</p></div></td>
<td>
<p><div id="attachment_106" class="wp-caption alignnone" style="width: 106px"><a href="http://blog.quibb.org/wp-content/uploads/2009/12/JDK7nowarm.png"><img class="size-medium wp-image-106 " title="OpenJDK 7 without Warmup" src="http://blog.quibb.org/wp-content/uploads/2009/12/JDK7nowarm-96x300.png" alt="Sun JDK 7 without Warmup" width="96" height="300" /></a><p class="wp-caption-text">OpenJDK 7 without Warmup</p></div></td>
</tr>
</tbody>
</table>
<p></center><br />
<center></p>
<table>
<tbody>
<tr>
<td>
<p><div id="attachment_107" class="wp-caption alignnone" style="width: 174px"><a href="http://blog.quibb.org/wp-content/uploads/2009/12/JDK61000warm.png"><img class="size-medium wp-image-107" title="Sun JDK 6 1000 Warmup Iterations" src="http://blog.quibb.org/wp-content/uploads/2009/12/JDK61000warm-164x300.png" alt="Sun JDK 6 1000 Warmup Iterations" width="164" height="300" /></a><p class="wp-caption-text">Sun JDK 6 1000 Warmup Iterations</p></div></td>
<td>
<p><div id="attachment_108" class="wp-caption alignnone" style="width: 151px"><a href="http://blog.quibb.org/wp-content/uploads/2009/12/JDK71000warm.png"><img class="size-medium wp-image-108" title="OpenJDK 7 1000 Warmup Iterations" src="http://blog.quibb.org/wp-content/uploads/2009/12/JDK71000warm-141x300.png" alt="OpenJDK 7 1000 Warmup Iterations" width="141" height="300" /></a><p class="wp-caption-text">OpenJDK 7 1000 Warmup Iterations</p></div></td>
</tr>
</tbody>
</table>
<p></center><br />
<div id="attachment_109" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.quibb.org/wp-content/uploads/2009/12/nowarmup.png"><img class="size-medium wp-image-109  " title="Sun JDK 6 vs OpenJDK 7 without Warmup" src="http://blog.quibb.org/wp-content/uploads/2009/12/nowarmup-300x249.png" alt="JDK 6 vs JDK 7 with No Warmup" width="300" height="249" /></a><p class="wp-caption-text">Sun JDK 6 vs OpenJDK 7 without Warmup</p></div>
<div id="attachment_104" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.quibb.org/wp-content/uploads/2009/12/1000warm.png"><img class="size-medium wp-image-104 " title="Sun JDK 6 vs OpenJDK 7 1000 Iterations of Warmup" src="http://blog.quibb.org/wp-content/uploads/2009/12/1000warm-300x248.png" alt="Sun JDK 6 vs OpenJDK 7 1000 Warmup Iterations" width="300" height="248" /></a><p class="wp-caption-text">Sun JDK 6 vs OpenJDK 7 1000 Iterations of Warmup</p></div>
<p><strong>Conclusions</strong></p>
<p>Overall, the new version of the Qsort implementation doesn&#8217;t improve greatly over the previous implementation, so it didn&#8217;t work out to be the performance improvement I was looking for.  I think the last graph, with 1000 iterations of warmup for each algorithm, is the most interesting.  The Qsort v2 implementation apparently doesn&#8217;t get handled any better by OpenJDK 7.  The partition function is larger after my changes, so perhaps it didn&#8217;t JIT very well.  What is interesting is the boost that Tim Sort saw with the change of JDKs.  Running these benchmarks made me realize that upgrading my Java runtime could improve the performance of my Java applications.  It will be interesting to see if the performance carries over to NetBeans and Eclipse; I expect it will.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.quibb.org/2009/12/sort-optimization-part-2-with-jdk-6-vs-jdk-7/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NLTK vs MontyLingua Part of Speech Taggers</title>
		<link>http://blog.quibb.org/2009/03/nltk-vs-montylingua-part-of-speech-taggers/</link>
		<comments>http://blog.quibb.org/2009/03/nltk-vs-montylingua-part-of-speech-taggers/#comments</comments>
		<pubDate>Sun, 29 Mar 2009 02:23:50 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[nlp]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[taggers]]></category>

		<guid isPermaLink="false">http://blog.quibb.org/?p=37</guid>
		<description><![CDATA[This is a comparison of the part of speech taggers available in python. As far as I know, these are the most prominent python taggers. Let me know if you think another tagger should be added to the comparison. MontyLingua includes several natural language processing (NLP) tools. The ones that I used in this comparison [...]]]></description>
			<content:encoded><![CDATA[<p>This is a comparison of the part-of-speech taggers available in Python.  As far as I know, these are the most prominent Python taggers.  Let me know if you think another tagger should be added to the comparison.</p>
<p><a href="http://web.media.mit.edu/~hugo/montylingua/">MontyLingua</a> includes several natural language processing (NLP) tools.  The ones that I used in this comparison were the stemmer, tagger, and sentence tokenizer.  <a href="http://www.nltk.org/">The Natural Language Toolkit (NLTK)</a> is another set of Python tools for natural language processing.  It has a much greater breadth of tools than MontyLingua.  It has taggers, parsers, tokenizers, chunkers, and stemmers.  It usually has a few different implementations of each, giving its users different options.  In the case of stemmers, it has the Porter and Lancaster stemmers, along with the WordNet lemmatizer (Punkt is its sentence tokenizer).  Both MontyLingua and NLTK are written to aid in NLP using Python.</p>
<h2 style="text-align: left;"><strong>Taggers</strong></h2>
<p style="text-align: left;">For those that don&#8217;t know, a tagger is an NLP tool that marks the part of speech of each word.</p>
<p>Example:<br />
Input: &#8220;A dog walks&#8221;<br />
Output: &#8220;A/DT dog/NN walks/VBZ&#8221;</p>
<p>The meanings of the tokens after the / can be <a href="http://en.wikipedia.org/wiki/Brown_Corpus">found here</a>.</p>
<p>For NLTK, I&#8217;m comparing the built-in tagger to MontyLingua.  I didn&#8217;t do any training at all and just called nltk.tag.pos_tag().  I used the taggers mostly as is, with some slight modifications.  I added a RegExp tagger in front of the NLTK tagger and made the default tagger its backoff tagger.  The RegExp tagger always marks A, An, and The as DT.  Having them marked as NNP was annoying and was messing up my results.  They were capitalized, and I suppose the tagger thought they were either initials or proper names.</p>
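<p>As a rough illustration of the backoff idea, here is a minimal standard-library sketch.  It is not the code used for this comparison: <code>default_tag</code> is a hypothetical stand-in for the stock tagger, and the pattern list is an assumption.  In NLTK itself, the same chaining is available through <code>nltk.RegexpTagger</code> and its <code>backoff</code> argument.</p>

```python
import re

# Assumed pattern list: the articles are always tagged DT, regardless of case.
ARTICLE_PATTERNS = [(re.compile(r"^(a|an|the)$", re.IGNORECASE), "DT")]


def default_tag(word):
    """Hypothetical stand-in for the stock tagger (not NLTK itself).

    It mimics the behaviour described above: capitalized words become NNP.
    """
    return "NNP" if word[:1].isupper() else "NN"


def tag_with_backoff(words, patterns=ARTICLE_PATTERNS, backoff=default_tag):
    """Tag each word with the first matching pattern, else ask the backoff."""
    tagged = []
    for word in words:
        for pattern, tag in patterns:
            if pattern.match(word):
                tagged.append((word, tag))
                break
        else:
            # No pattern matched, so fall through to the backoff tagger.
            tagged.append((word, backoff(word)))
    return tagged


print(tag_with_backoff(["The", "dog", "walks"]))
```

<p>Putting the pattern check first means the backoff tagger never sees the articles, so its opinion about capitalized words no longer matters for them.</p>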
<p>MontyLingua, on the other hand, was always marking &#8220;US&#8221; as a pronoun.  This was a problem when scanning sentences that said &#8220;US Pint&#8221; or &#8220;US Gallon.&#8221;  I look at the word before &#8220;US&#8221; to see whether it&#8217;s an article; if it is, I allow it to continue being processed.  Neither tagger is perfect, but it becomes clear that one may be better than the other for my use case.  It may be different for yours.  I&#8217;m scanning sentences from the web.</p>
<h2 style="text-align: left;"><strong>Stemmers</strong></h2>
<p style="text-align: left;">A stemmer is a tool that will take a word with a suffix attached to it, and return the &#8216;stem&#8217; or base word of it.</p>
<p>Example:<br />
Input: dogs<br />
Output: dog</p>
<p>While neither stemmer is perfect, they both do a decent job.  MontyLingua is more inclined to take the &#8216;S&#8217; off the end of something, and the NLTK WordNetLemmatizer doesn&#8217;t always take it off.  &#8216;Cows&#8217; is an example of a word the WordNetLemmatizer will not stem to &#8216;Cow&#8217; but MontyLingua will.  On the other hand, MontyLingua is more likely to take the &#8216;S&#8217; off the end of an acronym, and I wrote code to correct that in some cases.  If a word is shorter than 4 characters or is all consonants, I don&#8217;t run it through the MontyLingua stemmer.  The all-consonants check is there to catch some acronyms.  When using MontyLingua on a specific part of speech, it&#8217;s important to specify whether it&#8217;s a <em>noun</em> or a <em>verb</em> with the &#8216;pos&#8217; parameter.  Since I&#8217;m only stemming nouns, I used pos=&#8217;noun&#8217;.</p>
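<p>The acronym guard described above could look roughly like the sketch below.  The names are hypothetical, <code>naive_stem</code> is only a stand-in for the MontyLingua stemmer, and treating &#8216;y&#8217; as a consonant is an assumption.</p>

```python
VOWELS = set("aeiouAEIOU")


def should_stem(word):
    """Return False for words that are too short or look like acronyms."""
    if len(word) < 4:
        return False
    # "All consonants" is approximated as "contains no vowel at all".
    if not any(ch in VOWELS for ch in word):
        return False
    return True


def naive_stem(word):
    """Hypothetical stand-in stemmer: strips a single trailing 's'."""
    return word[:-1] if word.lower().endswith("s") else word


def safe_stem(word, stem=naive_stem):
    """Run the stemmer only on words that pass the guard."""
    return stem(word) if should_stem(word) else word


for example in ["dogs", "Cows", "CDs", "DVDs"]:
    print(example, safe_stem(example))
```

<p>Short words and vowel-free words pass through untouched, which keeps the trailing &#8216;s&#8217; on acronyms at the cost of leaving a few short plurals unstemmed.</p>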
<h2 style="text-align: left;"><strong>Results</strong></h2>
<p>The first results reflect not only a change in taggers, but also changes in the stemmer and sentence tokenizer.  To separate those effects, I did a second comparison using the MontyLingua tagger with the NLTK stemmer and sentence tokenizer.</p>
<p>Phrases found by one algorithm and not by the other are shown first.  Each toolchain was able to find some phrases that the other did not.  Hits is the number of times a phrase comes up; it is displayed only if there is a discrepancy.  If MontyLingua and NLTK both found a phrase but found it a different number of times, that is reflected there.  The first numbers are totals for every discrepancy summed.  There is also a graph below showing how many of each difference there is.  For example, there were 157 times that there was a discrepancy of 1 hit and MontyLingua came out on top.  There were 78 times the number of hits differed by 1 and NLTK had more.  One interesting case is a single word for which MontyLingua had 40 more hits than NLTK.  That word was elephant.</p>
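<p>For readers who want to reproduce this kind of comparison, here is a minimal sketch of the bookkeeping.  It assumes each toolchain yields a mapping from phrase to hit count; the function name and the sample data are made up for illustration and are not the numbers from this post.</p>

```python
from collections import Counter


def compare_hits(hits_a, hits_b):
    """Compare two {phrase: hit count} mappings from two toolchains."""
    # Phrases that only one toolchain found.
    only_a = set(hits_a) - set(hits_b)
    only_b = set(hits_b) - set(hits_a)

    # For phrases both found, count how often each side is ahead,
    # keyed by the size of the hit difference.
    a_ahead, b_ahead = Counter(), Counter()
    for phrase in set(hits_a) & set(hits_b):
        diff = hits_a[phrase] - hits_b[phrase]
        if diff > 0:
            a_ahead[diff] += 1
        elif diff < 0:
            b_ahead[-diff] += 1
    return only_a, only_b, a_ahead, b_ahead


# Invented sample data, for illustration only.
monty_hits = {"elephant": 45, "dog": 3, "cow": 2}
nltk_hits = {"elephant": 5, "dog": 4, "cat": 1}

only_monty, only_nltk, monty_ahead, nltk_ahead = compare_hits(monty_hits, nltk_hits)
print(sorted(only_monty), sorted(only_nltk))
print(dict(monty_ahead), dict(nltk_ahead))
```

<p>The two counters correspond to the two data columns of the tables below: one row per hit difference, one column per toolchain.</p>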
<p>MontyLingua toolchain vs NLTK toolchain<br />
In MontyLingua but not NLTK: 514<br />
In NLTK but not MontyLingua: 403</p>
<p>Total Hits: MontyLingua: 1421 vs NLTK: 1184</p>
<p style="text-align: center;">
<div id="attachment_38" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.quibb.org/wp-content/uploads/2009/03/monty_v_nltk.png"><img class="size-medium wp-image-38" title="monty_v_nltk" src="http://blog.quibb.org/wp-content/uploads/2009/03/monty_v_nltk-300x266.png" alt="MontyLingua vs NLTK Graph" width="300" height="266" /></a><p class="wp-caption-text">MontyLingua vs NLTK</p></div>
<table style="height: 222px;" border="1" cellspacing="0" cellpadding="3" width="260" align="center">
<tbody>
<tr>
<td>Hit Difference</td>
<td>MontyLingua</td>
<td>NLTK</td>
</tr>
<tr>
<td>1</td>
<td>157</td>
<td>78</td>
</tr>
<tr>
<td>2</td>
<td>35</td>
<td>10</td>
</tr>
<tr>
<td>3</td>
<td>10</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>1</td>
</tr>
<tr>
<td>5</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>6</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>13</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>14</td>
<td>2</td>
<td>0</td>
</tr>
<tr>
<td>40</td>
<td>1</td>
<td>0</td>
</tr>
</tbody>
</table>
<p>On average, MontyLingua had more hits than NLTK.</p>
<p>MontyLingua Tagger with NLTK Stemmer &amp; Tokenizer (ML-NLTK) vs MontyLingua Toolchain<br />
For the sake of completeness, here are the results of the MontyLingua tagger with the NLTK stemmer and tokenizer.</p>
<p>In ML-NLTK but not in MontyLingua: 65<br />
In MontyLingua but not in ML-NLTK: 68</p>
<p>Total Hits: ML-NLTK: 290 vs MontyLingua: 299</p>
<table style="height: 90px;" border="1" cellspacing="0" cellpadding="3" width="260" align="center">
<tbody>
<tr>
<td>Hit Difference</td>
<td>MontyLingua</td>
<td>ML-NLTK</td>
</tr>
<tr>
<td>1</td>
<td>20</td>
<td>17</td>
</tr>
<tr>
<td>2</td>
<td>0</td>
<td>2</td>
</tr>
<tr>
<td>10</td>
<td>1</td>
<td>0</td>
</tr>
</tbody>
</table>
<p style="text-align: center;">
<p style="text-align: center;"><span style="text-decoration: underline;">Total Phrases Found By</span></p>
<table style="height: 90px;" border="1" cellspacing="0" cellpadding="3" width="260" align="center">
<tbody>
<tr>
<td>Name</td>
<td>Phrase Count</td>
</tr>
<tr>
<td>NLTK</td>
<td>3777</td>
</tr>
<tr>
<td>ML-NLTK</td>
<td>3885</td>
</tr>
<tr>
<td>MontyLingua</td>
<td>3888</td>
</tr>
</tbody>
</table>
<p>At the end of the day, I&#8217;ll be using the MontyLingua toolchain with some slight modifications I&#8217;ve made (mentioned above).  I&#8217;m definitely still using NLTK, just for different tasks.  NLTK has a great and easy to use regexp chunker that I&#8217;ll continue to use.</p>
<p>Again, a tagger&#8217;s performance can vary greatly based on the data used to train and test it.  I was testing them on about 12,000 webpages I downloaded and looking for specific phrases.  On a different data set, NLTK may turn out to be better.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.quibb.org/2009/03/nltk-vs-montylingua-part-of-speech-taggers/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
