<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Blog::Quibb &#187; optimization</title>
	<atom:link href="http://blog.quibb.org/tag/optimization/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.quibb.org</link>
	<description>Software development and more.</description>
	<lastBuildDate>Tue, 10 Aug 2010 14:11:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Fast Bulk Inserts into SQLite</title>
		<link>http://blog.quibb.org/2010/08/fast-bulk-inserts-into-sqlite/</link>
		<comments>http://blog.quibb.org/2010/08/fast-bulk-inserts-into-sqlite/#comments</comments>
		<pubDate>Tue, 10 Aug 2010 14:11:56 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[c++]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[sqlite]]></category>

		<guid isPermaLink="false">http://blog.quibb.org/?p=219</guid>
		<description><![CDATA[Background Sometimes it’s necessary to get information into a database quickly. SQLite is a lightweight database engine that can be easily embedded in applications. This will cover the process of optimizing bulk inserts into an SQLite database. While this article focuses on SQLite, some of the techniques shown here will apply to other databases. [...]]]></description>
			<content:encoded><![CDATA[<h2>Background</h2>
<p>Sometimes it’s necessary to get information into a database quickly. <a title="SQLite" href="http://sqlite.org/">SQLite</a> is a lightweight database engine that can be easily embedded in applications.  This article covers the process of optimizing bulk inserts into an SQLite database.  While it focuses on SQLite, some of the techniques shown here apply to other databases as well.</p>
<p>All of the following examples insert data into the same table.  Each row holds an ID, followed by three FLOAT values, followed by three INTEGER values.  You&#8217;ll notice the getDouble() and getInt() functions.  They return doubles and ints in a predictable manner.  I didn&#8217;t use random data because different values could potentially add variability to the benchmarks at the end.</p>
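<p>The generator functions themselves are not shown in this article.  As a rough sketch of what "predictable" could look like, the hypothetical versions below derive every value from a counter, so each run produces the same sequence.  The class name, the ID format and the value ranges are assumptions made for illustration, not the code used for the benchmarks.</p>

```cpp
#include <string>

// Hypothetical deterministic data source: every value is derived from a
// counter, so two runs produce exactly the same sequence.
class PredictableData
{
public:
    // IDs look like "id0", "id1", ... (the format is an assumption).
    std::string getID() { return "id" + std::to_string(mNextId++); }

    // Cycles through 0.5, 1.5, ..., 99.5 and then starts over.
    double getDouble() { return (mNextDouble++ % 100) + 0.5; }

    // Cycles through 0..999.
    int getInt() { return static_cast<int>(mNextInt++ % 1000); }

private:
    unsigned mNextId = 0;
    unsigned mNextDouble = 0;
    unsigned mNextInt = 0;
};
```

<p>Because nothing depends on the clock or a random seed, every insert method is timed against identical rows.</p>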
<h2>Naive Inserts</h2>
<p>This is the most basic way to insert information into SQLite.  It simply calls <a title="sqlite3_exec" href="http://www.sqlite.org/c3ref/exec.html">sqlite3_exec</a> once for each row to insert.</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">char</span> buffer<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">300</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
<span style="color: #0000ff;">for</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">unsigned</span> i <span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span> i <span style="color: #000080;">&lt;</span> mVal<span style="color: #008080;">;</span> i<span style="color: #000040;">++</span><span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
    <span style="color: #0000dd;">sprintf</span><span style="color: #008000;">&#40;</span>buffer, <span style="color: #FF0000;">&quot;INSERT INTO example VALUES ('%s', %lf, %lf, %lf, %d, %d, %d)&quot;</span>,
            getID<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>.<span style="color: #007788;">c_str</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>,
            getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, buffer, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></div></div>
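<p>One thing to watch in the loop above is that sprintf writes into a fixed 300-character buffer without checking the length.  A minimal sketch of a safer variant follows, assuming a helper named formatInsert (hypothetical, not part of the benchmark code) that uses snprintf and reports truncation by returning an empty string.  The ID is still pasted into the SQL text unescaped, which is one more reason to prefer the prepared statements covered later.</p>

```cpp
#include <cstdio>
#include <string>

// Formats one INSERT for the example table. Returns an empty string if the
// statement would not fit in the buffer, instead of overflowing it.
// Note: the ID is pasted into the SQL text unescaped, as in the loop above.
std::string formatInsert(const std::string& id,
                         double a, double b, double c,
                         int x, int y, int z)
{
    char buffer[300];
    int written = std::snprintf(buffer, sizeof(buffer),
        "INSERT INTO example VALUES ('%s', %f, %f, %f, %d, %d, %d)",
        id.c_str(), a, b, c, x, y, z);

    // snprintf returns the length the complete string would have had.
    if (written < 0 || written >= static_cast<int>(sizeof(buffer)))
    {
        return std::string();
    }
    return std::string(buffer, static_cast<std::size_t>(written));
}
```

<p>The result of formatInsert can be passed to sqlite3_exec via c_str() after checking that it is not empty.</p>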

<h2>Inserts within a Transaction</h2>
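<p>A transaction is a way to group SQL statements together.  Outside an explicit transaction, SQLite runs every statement in its own transaction, so each naive insert pays the full cost of a commit; wrapping the loop in one transaction pays that cost once.  If a constraint violation is encountered, the ON CONFLICT clause can be used to handle it to your liking.  None of the changes are committed to the database until either END or COMMIT is executed to signify that the transaction should be closed and written.</p>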

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">char</span><span style="color: #000040;">*</span> errorMessage<span style="color: #008080;">;</span>
sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, <span style="color: #FF0000;">&quot;BEGIN TRANSACTION&quot;</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #000040;">&amp;</span>errorMessage<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">char</span> buffer<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">300</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
<span style="color: #0000ff;">for</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">unsigned</span> i <span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span> i <span style="color: #000080;">&lt;</span> mVal<span style="color: #008080;">;</span> i<span style="color: #000040;">++</span><span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
    <span style="color: #0000dd;">sprintf</span><span style="color: #008000;">&#40;</span>buffer, <span style="color: #FF0000;">&quot;INSERT INTO example VALUES ('%s', %lf, %lf, %lf, %d, %d, %d)&quot;</span>,
            getID<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>.<span style="color: #007788;">c_str</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>,
            getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, buffer, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span>
&nbsp;
sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, <span style="color: #FF0000;">&quot;COMMIT TRANSACTION&quot;</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #000040;">&amp;</span>errorMessage<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></div></div>
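<p>The code above never rolls back if something goes wrong between BEGIN and COMMIT.  One possible way to handle that is a small scope guard, sketched below.  The class name ScopedTransaction and its Exec callable are assumptions: the callable stands in for a thin wrapper around sqlite3_exec that returns true on success, which keeps the sketch independent of the database handle.</p>

```cpp
#include <functional>
#include <string>
#include <utility>

// Runs BEGIN on construction and ROLLBACK on destruction unless commit()
// succeeded. The Exec callable is an assumption: it stands in for a thin
// wrapper around sqlite3_exec that returns true on success.
class ScopedTransaction
{
public:
    using Exec = std::function<bool(const std::string&)>;

    explicit ScopedTransaction(Exec exec)
        : mExec(std::move(exec)), mOpen(mExec("BEGIN TRANSACTION"))
    {
    }

    // Not copyable: two owners would end the same transaction twice.
    ScopedTransaction(const ScopedTransaction&) = delete;
    ScopedTransaction& operator=(const ScopedTransaction&) = delete;

    // Returns true if the transaction was open and COMMIT succeeded.
    bool commit()
    {
        if (!mOpen) return false;
        // On failure the transaction stays open and the destructor rolls back.
        if (!mExec("COMMIT TRANSACTION")) return false;
        mOpen = false;
        return true;
    }

    ~ScopedTransaction()
    {
        // Leaving the scope early (error, exception) discards the batch.
        if (mOpen) mExec("ROLLBACK TRANSACTION");
    }

private:
    Exec mExec;
    bool mOpen;
};
```

<p>With SQLite, the callable could call sqlite3_exec on mDb and compare the result with SQLITE_OK.  The insert loop then sits between constructing the guard and calling commit().</p>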

<h2>PRAGMA Statements</h2>
<p><a title="PRAGMA" href="http://sqlite.org/pragma.html">PRAGMA</a> statements control the behavior of SQLite as a whole.  They can be used to tweak options such as how often the data is flushed to disk or the size of the cache.  These are some that are commonly used for performance.  The SQLite documentation fully explains what they do and the implications of using them.  For example, synchronous=OFF causes SQLite to continue without waiting for the data to be written to the hard drive.  If the operating system crashes or the power fails before the data reaches the disk, the database is more likely to be corrupted.</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;">sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, <span style="color: #FF0000;">&quot;PRAGMA synchronous=OFF&quot;</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #000040;">&amp;</span>errorMessage<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, <span style="color: #FF0000;">&quot;PRAGMA count_changes=OFF&quot;</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #000040;">&amp;</span>errorMessage<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, <span style="color: #FF0000;">&quot;PRAGMA journal_mode=MEMORY&quot;</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #000040;">&amp;</span>errorMessage<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, <span style="color: #FF0000;">&quot;PRAGMA temp_store=MEMORY&quot;</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #000040;">&amp;</span>errorMessage<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></div></div>

<h2>Prepared Statements</h2>
<p><a title="Prepared Statements" href="http://sqlite.org/c3ref/prepare.html">Prepared statements</a> are the recommended way of sending queries to SQLite.  Rather than parsing the statement over and over again, the parser only needs to run once on the statement.  According to the documentation, sqlite3_exec() is a convenience function that calls sqlite3_prepare_v2(), sqlite3_step(), and then sqlite3_finalize().  In my opinion, the documentation should say more explicitly that prepared statements are the preferred query method.  sqlite3_exec() should only be used for one-off queries.</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">char</span><span style="color: #000040;">*</span> errorMessage<span style="color: #008080;">;</span>
sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, <span style="color: #FF0000;">&quot;BEGIN TRANSACTION&quot;</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #000040;">&amp;</span>errorMessage<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">char</span> buffer<span style="color: #008000;">&#91;</span><span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> <span style="color: #FF0000;">&quot;INSERT INTO example VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7)&quot;</span><span style="color: #008080;">;</span>
sqlite3_stmt<span style="color: #000040;">*</span> stmt<span style="color: #008080;">;</span>
sqlite3_prepare_v2<span style="color: #008000;">&#40;</span>mDb, buffer, <span style="color: #0000dd;">strlen</span><span style="color: #008000;">&#40;</span>buffer<span style="color: #008000;">&#41;</span>, <span style="color: #000040;">&amp;</span>stmt, <span style="color: #0000ff;">NULL</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">for</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">unsigned</span> i <span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span> i <span style="color: #000080;">&lt;</span> mVal<span style="color: #008080;">;</span> i<span style="color: #000040;">++</span><span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
    std<span style="color: #008080;">::</span><span style="color: #007788;">string</span> id <span style="color: #000080;">=</span> getID<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_text<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">1</span>, id.<span style="color: #007788;">c_str</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, id.<span style="color: #007788;">size</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, SQLITE_STATIC<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_double<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">2</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_double<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">3</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_double<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">4</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_int<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">5</span>, getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_int<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">6</span>, getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_int<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">7</span>, getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
    <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>sqlite3_step<span style="color: #008000;">&#40;</span>stmt<span style="color: #008000;">&#41;</span> <span style="color: #000040;">!</span><span style="color: #000080;">=</span> SQLITE_DONE<span style="color: #008000;">&#41;</span>
    <span style="color: #008000;">&#123;</span>
        <span style="color: #0000dd;">printf</span><span style="color: #008000;">&#40;</span><span style="color: #FF0000;">&quot;Insert failed!<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #008000;">&#125;</span>
&nbsp;
    sqlite3_reset<span style="color: #008000;">&#40;</span>stmt<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span>
&nbsp;
sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, <span style="color: #FF0000;">&quot;COMMIT TRANSACTION&quot;</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #000040;">&amp;</span>errorMessage<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
sqlite3_finalize<span style="color: #008000;">&#40;</span>stmt<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></div></div>

<h2>Storing Data as Binary Blob</h2>
<p>Up until now, most of the optimizations have been pretty much the standard advice that you get when looking into bulk insert optimization.  If you’re not running queries on some of the data, it’s possible to convert it to binary and store it as a blob.  While it’s not advised to just throw everything into a blob and put it in the database, putting data that would be pulled and used together into a binary blob can make sense in some situations.</p>
<p>For example, if you have a point class (x, y, z) with REAL values, it might make sense to store them in a blob rather than in three separate fields in a row.  That’s only if you don’t need to make queries on the data though.  The benefit of this technique increases as more fields are converted into larger blobs.</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">char</span><span style="color: #000040;">*</span> errorMessage<span style="color: #008080;">;</span>
sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, <span style="color: #FF0000;">&quot;BEGIN TRANSACTION&quot;</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #000040;">&amp;</span>errorMessage<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">char</span> buffer<span style="color: #008000;">&#91;</span><span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> <span style="color: #FF0000;">&quot;INSERT INTO example VALUES (?1, ?2, ?3, ?4, ?5)&quot;</span><span style="color: #008080;">;</span>
sqlite3_stmt<span style="color: #000040;">*</span> stmt<span style="color: #008080;">;</span>
sqlite3_prepare_v2<span style="color: #008000;">&#40;</span>mDb, buffer, <span style="color: #0000dd;">strlen</span><span style="color: #008000;">&#40;</span>buffer<span style="color: #008000;">&#41;</span>, <span style="color: #000040;">&amp;</span>stmt, <span style="color: #0000ff;">NULL</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">for</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">unsigned</span> i <span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span> i <span style="color: #000080;">&lt;</span> mVal<span style="color: #008080;">;</span> i<span style="color: #000040;">++</span><span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
    std<span style="color: #008080;">::</span><span style="color: #007788;">string</span> id <span style="color: #000080;">=</span> getID<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_text<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">1</span>, id.<span style="color: #007788;">c_str</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, id.<span style="color: #007788;">size</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, SQLITE_STATIC<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
    <span style="color: #0000ff;">char</span> dblBuffer<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">24</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
    <span style="color: #0000ff;">double</span> d<span style="color: #008000;">&#91;</span><span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> <span style="color: #008000;">&#123;</span>getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>, getDouble<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
    <span style="color: #0000dd;">memcpy</span><span style="color: #008000;">&#40;</span>dblBuffer, <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">char</span><span style="color: #000040;">*</span><span style="color: #008000;">&#41;</span><span style="color: #000040;">&amp;</span>d, <span style="color: #0000dd;">sizeof</span><span style="color: #008000;">&#40;</span>d<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_blob<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">2</span>, dblBuffer, <span style="color: #0000dd;">24</span>, SQLITE_STATIC<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_int<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">3</span>, getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_int<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">4</span>, getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    sqlite3_bind_int<span style="color: #008000;">&#40;</span>stmt, <span style="color: #0000dd;">5</span>, getInt<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
    <span style="color: #0000ff;">int</span> retVal <span style="color: #000080;">=</span> sqlite3_step<span style="color: #008000;">&#40;</span>stmt<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>retVal <span style="color: #000040;">!</span><span style="color: #000080;">=</span> SQLITE_DONE<span style="color: #008000;">&#41;</span>
    <span style="color: #008000;">&#123;</span>
        <span style="color: #0000dd;">printf</span><span style="color: #008000;">&#40;</span><span style="color: #FF0000;">&quot;Insert failed! %d<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span>, retVal<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #008000;">&#125;</span>
&nbsp;
    sqlite3_reset<span style="color: #008000;">&#40;</span>stmt<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span>
&nbsp;
sqlite3_exec<span style="color: #008000;">&#40;</span>mDb, <span style="color: #FF0000;">&quot;COMMIT TRANSACTION&quot;</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #0000ff;">NULL</span>, <span style="color: #000040;">&amp;</span>errorMessage<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
sqlite3_finalize<span style="color: #008000;">&#40;</span>stmt<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></div></div>

<p>Note: I just used memcpy here, but the resulting blob would not be portable between big-endian and little-endian systems.  If that’s necessary, it would be a good idea to serialize the data using a serialization library (e.g. <a title="protocol buffers" href="http://code.google.com/apis/protocolbuffers/docs/overview.html">protocol buffers</a> or <a title="MsgPack" href="http://msgpack.org/">msgpack</a>).</p>
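<p>If pulling in a serialization library is more than you need, the byte order can also be fixed by hand.  The sketch below is one possible approach, not what the benchmark used: the hypothetical helpers packDoubles and unpackDoubles always write and read little-endian bytes, and they assume double is a 64-bit IEEE 754 value.</p>

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Packs doubles into a byte buffer in little-endian order, independent of
// the byte order of the machine running the code. Assumes double is a
// 64-bit IEEE 754 value, which holds on common platforms.
std::vector<unsigned char> packDoubles(const double* values, std::size_t count)
{
    static_assert(sizeof(std::uint64_t) == sizeof(double), "double must be 64 bits");
    std::vector<unsigned char> blob;
    blob.reserve(count * 8);
    for (std::size_t i = 0; i < count; i++)
    {
        std::uint64_t bits;
        std::memcpy(&bits, &values[i], sizeof(bits));
        // Emit the least significant byte first.
        for (int shift = 0; shift < 64; shift += 8)
        {
            blob.push_back(static_cast<unsigned char>((bits >> shift) & 0xFF));
        }
    }
    return blob;
}

// Reverses packDoubles: rebuilds each double from 8 little-endian bytes.
// Trailing bytes that do not form a complete double are ignored.
std::vector<double> unpackDoubles(const unsigned char* blob, std::size_t size)
{
    std::vector<double> values;
    for (std::size_t offset = 0; offset + 8 <= size; offset += 8)
    {
        std::uint64_t bits = 0;
        for (int i = 0; i < 8; i++)
        {
            bits |= static_cast<std::uint64_t>(blob[offset + i]) << (8 * i);
        }
        double value;
        std::memcpy(&value, &bits, sizeof(value));
        values.push_back(value);
    }
    return values;
}
```

<p>The packed vector's data() and size() can be handed to sqlite3_bind_blob in place of dblBuffer and 24.  With SQLITE_STATIC the vector has to stay alive until after sqlite3_step; SQLITE_TRANSIENT makes SQLite take its own copy.</p>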
<h2>Performance</h2>
<p>I ran benchmarks to test the performance of each method of inserting data.  Take note that the x axis does not scale linearly; it most closely matches a logarithmic scale.  The inserts per second graph was obtained by taking the number of inserts and dividing it by the total runtime.</p>
<div id="attachment_238" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.quibb.org/wp-content/uploads/2010/07/bulk_insert_runtime.png"><img class="size-medium wp-image-238 " title="SQLite Bulk Insert Runtime" src="http://blog.quibb.org/wp-content/uploads/2010/07/bulk_insert_runtime-300x182.png" alt="SQLite Bulk Insert Runtime" width="300" height="182" /></a><p class="wp-caption-text">SQLite Bulk Insert Runtime in Seconds</p></div>
<p style="text-align: center;">
<div id="attachment_239" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.quibb.org/wp-content/uploads/2010/07/inserts_per_second.png"><img class="size-medium wp-image-239" title="Inserts Per Second" src="http://blog.quibb.org/wp-content/uploads/2010/07/inserts_per_second-300x182.png" alt="Inserts Per Second" width="300" height="182" /></a><p class="wp-caption-text">SQLite Inserts Per Second</p></div>
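<p>The inserts-per-second figure is simply the insert count divided by the total runtime.  A minimal sketch of how such a measurement could be taken is shown here; the function names are hypothetical and this is not the harness that produced the graphs.</p>

```cpp
#include <chrono>

// Inserts per second: number of inserts divided by the total runtime.
// Returns 0 for a non-positive runtime to avoid dividing by zero.
double insertsPerSecond(unsigned long inserts, double seconds)
{
    return seconds > 0.0 ? inserts / seconds : 0.0;
}

// Times any callable with a monotonic clock and returns elapsed seconds.
template <typename Work>
double secondsTaken(Work work)
{
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

<p>Wrapping one of the insert loops in a lambda, passing it to secondsTaken and feeding the result to insertsPerSecond together with mVal gives one data point for each method.</p>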
<p style="text-align: left;">After running the first benchmark, I wanted to show how storing data in binary can make a difference.  I ran it again, but instead of storing only three doubles, I stored 24 doubles.  I assumed order mattered, so for the version that does not use a binary blob, I made a separate table with ID and order columns.  This way both versions captured the same information.</p>
<p style="text-align: left;">
<div id="attachment_242" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.quibb.org/wp-content/uploads/2010/07/big_insert_runtime.png"><img class="size-medium wp-image-242" title="Big Insert Runtime" src="http://blog.quibb.org/wp-content/uploads/2010/07/big_insert_runtime-300x182.png" alt="Big Insert Runtime" width="300" height="182" /></a><p class="wp-caption-text">Big Insert Runtime in Seconds</p></div>
<div id="attachment_244" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.quibb.org/wp-content/uploads/2010/07/big_insert_per_second.png"><img class="size-medium wp-image-244" title="Big Inserts Per Second" src="http://blog.quibb.org/wp-content/uploads/2010/07/big_insert_per_second-300x181.png" alt="Big Inserts Per Second" width="300" height="181" /></a><p class="wp-caption-text">Big Inserts Per Second</p></div>
<p>Good luck with your database inserts.</p>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow: hidden;">
<h1 id="internal-source-marker_0.4793873936321398"><span style="font-size: 24pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: bold; font-style: normal; text-decoration: none; vertical-align: baseline;">Fast Bulk Inserts into SQLite</span></h1>
<h2><span style="font-size: 18pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: bold; font-style: normal; text-decoration: none; vertical-align: baseline;">Background</span></h2>
<p><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">Sometimes it’s necessary to get information into a database quickly.  SQLite[</span><a href="http://sqlite.org/"><span style="font-size: 11pt; font-family: Arial; color: #000099; background-color: transparent; font-weight: normal; font-style: normal; vertical-align: baseline; text-decoration: underline;">http://sqlite.org/</span></a><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">]  is a light weight database engine that can be easily embedded in  applications.  This will cover the process of optimizing bulk inserts  into an SQLite database.  While this article focuses on SQLite some of  the techniques shown here will apply to other databases.</span></p>
<h2><span style="font-size: 18pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: bold; font-style: normal; text-decoration: none; vertical-align: baseline;">Naive Inserts</span></h2>
<p><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">This is the most basic way to insert information into SQLite.  It simply calls sqlite3_exec[</span><a href="http://www.sqlite.org/c3ref/exec.html"><span style="font-size: 11pt; font-family: Arial; color: #000099; background-color: transparent; font-weight: normal; font-style: normal; vertical-align: baseline; text-decoration: underline;">http://www.sqlite.org/c3ref/exec.html</span></a><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">] for each insert in the database.</span><br />
<span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">[insert code here]</span></p>
<p><span style="font-size: 18pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: bold; font-style: normal; text-decoration: none; vertical-align: baseline;">Inserts within a Transaction</span><br />
<span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">A  transaction is a way to group SQL statements together.  If an error is  encountered the ON CONFLICT statement can be used to handle that to your  liking.  Nothing will be written to the SQLite database until either  END or COMMIT is encountered to signify the transaction should be  written and closed.</span><br />
<span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">[insert code here]</span></p>
<h2><span style="font-size: 18pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: bold; font-style: normal; text-decoration: none; vertical-align: baseline;">PRAGMA Statements</span></h2>
<p><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">PRAGMA statements[</span><a href="http://sqlite.org/pragma.html"><span style="font-size: 11pt; font-family: Arial; color: #000099; background-color: transparent; font-weight: normal; font-style: normal; vertical-align: baseline; text-decoration: underline;">http://sqlite.org/pragma.html</span></a><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">]  control the behavior of SQLite as a whole.  They can be used to tweak  options such as how often the data is flushed to disk of the size of the  cache.</span><br />
<span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">[insert code here]</span></p>
<h2><span style="font-size: 18pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: bold; font-style: normal; text-decoration: none; vertical-align: baseline;">Prepared Statements</span></h2>
<p><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">Prepared statements[</span><a href="http://sqlite.org/c3ref/prepare.html"><span style="font-size: 11pt; font-family: Arial; color: #000099; background-color: transparent; font-weight: normal; font-style: normal; vertical-align: baseline; text-decoration: underline;">http://sqlite.org/c3ref/prepare.html</span></a><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">]  are the recommended way of sending queries to SQLite.  Rather than  parsing the statement over and over again, the parser only needs to be  run once on the statement.  In all honesty, the documentation for  sqlite3_exec should say not to use it at all in favor of prepared  statements.  They are not only faster on inserts, but across the board  for all SQL statements.</span><br />
<span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">[insert code here]</span></p>
<h2><span style="font-size: 18pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: bold; font-style: normal; text-decoration: none; vertical-align: baseline;">Storing Data as Binary Blob</span></h2>
<p><span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">Up  until now, most of the optimizations have been pretty much the standard  advice that you get when looking into bulk insert optimization.  If  you’re not running queries on some of the data, it’s possible to convert  it to binary and store it as a blob.  While it’s not advised to just  throw everything into a blob and put it in the database, putting data  that would be pulled and used together into a binary blob can make sense  in some situations.</span><br />
<span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">For example, if you have a point class (x, y, z) with REAL values, it might make sense to store them in a blob rather than in three separate fields in a row. That only holds if you don’t need to run queries on the data, though. The benefits of this technique increase as more fields are converted into larger blobs.</span><br />
<span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">[insert code here]</span><br />
<span style="font-size: 11pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">Note: I just do a memcpy here, but this would have issues going between big- and little-endian systems. If portability across such systems is necessary, it would be a good idea to serialize the data using a serialization library (e.g., Protocol Buffers[http://code.google.com/apis/protocolbuffers/docs/overview.html], MessagePack[http://msgpack.org/], or Thrift[http://incubator.apache.org/thrift/]).</span></p>
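<p>For the byte-order concern, here is a minimal sketch of one alternative to a raw memcpy: packing the three values with an explicit byte order. It is written in Ruby purely for illustration (the code this post discusses is C++), and point_to_blob and blob_to_point are hypothetical helper names, not part of any library. The resulting string is what would be bound to a BLOB column.</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;"># Hypothetical helpers (not from the original C++ code): pack a point's
# three REAL values into one blob with a fixed byte order.
# In Array#pack, 'E' means a double-precision float, little-endian.
POINT_FORMAT = 'E3'

def point_to_blob(x, y, z)
  [x, y, z].pack(POINT_FORMAT)
end

def blob_to_point(blob)
  blob.unpack(POINT_FORMAT)
end

blob = point_to_blob(1.5, -2.0, 3.25)
puts blob.bytesize      # prints 24 (3 doubles, 8 bytes each)
p blob_to_point(blob)   # prints [1.5, -2.0, 3.25]</pre></div></div>

<p>Because the format string fixes the byte order, the same 24 bytes decode to the same values regardless of the machine&#8217;s native endianness.</p>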
<h2><span style="font-size: 18pt; font-family: Arial; color: #000000; background-color: transparent; font-weight: bold; font-style: normal; text-decoration: none; vertical-align: baseline;">Performance</span></h2>
</div>
<p><span id="more-219"></span></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.quibb.org/2010/08/fast-bulk-inserts-into-sqlite/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sort Optimization</title>
		<link>http://blog.quibb.org/2008/11/sort-optimization/</link>
		<comments>http://blog.quibb.org/2008/11/sort-optimization/#comments</comments>
		<pubDate>Sat, 15 Nov 2008 18:31:19 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[jruby]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[sorting]]></category>

		<guid isPermaLink="false">http://blog.quibb.org/?p=5</guid>
		<description><![CDATA[This all started one night when I was in the #JRuby channel on irc.freenode.net.  A channel user was complaining about JRuby&#8217;s sorting algorithm being slow.  I thought to myself, I should be able to speed it up.  At the time I was thinking since sorting has been so well researched it&#8217;d likely be easy to [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: left;">This all started one night when I was in the #JRuby channel on irc.freenode.net.  A channel user was complaining about JRuby&#8217;s sorting algorithm being slow.  I thought to myself, I should be able to speed it up.  At the time I was thinking since sorting has been so well researched it&#8217;d likely be easy to find documentation about it.  That didn&#8217;t turn out to be entirely the case.</p>
<p style="text-align: left;">I started by looking around on Wikipedia to see what it had to offer.  <a href="http://en.wikipedia.org/wiki/Introsort">IntroSort</a> caught my eye (also see: <a href="http://ralphunden.net/content/tutorials/a-guide-to-introsort/">A guide to Introsort</a>).  I thought it was interesting that after recursing to a certain depth it would switch from quicksort to heapsort.  In the end, I don&#8217;t think that optimization turned out to be all that necessary.  Other than switching to heapsort, it was pretty much a median-of-three quicksort with an insertion sort added.</p>
<p style="text-align: left;">It was a good starting point though.  I switched the heapsort for a shell sort and that dropped the number of comparisons needed by a good amount.  One thing I noticed was that only my &#8220;Median of 3 Killer&#8221; test case was affected by that.  I searched around the Internet often during this, and came across this page: <a href="http://arunchaganty.wordpress.com/2008/07/06/quicksort/">QuickSort</a>.  It had some interesting ideas like grouping the same element together when running the partition function.  I tried implementing that several times and it always ended up increasing the number of comparisons and the runtime.  I&#8217;m not 100% sure why.  If someone who knows more about sorting than I do has any input on that, I&#8217;d be happy to hear it.</p>
<p style="text-align: left;">Anyway, after working on that for a while I took a look at the competition.  I looked at Ruby&#8217;s C code.  They do some interesting things that I hadn&#8217;t thought of up to that point.  First, they take the median of 7 if the list has more than 200 elements.  Second, they don&#8217;t sample the end elements, which helps me out later in my optimizations.  I didn&#8217;t copy exactly what they were doing, but did something similar.  One thing they also did was look at the order of the 3 values that they compared last (I&#8217;ll refer to these as v1, v2, and v3).  v1 is before v2 and v3 in the list&#8217;s current state, v2 is in the middle of the other two, and so on.</p>
<p style="text-align: left;">They had a lot more checks than what I ended up using, but I check if v1 &lt;= v2 &lt;= v3.  If this is the case I run the sequential test.  I also check if v1 &gt;= v2 &gt;= v3 to see if the list is in reversed order, and if it is I reverse the list before continuing.  While running these tests, I don&#8217;t check the first or last element of the list because I have a separate test for that.  If it passes the sequential or reverse test, that means the entire list is sorted except for potentially the first and/or the last element.  I then do a test on them and if they&#8217;re out of sequence I do bubble sort style swaps until they&#8217;re in the correct location.</p>
<p style="text-align: left;">Checking the end was one of the last optimizations I performed.  The main reason I added it is that this case can be slow with the normal sorting algorithm, and I don&#8217;t think it&#8217;s that uncommon a case.  I&#8217;ve seen it happen where an element is appended to a sorted list and then the list is sorted again.  Overall, this catches a case with the potential to be slow in a fairly cheap manner.  These cases weren&#8217;t weeded out by the v1 &lt;= v2 &lt;= v3 style checks because the median of 7 that I use doesn&#8217;t check the end elements.</p>
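<p style="text-align: left;">As a rough sketch of the end-element fix described above (written in Ruby, not taken from the actual sort source): fix_ends! is a hypothetical name, and the code assumes everything except the first and last element is already in ascending order.  The last element is moved first and stops short of index 0, so that the first element can then sink through an already sorted tail.</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;"># Hypothetical helper: assumes a[1..-2] is already ascending and only the
# first and/or last element may be out of place.
def fix_ends!(a)
  # Step 1: bubble the last element left, stopping before index 0,
  # so that a[1..-1] becomes ascending.
  (a.length - 1).downto(2) do |j|
    break unless a[j - 1] &gt; a[j]
    a[j - 1], a[j] = a[j], a[j - 1]
  end
  # Step 2: bubble the first element right through the sorted tail.
  0.upto(a.length - 2) do |i|
    break unless a[i] &gt; a[i + 1]
    a[i], a[i + 1] = a[i + 1], a[i]
  end
  a
end

p fix_ends!([9, 1, 2, 3, 4, 0])   # prints [0, 1, 2, 3, 4, 9]</pre></div></div>

<p style="text-align: left;">When both ends are already in place, each loop stops after a single comparison, which is what makes the check cheap.</p>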
<p style="text-align: left;">One of the last optimizations I performed was converting it to use a stack rather than operating recursively.  To be honest, this provided more speedup than I was expecting.  I guess all the function calls were taking a toll on the performance that I just didn&#8217;t realize at the time.</p>
<p style="text-align: left;">Another implementation choice I benchmarked was whether to run insertion sort once at the end on the entire list, or to run it on each section as soon as it is found to be smaller than the threshold value.  After running benchmarks it turned out that putting it at the end was better.  I suspect that the extra function calls required to do it during the quicksort loop were more overhead than the potential gain from better cache locality.</p>
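<p style="text-align: left;">The two ideas above, an explicit stack instead of recursion and a single insertion sort pass at the end, can be sketched roughly as follows.  This is an illustrative Ruby version, not the Java code behind the benchmarks: the THRESHOLD value is a guess, the pivot is simply the middle element rather than a median of 7, and the method names are hypothetical.</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;">THRESHOLD = 16 # hypothetical cutoff; the post does not state the value used

# Plain insertion sort, run once over the whole list at the very end.
def insertion_sort!(a)
  1.upto(a.length - 1) do |i|
    key = a[i]
    j = i - 1
    while j &gt;= 0 &amp;&amp; a[j] &gt; key
      a[j + 1] = a[j]
      j -= 1
    end
    a[j + 1] = key
  end
  a
end

# Quicksort driven by an explicit stack of [low, high] ranges.
def stack_quicksort!(a)
  stack = [[0, a.length - 1]]
  until stack.empty?
    low, high = stack.pop
    next if high - low &lt; THRESHOLD # small range: leave it for the final pass
    pivot = a[(low + high) / 2]       # middle element, a simplification
    i = low
    j = high
    while i &lt;= j
      i += 1 while a[i] &lt; pivot
      j -= 1 while a[j] &gt; pivot
      if i &lt;= j
        a[i], a[j] = a[j], a[i]
        i += 1
        j -= 1
      end
    end
    stack.push([low, j], [i, high])   # pending work replaces recursive calls
  end
  insertion_sort!(a)                  # list is now sorted except within small ranges
end

data = Array.new(1_000) { rand(10_000) }
puts stack_quicksort!(data.dup) == data.sort   # prints true</pre></div></div>

<p style="text-align: left;">Because every element is already within a small range of its final position when the loop finishes, the closing insertion sort only has to move elements a short distance.</p>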
<p style="text-align: left;">On with the benchmarks: (The numbers are time in seconds.  In parentheses is the speedup over Java.)</p>
<blockquote style="text-align: left;">
<pre>                          Java                 My Qsort
1245.repeat.1000.txt      1.6552300e-05 (1.00) 8.1874300e-06   (2.02 )
1245.repeat.10000.txt     0.00026284956 (1.00) 0.00012648017   (2.08 )
end.0.1000.txt            5.8904900e-06 (1.00) 1.7295600e-06   (3.41 )
end.0.10000.txt           8.3629030e-05 (1.00) 2.3148840e-05   (3.61 )
identical.1000.txt        3.5435500e-06 (1.00) 5.9168000e-07   (5.99 )
identical.10000.txt       3.7097050e-05 (1.00) 6.5003600e-06   (5.71 )
med.3.killer.1000.txt     1.0182050e-05 (1.00) 7.2132600e-06   (1.41 )
med.3.killer.10000.txt    0.00013968449 (1.00) 9.4941470e-05   (1.47 )
rand.dups.100.txt         1.2044400e-06 (1.00) 6.3194000e-07   (1.91 )
rand.dups.1000.txt        1.9847760e-05 (1.00) 1.0417630e-05   (1.91 )
rand.dups.10000.txt       0.00031415178 (1.00) 0.00020365385   (1.54 )
rand.no.dups.100.txt      1.2328200e-06 (1.00) 8.1612000e-07   (1.51 )
rand.no.dups.1000.txt     1.9309830e-05 (1.00) 1.1890280e-05   (1.62 )
rand.no.dups.10000.txt    0.00027436851 (1.00) 0.00017424722   (1.57 )
rand.steps.1000.txt       1.6057600e-05 (1.00) 1.0023700e-05   (1.60 )
rand.steps.10000.txt      0.00019955369 (1.00) 0.00017004971   (1.17 )
rev.ends.1000.txt         1.2306600e-05 (1.00) 2.8749300e-06   (4.28 )
rev.ends.10000.txt        9.4499880e-05 (1.00) 2.7255840e-05   (3.47 )
rev.partial.1000.txt      1.7107210e-05 (1.00) 8.7564000e-06   (1.95 )
rev.partial.10000.txt     0.00024198949 (1.00) 0.00013045623   (1.85 )
rev.saw.1000.txt          1.6793840e-05 (1.00) 9.4294600e-06   (1.78 )
rev.saw.10000.txt         0.00025133096 (1.00) 0.00014296088   (1.76 )
reverse.1000.txt          1.4600270e-05 (1.00) 1.1565800e-06   (12.6 )
reverse.10000.txt         0.00020535965 (1.00) 1.3798890e-05   (14.9 )
seq.0.is.1000.1000.txt    6.5672500e-06 (1.00) 1.4837800e-06   (4.43 )
seq.0.is.1000.10000.txt   4.7335130e-05 (1.00) 7.2842900e-06   (6.50 )
seq.partial.1000.txt      1.5497830e-05 (1.00) 6.0515300e-06   (2.56 )
seq.partial.10000.txt     0.00022936435 (1.00) 9.1791120e-05   (2.50 )
seq.saw.1000.txt          1.1645670e-05 (1.00) 4.6621150e-05   (0.250)
seq.saw.10000.txt         0.00019771144 (1.00) 0.00011184647   (1.77 )
sequential.1000.txt       3.4216700e-06 (1.00) 5.8689000e-07   (5.83 )
sequential.10000.txt      3.7812430e-05 (1.00) 6.1648500e-06   (6.13 )</pre>
</blockquote>
<p>These benchmarks were taken by running the sorting algorithm on each dataset 10000 times (to warm up the JVM), and then running it on the data 10 times to time it.  I took the average of those 10 runs.  The only case where My Qsort was slower than the built-in Java Arrays.sort() was seq.saw.1000.txt.  I attribute this to noise.  I ran it again and got the following:</p>
<blockquote>
<pre>seq.saw.1000.txt          0.00013980571 (1.00) 0.00012667049   (1.10 )</pre>
</blockquote>
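<p>The timing procedure described above (warm up, then average a handful of timed runs) can be sketched roughly like this.  It is a Ruby illustration rather than the Java harness that produced the numbers, and average_sort_time and its parameters are hypothetical names.</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;"># Hypothetical harness: run the block many times untimed (warm-up), then
# average the wall-clock time of a few timed runs.
def average_sort_time(data, warmup_runs: 10_000, timed_runs: 10)
  warmup_runs.times { yield data.dup }
  total = 0.0
  timed_runs.times do
    copy = data.dup # copy outside the timed region so only the sort is measured
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield copy
    total += Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end
  total / timed_runs
end

data = Array.new(1_000) { rand(1_000) }
seconds = average_sort_time(data, warmup_runs: 100) { |copy| copy.sort! }
puts format('%.7e seconds per sort', seconds)</pre></div></div>

<p>With only 10 timed runs, a single slow run can move the average noticeably, which fits the variation seen between the two seq.saw.1000.txt measurements.</p>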
<p>Hopefully this makes JRuby&#8217;s sorting comparable to Ruby&#8217;s.  One note: while Java&#8217;s sort is stable, the quicksort I wrote is not.  All that means is that entries that compare as equal may not keep their original relative order.</p>
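<p>If stable ordering matters, one common workaround is to add each entry&#8217;s original position as a tie-breaker.  Here is a small Ruby sketch; the Entry struct and its data are made up for illustration and are not part of the sort&#8217;s source.</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;"># Hypothetical record type: sorted by key, with label showing the original order.
Entry = Struct.new(:key, :label)

entries = [
  Entry.new(2, 'a'), Entry.new(1, 'b'),
  Entry.new(2, 'c'), Entry.new(1, 'd')
]

# An unstable sort may return equal keys in any relative order.  Using the
# original index as a tie-breaker forces a stable result with any algorithm.
stable = entries.each_with_index
                .sort_by { |entry, index| [entry.key, index] }
                .map { |entry, _index| entry }

puts stable.map { |entry| entry.label }.join   # prints bdac</pre></div></div>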
<p>You can do whatever you&#8217;d like with the source code.  If you do find it useful some credit and/or a link to here would be nice.</p>
<p>Here are the test files: <a href="http://blog.quibb.org/wp-content/uploads/2008/11/test_files.zip">Test Files</a></p>
<p>Here is the sort itself: <a href="http://blog.quibb.org/wp-content/uploads/2008/11/sort.zip">Sorting Algorithm </a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.quibb.org/2008/11/sort-optimization/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Ruby Shootout: Fasta</title>
		<link>http://blog.quibb.org/2008/11/ruby-shootout-fasta/</link>
		<comments>http://blog.quibb.org/2008/11/ruby-shootout-fasta/#comments</comments>
		<pubDate>Sun, 02 Nov 2008 13:39:34 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[shootout]]></category>

		<guid isPermaLink="false">http://blog.quibb.org/?p=3</guid>
		<description><![CDATA[Lately, I&#8217;ve been looking at the shootout Ruby benchmarks. I&#8217;d gotten into a habit of checking them every few months, rooting for my favorite languages. Not really understanding why some didn&#8217;t have the greatest showing on there. When the people running it upgraded their hardware, it seems as though Ruby fell off the list. While [...]]]></description>
			<content:encoded><![CDATA[<p>Lately, I&#8217;ve been looking at the <a href="http://shootout.alioth.debian.org/">shootout</a> Ruby benchmarks.  I&#8217;d gotten into a habit of checking them every few months, rooting for my favorite languages and not really understanding why some didn&#8217;t have the greatest showing on there.  When the people running it upgraded their hardware, it seems as though Ruby fell off the list.  While Ruby is a slow language, it deserves its spot (even if it is the bottom).  I looked at the submissions and saw a post saying that if more Ruby benchmarks were updated it would be included again.</p>
<p>I&#8217;ve since updated 3 of the benchmarks; I sped up reverse-complement to be 2x faster than the previous version.  I&#8217;m not a Ruby expert by any means, but I was able to get a speedup.  I don&#8217;t know if the other languages are in the same state, but they may be&#8230;</p>
<p>Anyway, the Fasta benchmark is the one that I most recently updated.  It took me a while to get a decent speedup.  It turned out that Array#find is slower than Array#each with a break statement.  Making this switch is what got the biggest speedup out of the program.</p>
<p>I wrote a benchmark showing the difference in runtime:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;"><span style="color:#CC0066; font-weight:bold;">require</span> <span style="color:#996600;">'benchmark'</span>
&nbsp;
N = <span style="color:#006666;">300</span>
table = <span style="color:#006600; font-weight:bold;">&#40;</span>0..300<span style="color:#006600; font-weight:bold;">&#41;</span>.<span style="color:#9900CC;">to_a</span>
<span style="color:#CC00FF; font-weight:bold;">Benchmark</span>.<span style="color:#9900CC;">bmbm</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006666;">8</span><span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#9966CC; font-weight:bold;">do</span> <span style="color:#006600; font-weight:bold;">|</span>x<span style="color:#006600; font-weight:bold;">|</span>
  x.<span style="color:#9900CC;">report</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">&quot;Array.find&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#006600; font-weight:bold;">&#123;</span>
    N.<span style="color:#9900CC;">times</span> <span style="color:#9966CC; font-weight:bold;">do</span>
      table.<span style="color:#9900CC;">each</span> <span style="color:#9966CC; font-weight:bold;">do</span> <span style="color:#006600; font-weight:bold;">|</span>find_num<span style="color:#006600; font-weight:bold;">|</span>
	table.<span style="color:#9900CC;">find</span> <span style="color:#9966CC; font-weight:bold;">do</span> <span style="color:#006600; font-weight:bold;">|</span>num<span style="color:#006600; font-weight:bold;">|</span>
	  num <span style="color:#006600; font-weight:bold;">&gt;</span> find_num
	<span style="color:#9966CC; font-weight:bold;">end</span>
      <span style="color:#9966CC; font-weight:bold;">end</span>
    <span style="color:#9966CC; font-weight:bold;">end</span>
  <span style="color:#006600; font-weight:bold;">&#125;</span>
&nbsp;
  x.<span style="color:#9900CC;">report</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">&quot;Array.each&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#006600; font-weight:bold;">&#123;</span>
    N.<span style="color:#9900CC;">times</span> <span style="color:#9966CC; font-weight:bold;">do</span>
      table.<span style="color:#9900CC;">each</span> <span style="color:#9966CC; font-weight:bold;">do</span> <span style="color:#006600; font-weight:bold;">|</span>find_num<span style="color:#006600; font-weight:bold;">|</span>
	output = <span style="color:#0000FF; font-weight:bold;">nil</span>
	table.<span style="color:#9900CC;">each</span> <span style="color:#9966CC; font-weight:bold;">do</span> <span style="color:#006600; font-weight:bold;">|</span>num<span style="color:#006600; font-weight:bold;">|</span>
	  <span style="color:#9966CC; font-weight:bold;">if</span> num <span style="color:#006600; font-weight:bold;">&gt;</span> find_num <span style="color:#9966CC; font-weight:bold;">then</span>
	    output = num
	    <span style="color:#9966CC; font-weight:bold;">break</span>
	  <span style="color:#9966CC; font-weight:bold;">end</span>
	<span style="color:#9966CC; font-weight:bold;">end</span>
      <span style="color:#9966CC; font-weight:bold;">end</span>
    <span style="color:#9966CC; font-weight:bold;">end</span>
  <span style="color:#006600; font-weight:bold;">&#125;</span>
<span style="color:#9966CC; font-weight:bold;">end</span></pre></div></div>

<p>The results:</p>
<pre>Ruby 1.8.6       user     system      total        real
Array.find  12.120000   0.020000  12.140000 ( 14.647705)
Array.each   8.150000   0.030000   8.180000 (  9.734463)

JRuby 1.1.4      user     system      total        real
Array.find   6.738000   0.000000   6.738000 (  6.738407)
Array.each   5.365000   0.000000   5.365000 (  5.364320)</pre>
<p>Now, I&#8217;m running this on an admittedly dated machine.  Again, I&#8217;m nowhere near a Ruby expert, so if someone sees a way to make the benchmark more fair, let me know and I&#8217;ll update it.  I can&#8217;t say anything about how this would run in YARV when that is released.</p>
<p>Here is a link to the Ruby Fasta benchmark on the Language Shootout:</p>
<p><a title="Ruby Shootout: Fasta" href="http://shootout.alioth.debian.org/u32/benchmark.php?test=fasta&amp;lang=ruby&amp;id=1">Ruby Shootout: Fasta</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.quibb.org/2008/11/ruby-shootout-fasta/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
