benchmarks – Blog::Quibb

Some SQLite 3.7 Benchmarks

Joe — Thu, 14 Oct 2010 13:00:01 +0000

Since I wrote the benchmarks for insertions in my last post, SQLite 3.7 has been released. I figured it’d be interesting to see if 3.7 changed the situation at all.

Prepared Statements

The specific versions compared here are 3.6.23.1 and 3.7.3. I ran the prepared statements benchmark as is without changing any source code. Both are using a rollback journal in this case.

Prepared Statements Runtime in Seconds

Prepared Statements Inserts Per Second

As you can see, the new version of SQLite definitely provides better performance. There is a speedup of about 3 seconds.

Journal Mode Comparison

One of the main features SQLite 3.7 added was Write Ahead Logging (WAL). The main advantage of write-ahead logging is it allows more concurrent access to the database than a rollback journal. These benchmarks don’t show the true potential of write-ahead logging. The benchmarks are single threaded, and they insert a large amount of data in one transaction. It’s listed as a disadvantage that large transactions can be slow with write-ahead logging, even having the potential to return an error. I wanted to evaluate write-ahead logging as a drop in replacement.

I ran the prepared statements benchmark with the default, memory, and wal settings. I also ran each setting with and without synchronous on. The synchronous setting controls how often sqlite waits for data to be physically written to the hard disk. The default setting is full, which is the safest because it waits for data to be written to the hard disk most frequently. This is compared to synchronous being off, which lets the operating system decide when information should be written to the hard disk. In this case if there is a software crash, it’s more likely the database could become corrupt.

Journal Mode Runtime Comparison

Journal Mode Inserts Per Second

Up until about 100,000 insertions all the journal modes and synchronous settings are about even. After 100,000, the insertion benchmarks with synchronous off are faster than their default synchronous counterparts. Journal mode set to memory and synchronous off offered the best performance for this benchmark.

Fast Bulk Inserts into SQLite

Joe — Tue, 10 Aug 2010 14:11:56 +0000

Background

Sometimes it’s necessary to get information into a database quickly. SQLite is a light weight database engine that can be easily embedded in applications. This will cover the process of optimizing bulk inserts into an SQLite database. While this article focuses on SQLite some of the techniques shown here will apply to other databases.

All of the following examples insert data into the same table. It’s a table where an ID is the first element followed by three FLOAT values, and then follow by three INTEGER values. You’ll notice the getDouble() and getInt() functions. They return doubles and ints in a predictable manner. I didn’t use random data because different values could potentially add variability to the benchmarks at the end.

Naive Inserts

This is the most basic way to insert information into SQLite. It simply calls sqlite3_exec for each insert in the database.

char buffer[300];
for (unsigned i = 0; i < mVal; i++)
{
    sprintf(buffer, "INSERT INTO example VALUES ('%s', %lf, %lf, %lf, %d, %d, %d)",
            getID().c_str(), getDouble(), getDouble(), getDouble(),
            getInt(), getInt(), getInt());
    sqlite3_exec(mDb, buffer, NULL, NULL, NULL);
}

Inserts within a Transaction

A transaction is a way to group SQL statements together. If an error is encountered the ON CONFLICT statement can be used to handle that to your liking. Nothing will be written to the SQLite database until either END or COMMIT is encountered to signify the transaction should be closed and written.

char* errorMessage;
sqlite3_exec(mDb, "BEGIN TRANSACTION", NULL, NULL, &errorMessage);

char buffer[300];
for (unsigned i = 0; i < mVal; i++)
{
    sprintf(buffer, "INSERT INTO example VALUES ('%s', %lf, %lf, %lf, %d, %d, %d)",
            getID().c_str(), getDouble(), getDouble(), getDouble(),
            getInt(), getInt(), getInt());
    sqlite3_exec(mDb, buffer, NULL, NULL, NULL);
}

sqlite3_exec(mDb, "COMMIT TRANSACTION", NULL, NULL, &errorMessage);

PRAGMA Statements

PRAGMA statements control the behavior of SQLite as a whole. They can be used to tweak options such as how often the data is flushed to disk of the size of the cache. These are some that are commonly used for performance. The SQLite documentation fully explains what they do and the implications of using them. For example, synchronous off will cause SQLite to not stop and wait for the data to get written to the hard drive. In the event of a crash or power failure, it is more likely the database could be corrupted.

sqlite3_exec(mDb, "PRAGMA synchronous=OFF", NULL, NULL, &errorMessage);
sqlite3_exec(mDb, "PRAGMA count_changes=OFF", NULL, NULL, &errorMessage);
sqlite3_exec(mDb, "PRAGMA journal_mode=MEMORY", NULL, NULL, &errorMessage);
sqlite3_exec(mDb, "PRAGMA temp_store=MEMORY", NULL, NULL, &errorMessage);

Prepared Statements

Prepared statements are the recommended way of sending queries to SQLite. Rather than parsing the statement over and over again, the parser only needs to be run once on the statement. According to the documentation, sqlite3_exec is a convenience function that calls sqlite3_prepare_v2(), sqlite3_step(), and then sqlite3_finalize(). In my opinion, the documentation should more explicitly say that prepared statements are the preferred query method. sqlite3_exec() should only be used for one time use queries.

char* errorMessage;
sqlite3_exec(mDb, "BEGIN TRANSACTION", NULL, NULL, &errorMessage);

char buffer[] = "INSERT INTO example VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7)";
sqlite3_stmt* stmt;
sqlite3_prepare_v2(mDb, buffer, strlen(buffer), &stmt, NULL);

for (unsigned i = 0; i < mVal; i++)
{
    std::string id = getID();
    sqlite3_bind_text(stmt, 1, id.c_str(), id.size(), SQLITE_STATIC);
    sqlite3_bind_double(stmt, 2, getDouble());
    sqlite3_bind_double(stmt, 3, getDouble());
    sqlite3_bind_double(stmt, 4, getDouble());
    sqlite3_bind_int(stmt, 5, getInt());
    sqlite3_bind_int(stmt, 6, getInt());
    sqlite3_bind_int(stmt, 7, getInt());

    if (sqlite3_step(stmt) != SQLITE_DONE)
    {
        printf("Commit Failed!\n");
    }

    sqlite3_reset(stmt);
}

sqlite3_exec(mDb, "COMMIT TRANSACTION", NULL, NULL, &errorMessage);
sqlite3_finalize(stmt);

Storing Data as Binary Blob

For example, if you have a point class (x, y, z) with REAL values, it might make sense to store them in a blob rather than three separate fields in row. That’s only if you don’t need to make queries on the data though. The benefit of this technique increases as more fields are converted into larger blobs.

char* errorMessage;
sqlite3_exec(mDb, "BEGIN TRANSACTION", NULL, NULL, &errorMessage);

char buffer[] = "INSERT INTO example VALUES (?1, ?2, ?3, ?4, ?5)";
sqlite3_stmt* stmt;
sqlite3_prepare_v2(mDb, buffer, strlen(buffer), &stmt, NULL);

for (unsigned i = 0; i < mVal; i++)
{
    std::string id = getID();
    sqlite3_bind_text(stmt, 1, id.c_str(), id.size(), SQLITE_STATIC);

    char dblBuffer[24];
    double d[] = {getDouble(), getDouble(), getDouble()};
    memcpy(dblBuffer, (char*)&d, sizeof(d));
    sqlite3_bind_blob(stmt, 2, dblBuffer, 24, SQLITE_STATIC);
    sqlite3_bind_int(stmt, 3, getInt());
    sqlite3_bind_int(stmt, 4, getInt());
    sqlite3_bind_int(stmt, 5, getInt());

    int retVal = sqlite3_step(stmt);
    if (retVal != SQLITE_DONE)
    {
        printf("Commit Failed! %d\n", retVal);
    }

    sqlite3_reset(stmt);
}

sqlite3_exec(mDb, "COMMIT TRANSACTION", NULL, NULL, &errorMessage);
sqlite3_finalize(stmt);

Note: I just used memcpy here, but this would have issues going between big and little endian systems. If that’s necessary, it would be a good idea to serialize the data using a serialization library (ie – protocol buffers or msgpack).

Performance

I ran benchmarks to test the performance of each method of inserting data. Take note that the x axis does not scale linearly, it most closely matches a logarithmic scale. The inserts per second graph was obtained by taking the number of inserts and dividing it by the total runtime.

SQLite Build Insert Runtime in Seconds

SQLite Inserts Per Second

After running the first benchmark, I wanted to show how storing data in binary can make a difference. I ran it again, but instead of storing only three doubles, I stored 24 doubles. I assumed order mattered, so for the benchmark that is not stored in a binary blob, I made a separate table with ID and order columns. This way both versions captured the same information.

Big Insert Runtime in Seconds

Big Inserts Per Second

Good luck with your database inserts.

Fast Bulk Inserts into SQLite

Background

Sometimes it’s necessary to get information into a database quickly. SQLite[http://sqlite.org/] is a light weight database engine that can be easily embedded in applications. This will cover the process of optimizing bulk inserts into an SQLite database. While this article focuses on SQLite some of the techniques shown here will apply to other databases.

Naive Inserts

This is the most basic way to insert information into SQLite. It simply calls sqlite3_exec[http://www.sqlite.org/c3ref/exec.html] for each insert in the database.
[insert code here]

Inserts within a Transaction
A transaction is a way to group SQL statements together. If an error is encountered the ON CONFLICT statement can be used to handle that to your liking. Nothing will be written to the SQLite database until either END or COMMIT is encountered to signify the transaction should be written and closed.
[insert code here]

PRAGMA Statements

PRAGMA statements[http://sqlite.org/pragma.html] control the behavior of SQLite as a whole. They can be used to tweak options such as how often the data is flushed to disk of the size of the cache.
[insert code here]

Prepared Statements

Prepared statements[http://sqlite.org/c3ref/prepare.html] are the recommended way of sending queries to SQLite. Rather than parsing the statement over and over again, the parser only needs to be run once on the statement. In all honesty, the documentation for sqlite3_exec should say not to use it at all in favor of prepared statements. They are not only faster on inserts, but across the board for all SQL statements.
[insert cod here]

Storing Data as Binary Blob

Up until now, most of the optimizations have been pretty much the standard advice that you get when looking into bulk insert optimization. If you’re not running queries on some of the data, it’s possible to convert it to binary and store it as a blob. While it’s not advised to just throw everything into a blob and put it in the database, putting data that would be pulled and used together into a binary blob can make sense in some situations.
For example, if you have a point class (x, y, z) with REAL values, it might make sense to store them in a blob rather than three separate fields in row. That’s only if you don’t need to make queries on the data though. The benefits of this technique increase as more fields are converted into larger blobs.
[insert code here]
Note: I just do a memcpy here, but this would have issues going between big and little endian systems. If that’s necessary, it would be a good idea to serialize the data using a serialization library (ie – protocol buffers[http://code.google.com/apis/protocolbuffers/docs/overview.html], msgpack[http://msgpack.org/], thrift[http://incubator.apache.org/thrift/]).

Performance

Fast Bulk Inserts into SQLite

Background

Naive Inserts

This is the most basic way to insert information into SQLite. It simply calls sqlite3_exec[http://www.sqlite.org/c3ref/exec.html] for each insert in the database.

[insert code here]

Inserts within a Transaction

[insert code here]

PRAGMA Statements

PRAGMA statements[http://sqlite.org/pragma.html] control the behavior of SQLite as a whole. They can be used to tweak options such as how often the data is flushed to disk of the size of the cache.

[insert code here]

Prepared Statements

[insert cod here]

Storing Data as Binary Blob

For example, if you have a point class (x, y, z) with REAL values, it might make sense to store them in a blob rather than three separate fields in row. That’s only if you don’t need to make queries on the data though. The benefits of this technique increase as more fields are converted into larger blobs.

[insert code here]

Note: I just do a memcpy here, but this would have issues going between big and little endian systems. If that’s necessary, it would be a good idea to serialize the data using a serialization library (ie – protocol buffers[http://code.google.com/apis/protocolbuffers/docs/overview.html], msgpack[http://msgpack.org/], thrift[http://incubator.apache.org/thrift/]).

Performance

Nightly Benchmarks: Tracking Results with Codespeed

Joe — Mon, 19 Jul 2010 14:16:54 +0000

Background

Codespeed is a project for tracking performance. I discovered it when the PyPy project started using Codespeed to track performance. Since then development has been done to make its setup easier and provide more display options.

Anyway, two posts ago I talked about running nightly benchmarks with Hudson. Then in the previous post I discussed passing parameters between builds in Hudson. Both of these posts are worth reading before trying to setup Hudson with Codespeed.

Codespeed Installation/Configuration

Django Quickstart

Codespeed is built on Python and Django. Some basic knowledge of Django is needed in order to get everything up and running. Don’t worry, it’s not that hard to learn the bit that is needed. manage.py is all you need to know about to setup and view Codespeed. There is information about deploying Django to a real web server, but I won’t be covering that here.

Here are the commands to get Django running:

syncdb

syncdb is used to initialize the database with the necessary tables. It will also setup an admin account. With the sqlite3 database selected, it will create the database file when this command is run.

The command is:

python manage.py syncdb

runserver

The next command is the runserver command. This runs the built-in django server. In the documentation they state you’re not supposed to use it in a production environment, so make sure to deploy to a production environment if you plan to host it on the Internet or high traffic network.

The command is:

python manage.py runserver 0.0.0.0:9000

By default the server will run on 127.0.0.1:8000. Setting the IP to 0.0.0.0 allows connections from any computer. This works well if you’re on a local area network and want to set it up on a VM over SSH, but still be able to access the web interface from your computer. The port is the port for the server to run on. To view Codespeed, point your browser at 127.0.0.1:9000 or the IP of the machine it’s on with the colon 9000.

Django has many settings that may or may not need to be tweaked for your environment. They can be set through the speedcenter/settings.py file.

Codespeed Setup/Settings

Now for setting up the actual Codespeed server. First check it out using git. The clone command is:

git clone http://github.com/tobami/codespeed.git

The settings file is speedcenter/codespeed/settings.py.

Most of the default values will work fine. They’re mostly for setting default values for various things in the interface.

One thing that does need to be configured is the environment. Start by running the syncdb command and then run the server using runserver. Now that the server is running, browse to the admin interface. If you ran the server on port 9000, point your browser at http://127.0.0.1:9000/admin. Login using the username and password you created during the syncdb call. A Codespeed environment must be created manually. The environment is the machine you’re running the benchmarks on. After logging in, click Add next to the Environment label. Fill in the various fields and remember the name of it. Save it when you’re done. The name will be used later when submitting benchmark data to Codespeed.

Submitting Benchmarks

This will pick up where my last tutorial left off. The benchmarks were running as a nightly job in Hudson. Sending benchmark data to Codespeed will take a bit of programming. I’m going to continue the example with JRuby, so the benchmarks and submission process are written in Ruby.

In order to submit benchmarks information must be transferred from the JRuby build job to the Ruby Benchmarks job. My last post discussed how to transfer parameters between jobs. Using the Parameterized Trigger Plugin and passing extra parameters using a properties file will allow you to get all the necessary parameters to the benchmarks job.

The required information for submitting a benchmark result to Codespeed includes:

commitid – The id of the commit, which could either be a git/mercurial hashcode or an svn revision number.
project – The name of the project to save.
executable – The name of the executable.
benchmark – The name of the benchmark.
environment – This is the name of the environment you created earlier. It must be the name of an existing environment.
result_value – The runtime of the benchmark. You can configure what units a benchmark has through the admin interface. Default is seconds.

This information can be included but is optional:

std_dev – The standard deviation of the results of the benchmarks.
min
max
branch – The branch corresponding to this benchmark in the SCM repository.
result_date – The timestamp of the commit in the form “%Y-%m-%d %H:%M”

The above information is passed to Codespeed through an encoded URL. Have the URL point to http://127.0.0.1:9000/results/add/ and encode the parameters for sending. For the JRuby benchmarks, the following parameters are sent from the JRuby job to the to the ruby benchmarks job.

COMMIT_ID=$(git rev-parse HEAD)
COMMIT_TIME=$(git log -1 --pretty=\"format:%ad\")
RUBY_PATH=$WORKSPACE/bin/jruby
REPO_URL=git://github.com/jruby/jruby.git

The other fields are derived from the benchmarks job itself.

Here is the source code for submission through Ruby:

output = {}
canonical_name = doc["name"].gsub '//', '/'
output['commitid'] = commitid
output['project'] = BASE_VM
output['branch'] = branch
output['executable'] = BASE_VM
output['benchmark'] = File.basename(canonical_name)
output['environment'] = environment
output['result_value'] = doc["mean"]
output['std_dev'] = doc["standard_deviation"]
output['result_date'] = commit_time

res = Net::HTTP.post_form(URI.parse("#{server}/result/add/"), output)
puts res.body

It’s a good idea to always print out the response as it will contain debug information. There is an example of how to submit benchmarks to Codespeed using Python in the Codespeed repository in the tools directory.

Viewing Results

After results are in the the Codespeed database, you can view the data through the web interface. Direct a browser at http://127.0.0.1:9000. The changes view shows the trend over the last revisions. The timeline view allows you to see a graph of recent revisions, and the newly added comparison view will compare different executables running the same benchmark.

Nightly Benchmarks: Setting up Hudson

Joe — Thu, 08 Apr 2010 13:36:27 +0000

For some projects, finding out about performance regressions is important. I’m going to write a two part series about setting up a nightly build machine and displaying the generated data. This part is going to cover installation of Hudson, and getting the benchmarks running nightly.

I decided to give Hudson a try because I had heard good things about it. Also after hearing coworkers complain about cruise control and cdash, I thought I’d try something new. Since Hudson has pretty extensive documentation, I’ll walk you through setting up the JRuby project to build with Hudson and getting benchmarks running on it.

Hudson Installation

On Ubuntu it’s as simple as:

sudo apt-get install hudson

While I didn’t install it on windows, the installation should require little more than installing Tomcat and then downloading the Hudson war file and put it in the web-apps directory.

After installation browsing to http://127.0.0.1:8080 should show the Hudson Dashboard.

Hudson Configuration

After Hudson installation is complete, it requires very little configuration before setting up your first project. One thing that may be necessary is going to the plugins page and making sure your version control system is covered. For setting up a continuous integration machine to build JRuby, the git plugin is necessary.

To install the Hudson Git Plugin, click Manage Hudson on the left hand side. Then click Manage Plugins from the list in the middle of the screen. Click the Available tab, and find the Hudson GIT plugin in the list. After it’s installed it will show up in the Installed tab.

After installing all the necessary plugins for your project go back to the Hudson Dashboard by clicking the Hudson logo, or the Back to Dashboard link.

Setting up a Project to Build

A good first step it to make sure the project will build on the given machine without being built through Hudson. There may be some dependencies that got overlooked, and this is a good way to make sure everything is setup to build your project.

Now, click on the New Job link on the left hand side. For the JRuby project, the Build a free-style software project is the type of project to setup. I imagine that is the correct type of project to setup for most projects.

Unless you plan on keeping all the builds produced on the server, the Discard Old Builds is a good option to check, and set how long you want the builds to remain on the server. Choose the source code management tool that you use for your project, which is Git for JRuby, and set the appropriate settings.

JRuby settings:

URL of Repository: git://github.com/jruby/jruby.git
Branch Specifier (blank for default): master
Repository browser (Auto)

There are several types of Build Triggers by default. More Build Triggers can be added through plugins, if you’re looking for another way to trigger a build. For a nightly build at midnight select the Build periodically option, and put @midnight in the field.

For the build step, if you’re building a Java project select Invoke Ant. Otherwise, Execute shell may be a good option for you. For JRuby, select Invoke Ant and set the target to jar to build it.

At this point you can click the Save button at the bottom of the page and click Build Now on the next page to build your project. It’s a good idea to make sure your project builds correctly before trying to add in nightly benchmarks. It’s easier to debug problems before you have too much going on. By clicking on the build from the active builds list the console output can be seen from the browser.

Running the Benchmarks

If your benchmarks are in the same repository, you’re mostly done. Add another build step, and set it up to run your benchmarks. While JRuby does have benchmarks in its repository, the benchmarks I plan on running are in a different repository. With this goal in mind, I created another Job in Hudson to checkout and run the benchmarks.

Its setup is very similar to that of JRuby, it checks out the source and runs the benchmarks. The main difference is that a parameter is passed to the project to tell it which Ruby VM to use. The Parameterized Trigger Plugin is necessary to pass a parameter from one project to another. The way it works is you set a parameter in the project receiving the parameter near the top of the page. In my case, I added a RUBY_PATH parameter. Then you setup the build job to send that parameter to the benchmarks job.

To do this, I went back to the JRuby job and turned on the Trigger parameterized build on other projects option. It should be the last option down at the bottom of the page. I set the JRuby job to trigger with the benchmarks job name, and in the predefined parameters field I put the following:

 RUBY_PATH=$WORKSPACE/bin/jruby

After this is in place, when a JRuby build finishes it will start a benchmarks run. Now that your benchmarks are up and running, the next part to this series will go over how to display the information in a way that makes it easy to spot regressions.

If you have any questions or if I went over something too quickly, post a comment and/or ask a question.

Sort Optimization (Part 2) with JDK 6 vs JDK 7

Joe — Wed, 23 Dec 2009 15:00:28 +0000

In part 1, I went over my first foray into the world of sorting algorithms. Since then, I’ve had some other ideas on how to improve my quicksort implementation. One idea that I had while originally working on the sorting algorithm, was to rework the partition function to take into account duplicate elements. I had a few different working implementations, but all of them came with severe performance penalty. I finally figured out a way to get performance close to the previous algorithm.

The partition function needs to perform the minimal number of swaps possible. So moving towards the center from both ends and only swapping when both are out of order is the best approach I’ve found so far. When grouping duplicate elements, they are swapped to the beginning of the partition area as they are found. Then at the end, a pass is run to move them to their correct location in the final list. Then instead of returning one number from the partition function, it returns two. It returns the minimum and maximum indices on the range that has the pivot value.

Another area that I was able to get some performance gain out of was getting rid of the shell sort form the first algorithm. While that was there to make sure the quicksort did not recurse too deeply, in practice the shell sort algorithm doesn’t run.

Results

Here are the results of JDK 6 MergeSort, Tim Sort, QSort, QSortv2, and Dual Pivot sort 2 benchmarked on the same set of files. Overall, the new version doesn’t outperform the old version, but I thought it was worth posting my findings. On most data sets with duplicates it does perform better. I ran these benchmarks on OpenJDK 7 because I was curious as to how they would compare to one another.

It’s important to note that the tables are speedup relative the Java implementation on the given JDK. The graphs are the average runtimes for each algorithm. The reason for doing the average runtime is that it could show the performance difference between Sun’s JDK 6 and OpenJDK 7 build 73.

Sun JDK 6 without Warmup

OpenJDK 7 without Warmup

Sun JDK 6 1000 Warmup Iterations

OpenJDK 7 1000 Warmup Iterations

Sun JDK 6 vs OpenJDK 7 without Warmup

Sun JDK 6 vs OpenJDK 7 1000 Iterations of Warmup

Conclusions

Overall the new version of the Qsort implementation doesn’t improve greatly over the previous implementation. While it didn’t work out to be the performace improvement I was looking for. I think the last graph with 1000 iterations of warmup for each algorithm is the most interesting. The Qsort v2 implementation apparently doesn’t get handled any better by OpenJDK 7. The partition function is larger after my changes, so perhaps it didn’t JIT very well. What is interesting is the boost that Tim Sort saw with the change of JDK’s. Running these benchmarks made me realize that upgrading my Java Runtime will increase the performance of all my Java applications. It will be interesting to see if the performance carries over to Netbeans and Eclipse; I expect it will.