I recently had a requirement to insert a few hundred rows into a relational database every couple of seconds. Generally this could be accomplished during the request, but I didn’t want to introduce issues if there were spikes. Therefore, I figured I would cache the data and then write it out in a background thread every few seconds. This would also improve my response time during requests.
I wasn’t sure whether JDBC batching or a single insert statement that inserts multiple rows would be faster. Therefore, I set up a little test to see which was going to work better. First, some code:
Here is the code for a single insert statement that inserts multiple rows.
```java
long start = System.currentTimeMillis();

// Build a single INSERT statement with 1,000 rows of placeholders
StringBuilder build = new StringBuilder("insert into insert_values (foo, bar, baz) values ");
for (int i = 0; i < 1000; i++) {
  build.append("(?, ?, ?)");
  if (i < 999) {
    build.append(", ");
  } else {
    build.append(";");
  }
}

Connection c = DriverManager.getConnection("jdbc:mysql://localhost:3306/batch_vs_insert_test", "dev", "dev");
c.setAutoCommit(false);
PreparedStatement ps = c.prepareStatement(build.toString());

// Bind all 3,000 parameters (1,000 rows x 3 columns)
for (int i = 0; i < 3000; i++) {
  ps.setString(i + 1, "value" + i);
}

int result = ps.executeUpdate();
c.commit();

long end = System.currentTimeMillis();
System.out.println("Insert time was " + (end - start));

ps.close();
c.close();
```
Here is the code for the JDBC batch:
```java
long start = System.currentTimeMillis();

Connection c = DriverManager.getConnection("jdbc:mysql://localhost:3306/batch_vs_insert_test", "dev", "dev");
c.setAutoCommit(false);
PreparedStatement ps = c.prepareStatement("insert into insert_values (foo, bar, baz) values (?, ?, ?);");

// Bind 3 parameters per row, then add each completed row to the batch
for (int i = 0; i < 3000; i++) {
  ps.setString((i % 3) + 1, "value" + i);
  if ((i + 1) % 3 == 0) {
    ps.addBatch();
  }
}

int[] results = ps.executeBatch();
c.commit();

long end = System.currentTimeMillis();
System.out.println("Batch time was " + (end - start));

ps.close();
c.close();
```
The average times over 10 iterations were roughly as follows:
JDBC Batch: ~100 milliseconds
Single Insert: ~10 milliseconds
It looks like the single insert statement wins by a large margin.
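The placeholder-building loop above can be factored into a small reusable helper. Here is a minimal sketch; the class and method names are mine, not from the original code, and it omits the trailing semicolon since most drivers don’t need it:

```java
// Sketch: build a multi-row INSERT statement for a given row count.
// The table and column names are the ones used in the test above.
public final class MultiRowInsert {
    static String buildSql(int rows) {
        StringBuilder sql = new StringBuilder("insert into insert_values (foo, bar, baz) values ");
        for (int i = 0; i < rows; i++) {
            sql.append("(?, ?, ?)");
            if (i < rows - 1) {
                sql.append(", ");
            }
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildSql(2));
        // -> insert into insert_values (foo, bar, baz) values (?, ?, ?), (?, ?, ?)
    }
}
```

With a helper like this, the row count can vary per flush instead of being hard-coded to 1,000.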
UPDATE: This method yields very similar results on both MySQL and PostgreSQL using the latest drivers from each organization.
Hi Brian, with 3,000 rows I can’t really see the performance difference. Wouldn’t 3,000,000 rows be more appropriate?
Depending on your setup and tests, you should see a performance bump with as few as 3,000 rows. However, if you really want to see the difference, insert 3,000 rows using a single insert statement and a batch statement 100,000 times each. That should show you the differences nicely. This would require an outer loop over my code, but would definitely reveal which is faster.
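For that kind of outer loop, a small timing helper keeps the comparison symmetric. This is a sketch under the assumption that each insert strategy is wrapped in a Runnable; the helper is hypothetical, not from the post:

```java
// Sketch: run a task N times and report the total elapsed milliseconds.
public final class Bench {
    static long timeMillis(int iterations, Runnable task) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        // In the real comparison you would wrap each strategy, e.g.:
        //   Bench.timeMillis(100_000, () -> runMultiRowInsert());
        //   Bench.timeMillis(100_000, () -> runBatchInsert());
        // Here a trivial task stands in so the harness is self-contained.
        long elapsed = Bench.timeMillis(1000, () -> Math.sqrt(42));
        System.out.println("Elapsed: " + elapsed + " ms");
    }
}
```

Reusing one harness for both strategies avoids accidentally timing different amounts of setup work in each branch.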
Hi Brian, thanks for the interesting post. Can you please update the post with which database and JDBC driver version you used during the test? Recently I found out that there are quite big differences between JDBC drivers when it comes to performance… thanks
Thanks for the comment. I updated the post with this info. If you need more, comment again or email me.
I am working on writing data into MySQL. I used
String url = "jdbc:mysql://localhost/database_name?rewriteBatchedStatements=true";
that is, I added "?rewriteBatchedStatements=true" to the end of the URL. The batch insert then speeds up and performs much faster than individual inserts.
I did a bit of research, and that driver option appears to do the same thing as a bulk insert. It caches the inserts and rewrites them into a single statement. I would expect this to be much faster than individual inserts and faster than plain batch inserts, but slower than a single bulk insert statement built in Java code.
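For reference, the flag the commenter describes is a MySQL Connector/J connection property set on the JDBC URL; a hedged example using this post’s test database name:

```
jdbc:mysql://localhost:3306/batch_vs_insert_test?rewriteBatchedStatements=true
```

With this property set, the driver rewrites an addBatch()/executeBatch() sequence into multi-row INSERT statements on the wire, which is why its performance approaches the single-statement numbers above.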
But this option is only present in recent versions. Can we achieve this with PostgreSQL 8.2?
This worked like a charm. I got a speed up from 100 to 12 seconds!
Hi Brian, you have a beautiful programming mind 🙂 I am glad I came here and had an eyeful of your blog 🙂 Keep updating.
Thanks
Can you please provide a generic example that handles all objects with a multi-row insert?
I am facing the same issue: for millions of rows, batch processing is taking too long. My DB version is PostgreSQL 8.2, so I cannot use reWriteBatchedInserts because it is only available from version 9.4 of the Postgres driver.
Hi Sachin,
I’m not sure what you mean by “handle all objects”. If you can post some sample code that is causing issues, I can probably help you tune it. We’ve done a lot of work around massive insert volumes.