Mass inserting data in Rails without killing your performance – Coffeepowered
Sep 3, 2013Clipped on 2013-09-03 10:22:52 -0500
Chris Heald
Chief Architect for Mashable. Rubyist, husband, father, and all around tall guy.
Mass inserting data in Rails without killing your performance
Mass inserting is one of those operations that isn’t really well-supported by ActiveRecord, but which has to be done nonethless. You might say, “Well hey, I’ll just run a loop and create a bunch of AR objects, no sweat”.
That’ll work, but if speed is a factor, it might not be your best option.
ActiveRecord makes interface to the DB very easy, but it doesn’t necessarily make it fast. Instantiating an ActiveRecord object is costly, and if you do a lot of ‘em, that’s going to cause you to bump up against the garbage collector, which will significantly hinder performance. There are several options, though, depending on how much speed you need.
There are benchmarks at the bottom of the post, so if you’re just interested in those, scroll down.
Option 1: Use transactions
This is definitely the easiest method, and while you’ll realize gains from it, you aren’t going to be breaking any speed records using only this method. However, it’s well worth it if you are doing mass inserts via ActiveRecord.
Instead of
1000.times { Model.create(options) }
You want:
ActiveRecord::Base.transaction do 1000.times { Model.create(options) } end
The net effect is that the database performs all of your inserts in a single transaction, rather than starting and committing a new transaction for every request.
Options 2: Get down and dirty with the raw SQL
If you know that your data is valid and can afford to skip validations, you can save a lot of time by just jumping directly to raw SQL.
Imagine, for example, that you’re running the following:
1000.times {|i| Foo.create(:counter => i) }
That’s going to create 1000 ActiveRecord objects, run validations, generate the insert SQL, and dump it into the database. You can realize large performance gains by just jumping directly to the generated SQL.
1000.times do |i| Foo.connection.execute "INSERT INTO foos (counter) values (#{i})" end
You should use
sanitize_sql
and such as necessary to sanitize input values if they are not already sanitized, but with this technique you can realize extremely large performance gains. Of course, wrapping all those inserts in a single transaction, as in Option 1 gets you even more performance.Foo.transaction do 1000.times do |i| Foo.connection.execute "INSERT INTO foos (counter) values (#{i})" end end
Option 3: A single mass insert
Many databases support mass inserts of data in a single insert statement. They are able to significantly optimize this operation under the hood, and if you’re comfortable using it, will be your fastest option by far.
inserts = [] TIMES.times do inserts.push "(3.0, '2009-01-23 20:21:13', 2, 1)" end sql = "INSERT INTO user_node_scores (`score`, `updated_at`, `node_id`, `user_id`) VALUES #{inserts.join(", ")}"
No transaction block is necessary here, since it’s just a single statement, and the DB will wrap it in a transaction. We build an array of value sets to include, then just join them into the
INSERT
statement. We don’t use string concatenation, since it will result in significantly more string garbage generated, which could potentially get us into the GC, which we’re trying to avoid (and hey, memory savings are always good).Option 4: ActiveRecord::Extensions
njero in
#rubyonrails
pointed me at this nifty little gem and I decided to include it. It seems to try to intelligently do mass inserts of data. I wasn’t able to get it to emulate the single mass insert for a MySQL database, but it does provide a significant speed increase without much additional work, and can preserve your validations and such.There’s the obvious added benefit that you stay in pure Ruby, and don’t have to get into the raw SQL.
columns = [:score, :node_id, :user_id] values = [] TIMES.times do values.push [3, 2, 1] end UserNodeScore.import columns, values
Benchmarks
I used a simple script to test each of the methods described here.
require "ar-extensions" CONN = ActiveRecord::Base.connection TIMES = 10000 def do_inserts TIMES.times { UserNodeScore.create(:user_id => 1, :node_id => 2, :score => 3) } end def raw_sql TIMES.times { CONN.execute "INSERT INTO `user_node_scores` (`score`, `updated_at`, `node_id`, `user_id`) VALUES(3.0, '2009-01-23 20:21:13', 2, 1)" } end def mass_insert inserts = [] TIMES.times do inserts.push "(3.0, '2009-01-23 20:21:13', 2, 1)" end sql = "INSERT INTO user_node_scores (`score`, `updated_at`, `node_id`, `user_id`) VALUES #{inserts.join(", ")}" CONN.execute sql end def activerecord_extensions_mass_insert(validate = true) columns = [:score, :node_id, :user_id] values = [] TIMES.times do values.push [3, 2, 1] end UserNodeScore.import columns, values, {:validate => validate} end puts "Testing various insert methods for #{TIMES} inserts\n" puts "ActiveRecord without transaction:" puts base = Benchmark.measure { do_inserts } puts "ActiveRecord with transaction:" puts bench = Benchmark.measure { ActiveRecord::Base.transaction { do_inserts } } puts sprintf(" %2.2fx faster than base", base.real / bench.real) puts "Raw SQL without transaction:" puts bench = Benchmark.measure { raw_sql } puts sprintf(" %2.2fx faster than base", base.real / bench.real) puts "Raw SQL with transaction:" puts bench = Benchmark.measure { ActiveRecord::Base.transaction { raw_sql } } puts sprintf(" %2.2fx faster than base", base.real / bench.real) puts "Single mass insert:" puts bench = Benchmark.measure { mass_insert } puts sprintf(" %2.2fx faster than base", base.real / bench.real) puts "ActiveRecord::Extensions mass insert:" puts bench = Benchmark.measure { activerecord_extensions_mass_insert } puts sprintf(" %2.2fx faster than base", base.real / bench.real) puts "ActiveRecord::Extensions mass insert without validations:" puts bench = Benchmark.measure { activerecord_extensions_mass_insert(true) } puts sprintf(" %2.2fx faster than base", base.real / bench.real)
And the results:
Testing various insert methods for 10000 inserts ActiveRecord without transaction: 14.930000 0.640000 15.570000 ( 18.898352) ActiveRecord with transaction: 13.420000 0.310000 13.730000 ( 14.619136) 1.29x faster than base Raw SQL without transaction: 0.920000 0.170000 1.090000 ( 3.731032) 5.07x faster than base Raw SQL with transaction: 0.870000 0.150000 1.020000 ( 1.648834) 11.46x faster than base Single mass insert: 0.000000 0.000000 0.000000 ( 0.268634) 70.35x faster than base ActiveRecord::Extensions mass insert: 6.580000 0.280000 6.860000 ( 9.409169) 2.01x faster than base ActiveRecord::Extensions mass insert without validations: 6.550000 0.240000 6.790000 ( 9.446273) 2.00x faster than base
The results are fairly self-explainatory, but of particular note is the specific single INSERT statement. At 70x faster than the non-transactional ActiveRecord insert, if you need speed, it’s hard to beat.
Conclusions
ActiveRecord is great, but sometimes it’ll hold you back. Finding the balance between ease of use (full ActiveRecord) and performance (bare metal mass inserts) can have a profound effect on the performance of your app.