Doing a lots of db import/export/migration tasks on my Linux notebook on large DB tables, I’ve seen slow responsiveness from the system. After a small investigation I’ve seen that my OS is swapping out too much causing performance degradation and making my system unusable.
By default Linux kernel is configured for server environments. One important parameter that should be considered when using Linux as desktop is vm.swappines, by default it’s value is 60.
That’s fine for a server as it swaps out memory to disk aggressively to release it for active processes, but on desktops this value is to high, the recommended value is 10 to allow the system to be more responsive.
[root@silver ~]# tail -2 /etc/sysctl.conf
# desktop settings
vm.swappiness = 10
More details:
Ubuntu Swap Faq
Optimizing Desktop Performance, Part I
Optimizing Desktop Performance, Part II
Optimizing Desktop Performance, Part III
We often need to iterate over the database rows in our migrations. When dealing with millions of records, basic iteration techniques doesn’t work well because each loaded object is consuming system memory and it issues at least one database query per object to load.
# >> Book.count :all
# => 4000216
Solution 1
class BooksUpdateTitles < ActiveRecord::Migration
def self.up
Books.all.each do |book|
# ...
end
end
def self.down
end
end
The problem with this version is that it will load all 4000216 objects into memory, all memory will be consumed and it will start to use disk swap and it will take hours to complete.
We can optimize it a little bit by specifying select parameter in our query:
Solution 2
Books.find(:all, :select => 'id').each do |t|
book = Book.find t.id
# ...
end
Version 2 still loads all objects in memory but selects only id field.
We need to avoid loading all objects in the memory, we’ll iterate over collection and we’ll load only current object.
Solution 3
last_id = 0
while book = Books.find(:first, :conditions => ['id > ?', last_id])
# ...
last_id = book.id
end
Version 3 it’s OK, but it can be speed up by loading objects in batch not just one by one.
Solution 4
last_id = 0
while books = Book.find(:all, :conditions => ['id > ?', last_id], :limit => 100)
# ...
last_id = books.last.id
end
Examining the log:
...
Domain LOAD (0.000176) SELECT * FROM `books` WHERE (id > 0) LIMIT 100
...
We have loaded 100 objects with one query. Solution 4 seems to be the best solution to iterate over large data sets as it uses less memory with fewer SQL requests.
Update:
Mitchell proposed a better solution to use ActiveRecord’s find_in_batches method. DHH commited this feature on February 23, 2009 that permits iterating over large data sets in batches:
Read more:
WebOnRails
GitHub
Solution 5
Book.find_in_batches(:batch_size => 100) do |results|
# Do something with results
end