Vitalie Cherpec

Capturing web page screenshots from Ruby

A few days ago I discovered that Thumbshots.org had stopped delivering web page screenshots for my DNS checking tool.

I don’t know exactly when it happened; at some point I noticed that they were serving a placeholder image instead of screenshots. After a quick investigation I found that they now have a quota for the free usage tier.

I miss the thumbnails; they make the reports look more attractive. I looked for alternatives and found nothing that satisfied my needs. Considering how ridiculously cheap a VPS and bandwidth are today, I decided to implement a simple service that generates web page screenshots and stores them on Amazon S3.

The first thing that came to mind was PhantomJS; I could generate the screenshots with it. After looking at the Node.js glue modules for PhantomJS, I decided that I shouldn’t add another piece to manage in my stack. I settled on a Ruby solution; thanks to Capybara and Poltergeist it was trivial to implement:

    require "capybara/dsl"
    require "capybara/poltergeist"

    class Screenshot
      include Capybara::DSL

      # Captures a screenshot of +url+ saving it to +path+.
      def capture(url, path)
        # Browser settings
        page.driver.resize(1024, 768)
        page.driver.headers = {
          "User-Agent" => "Webshot 1.0",
        }

        # Open page
        visit url

        if page.driver.status_code == 200
          # Save a screenshot of the full page, not just the viewport
          page.driver.save_screenshot(path, :full => true)

          # Resize image
          # ...
        else
          # Handle error
        end
      end
    end

    # By default Capybara will try to boot a rack application
    # automatically. You might want to switch off Capybara's
    # rack server if you are running against a remote application
    Capybara.run_server = false
    Capybara.register_driver :poltergeist do |app|
      Capybara::Poltergeist::Driver.new(app, {
        # Don't raise JavaScript errors to Ruby
        js_errors: false,
        # Additional command line options for PhantomJS
        phantomjs_options: ['--ignore-ssl-errors=yes'],
      })
    end
    Capybara.current_driver = :poltergeist

    screenshot = Screenshot.new
    screenshot.capture "http://www.google.com/", "output.png"

After adding resizing capabilities with mini_magick, I packaged the example as a Ruby gem (webshot):

Installation

$ gem install webshot

Usage

    # Setup Capybara
    Webshot.capybara_setup!

    # Take a screenshot of Google's home page
    # and save it to an image file in PNG format
    webshot = Webshot::Screenshot.new
    webshot.capture "http://www.google.com/", "google.png"
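
The resizing step (the "# Resize image" placeholder in the first listing) could look roughly like the sketch below. This is not webshot's actual code: the Thumbnail module, its method names and the 160x120 target size are assumptions made for illustration. It also assumes the mini_magick gem (4.x) and ImageMagick are installed; mini_magick is loaded only when resize! is called, so the size calculation can be tried on its own.

```ruby
# Hypothetical helper, not part of the webshot gem's documented API.
module Thumbnail
  module_function

  # Returns [width, height] scaled down to fit inside
  # max_width x max_height while keeping the aspect ratio.
  # Images that already fit are left at their original size.
  def fit(width, height, max_width, max_height)
    scale = [max_width.to_f / width, max_height.to_f / height, 1.0].min
    [(width * scale).round, (height * scale).round]
  end

  # Resizes the image at +path+ in place.
  # Assumes mini_magick 4.x and ImageMagick are available.
  def resize!(path, max_width, max_height)
    require "mini_magick" # loaded lazily, only when actually resizing

    image = MiniMagick::Image.open(path)
    width, height = fit(image.width, image.height, max_width, max_height)
    # The trailing "!" tells ImageMagick to use exactly these dimensions
    image.resize "#{width}x#{height}!"
    image.write path
  end
end

# Size calculation for a 1024x768 screenshot and a 160x120 thumbnail
p Thumbnail.fit(1024, 768, 160, 120)
```

With a screenshot on disk, something like Thumbnail.resize!("output.png", 160, 120) would shrink it in place. A full-page capture of a long page keeps its tall aspect ratio, so cropping to the viewport first may be preferable for thumbnails.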
Published: 2013-04-30

Hello, world!

A sample blockquote.

Nested blockquotes are also possible.

Headers work too

This is the outer quote again.

    class Hello
      def say
        puts "Hello, world!"
      end
    end

    hello = Hello.new
    hello.say
Published: 2013-04-16