Count internal and external links in a page

Today I read Google’s Webmaster Guidelines. In the “Design and content guidelines” section, Google recommends keeping a reasonable number of links on a page:

Keep the links on a given page to a reasonable number.

Matt Cutts talks about 100 links per page.

Hmm, how many links do my pages have? I wrote a simple Ruby script to count the links in a page (Nokogiri, Domainatrix and open-uri made it trivial):

#!/usr/bin/env ruby
 
require "rubygems"
require "nokogiri"
require "open-uri"
require 'domainatrix'
 
url = ARGV.first
 
raise "*** Use: links_count <url>" unless url
 
domain = Domainatrix.parse(url)
 
int_links = 0
ext_links = 0
 
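doc = Nokogiri::HTML(open(url).read)
doc.xpath("//a[@href]").each do |node|
  link = node.get_attribute('href')
 
  # absolute links (http:// or https://) may point to another site
  if link =~ %r{\Ahttps?://}i
    l = Domainatrix.parse(link)
    # same registered domain (e.g. cnn.com) => internal
    if l.public_suffix == domain.public_suffix && l.domain == domain.domain
      int_links += 1
    else
      ext_links += 1
      puts link
    end
  else
    # relative links (and mailto:, javascript:, ...) are counted as internal
    int_links += 1
  end
end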
 
puts ""
puts "*** internal links: #{int_links}"
puts "*** external links: #{ext_links}"
puts "*** total: #{int_links + ext_links}"

Now I can see how many internal and external links a page contains. Example for CNN.com:

[vitalie@silver ~]$ links_count.rb http://www.cnn.com
http://www.cnnmexico.com/
http://www.ireport.com/
http://www.time.com/time/world/article/0,8599,1997325,00.html
http://www.ireport.com/docs/DOC-459640?hpt=Mid
http://www.ireport.com/docs/DOC-459640?hpt=Mid
http://www.ireport.com/?hpt=Sbin
http://twitter.com/worldcupcnn
http://foursquare.com/cnn
http://movie-critics.ew.com/2010/06/16/psycho-turns-50-today/
http://www.cnngo.com/hong-kong/play/beyond-star-ferry-457619
http://www.cnngo.com/bangkok/play/spirited-competition-koh-samui-regatta-706618
http://www.ireport.com/?cnn=yes
http://www.turnerstoreonline.com/
http://www.cnntraveller.com
http://www.cnnchile.com
http://www.cnnmexico.com
http://cnn.joins.com/
http://www.cnn.co.jp/
http://www.cnnturk.com/
http://www.turner.com/
http://www.cnnmediainfo.com/
http://www.turner.com/careers/
 
*** internal links: 263
*** external links: 22
*** total: 285
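If you’d rather not depend on Domainatrix, the classification step can be approximated with Ruby’s standard URI library. A minimal sketch, where link_kind is a hypothetical helper: URI.join resolves relative, protocol-relative (//host/...) and fragment links against the page URL, and anything that is not http or https (mailto:, javascript:, malformed hrefs) ends up as :other instead of being counted as internal. Hosts are compared after dropping a leading “www.”, which is a simplification: unlike Domainatrix it does not look at the registered domain, so edition.cnn.com would count as external to www.cnn.com.

```ruby
require "uri"

# Hypothetical helper: classify an href found on page_url as
# :internal, :external or :other.
# Simplification: hosts are compared after dropping a leading "www.",
# not by registered domain as Domainatrix does.
def link_kind(page_url, href)
  base = URI.parse(page_url)
  # resolve relative, protocol-relative and fragment links against the page
  target = URI.join(page_url, href.strip)
  return :other unless %w[http https].include?(target.scheme.to_s.downcase)

  host = lambda { |h| h.to_s.downcase.sub(/\Awww\./, "") }
  host.call(target.host) == host.call(base.host) ? :internal : :external
rescue URI::Error
  # malformed hrefs (e.g. containing spaces)
  :other
end
```

For example, link_kind("http://www.cnn.com", "http://twitter.com/worldcupcnn") returns :external.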

PageActions Plugin released

I’ve just released the PageActions plugin on GitHub. It’s a really simple Rails plugin that helps you easily define and render action links in your views. Installation and usage instructions are on GitHub:


http://github.com/vitalie/page_actions

Hpricot 0.8.1 on ruby 1.8.5

Installing the latest Hpricot on Ruby 1.8.5 fails due to a missing macro, RARRAYPTR:

[root@silver ~]# gem install hpricot
Building native extensions.  This could take a while...
ERROR:  Error installing hpricot:
        ERROR: Failed to build gem native extension.
 
/usr/bin/ruby extconf.rb
checking for main() in -lc... yes
creating Makefile
 
make
gcc -I. -I. -I/usr/lib64/ruby/1.8/x86_64-linux -I.  -fPIC -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -Wall -fno-strict-aliasing  -fPIC  -c hpricot_css.c
hpricot_css.rl: In function ‘hpricot_css’:
hpricot_css.rl:106: warning: implicit declaration of function ‘RSTRING_PTR’
hpricot_css.rl:106: warning: assignment makes pointer from integer without a cast
hpricot_css.rl:107: warning: implicit declaration of function ‘RSTRING_LEN’
hpricot_css.rl:82: warning: field precision should have type ‘int’, but argument 5 has type ‘long int’
hpricot_css.c:295: warning: comparison is always true due to limited range of data type
[...]
hpricot_css.c:3403: warning: comparison between pointer and integer
hpricot_css.c:3403: warning: ‘eof’ is used uninitialized in this function
hpricot_css.rl:92: warning: ‘aps’ may be used uninitialized in this function
gcc -I. -I. -I/usr/lib64/ruby/1.8/x86_64-linux -I.  -fPIC -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -Wall -fno-strict-aliasing  -fPIC  -c hpricot_scan.c
hpricot_scan.rl: In function ‘our_rb_hash_lookup’:
hpricot_scan.rl:169: warning: implicit declaration of function ‘st_lookup’
hpricot_scan.rl: In function ‘make_hpricot_struct’:
hpricot_scan.rl:693: warning: implicit declaration of function ‘RARRAYPTR’
hpricot_scan.rl:693: error: subscripted value is neither array nor pointer
make: *** [hpricot_scan.o] Error 1
 
 
Gem files will remain installed in /usr/lib64/ruby/gems/1.8/gems/hpricot-0.8.1 for inspection.
Results logged to /usr/lib64/ruby/gems/1.8/gems/hpricot-0.8.1/ext/hpricot_scan/gem_make.out

Searching Google, I found a post explaining that RARRAYPTR(v) is equivalent to RARRAY(v)->ptr. Since the macro isn’t defined here (note the “implicit declaration” warning in the log), we’ll need to define RARRAYPTR ourselves.

We’ll replace occurrences of #include <ruby.h> with #include "ruby_macros.h" in the extension sources and create an include file, ruby_macros.h, with the following content:

#ifndef __RUBY_MACROS__
#define __RUBY_MACROS__
 
#include <ruby.h>
 
/* not defined by these Ruby 1.8.5 headers: map it to the array's ptr field */
#ifndef RARRAYPTR
#  define RARRAYPTR(v) RARRAY(v)->ptr
#endif
#endif /* __RUBY_MACROS__ */

[root@silver ~]# cd /usr/lib64/ruby/gems/1.8/gems/hpricot-0.8.1/ext/hpricot_scan 
[root@silver hpricot_scan]# sed -i 's,#include <ruby.h>,#include "ruby_macros.h",g' *.h *.c *.rl
[root@silver hpricot_scan]# touch ruby_macros.h
[root@silver hpricot_scan]# vi ruby_macros.h
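The sed, touch and vi steps above can also be scripted. A minimal Ruby sketch, assuming you pass it the extension directory (ext/hpricot_scan); MACROS_HEADER, patch_includes and patch_extension are hypothetical names. Unlike the sed glob, it explicitly skips ruby_macros.h itself and leaves files without the include untouched.

```ruby
# Same content as the ruby_macros.h shown above.
MACROS_HEADER = <<-'EOS'
#ifndef __RUBY_MACROS__
#define __RUBY_MACROS__

#include <ruby.h>

#ifndef RARRAYPTR
#  define RARRAYPTR(v) RARRAY(v)->ptr
#endif
#endif
EOS

# Hypothetical helper: swap the Ruby header for our wrapper header.
def patch_includes(source)
  source.gsub("#include <ruby.h>", '#include "ruby_macros.h"')
end

# Hypothetical helper: patch *.h, *.c and *.rl files in dir, write
# ruby_macros.h, and return the sorted names of the patched files.
def patch_extension(dir)
  patched = []
  Dir.glob(File.join(dir, "*.{h,c,rl}")).each do |path|
    next if File.basename(path) == "ruby_macros.h"
    text = File.read(path)
    new_text = patch_includes(text)
    next if new_text == text
    File.open(path, "w") { |f| f.write(new_text) }
    patched << File.basename(path)
  end
  File.open(File.join(dir, "ruby_macros.h"), "w") { |f| f.write(MACROS_HEADER) }
  patched.sort
end
```

Running it twice is harmless: the second pass finds no #include <ruby.h> left and just rewrites the header.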

The next step is to recreate the gem and install it:

[root@silver ~]# cd /usr/lib64/ruby/gems/1.8/gems/hpricot-0.8.1
[root@silver hpricot-0.8.1]# rake package
(in /usr/lib64/ruby/gems/1.8/gems/hpricot-0.8.1)
fatal: Not a git repository
rm -r ext/fast_xs/Makefile
rm -r ext/hpricot_scan/Makefile
rm -r .config
rm -r pkg
rm -r hpricot-0.8.1-mswin32
rm -r hpricot-0.8.1-jruby
Using ragel version: 6.3, location: /usr/bin/ragel
cd ext/hpricot_scan ; ragel hpricot_scan.rl -G2 -o hpricot_scan.c && ragel hpricot_css.rl -G2 -o hpricot_css.c
mkdir -p pkg
mkdir -p pkg/hpricot-0.8.1
rm -f pkg/hpricot-0.8.1/CHANGELOG
[...]
cd pkg
tar zcvf hpricot-0.8.1.tgz hpricot-0.8.1
hpricot-0.8.1/
hpricot-0.8.1/Rakefile
[...]
hpricot-0.8.1/ext/fast_xs/fast_xs.c
cd -
WARNING:  description and summary are identical
  Successfully built RubyGem
  Name: hpricot
  Version: 0.8.1
  File: hpricot-0.8.1.gem
mv hpricot-0.8.1.gem pkg/hpricot-0.8.1.gem
[root@silver hpricot-0.8.1]# gem install pkg/hpricot-0.8.1.gem
Building native extensions.  This could take a while...
Successfully installed hpricot-0.8.1
1 gem installed
Installing ri documentation for hpricot-0.8.1...
Installing RDoc documentation for hpricot-0.8.1...
 
[root@silver hpricot_scan]# gem list -l | grep hpricot
hpricot (0.8.1, 0.7, 0.6.164, 0.6.161, 0.6)

Missing host to link to! Please provide :host parameter or set default_url_options[:host]

Problem:
Missing host to link to! Please provide :host parameter or set default_url_options[:host] when sending emails.

Solution:
You can pass the :host parameter to the URL helpers, but it’s cleaner to set it globally with a before_filter in application_controller.rb:

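  # app/controllers/application_controller.rb
  class ApplicationController < ActionController::Base
    before_filter :mailer_set_url_options
 
    ...
 
    protected
 
    # use the host (and port) of the current request in mailer URLs
    def mailer_set_url_options
      ActionMailer::Base.default_url_options[:host] = request.host_with_port
    end
  end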
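The before_filter only runs while a controller handles a request. Mail sent from a rake task, a cron job or a background worker has no request, so the host stays unset there. A static default in the environment configuration covers those cases; a sketch for an environment file, where www.example.com is a placeholder:

```ruby
# config/environments/production.rb
# www.example.com is a placeholder: use your application's public host name.
config.action_mailer.default_url_options = { :host => "www.example.com" }
```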

Simple script to convert ERB files to Haml

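A simple script that converts the .html.erb files in the current directory to .html.haml:

#!/usr/bin/ruby
 
# convert every *.html.erb file in the current directory to *.html.haml
Dir.glob("*.html.erb").each do |erbname|
  # replace only the trailing extension
  hamlname = erbname.sub(/\.html\.erb\z/, ".html.haml")
  # separate arguments avoid shell quoting problems with unusual file names
  system("/usr/bin/html2haml", erbname, hamlname)
end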
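A possible variant, sketched under the assumption that you want to convert a whole views tree: it also walks subdirectories and skips templates that already have a .haml counterpart, so it can be re-run safely. HTML2HAML, haml_name, pending_conversions and convert_all are hypothetical names; the html2haml path is the one used above.

```ruby
# Path used in the script above; adjust for your system.
HTML2HAML = "/usr/bin/html2haml"

# show.html.erb -> show.html.haml (only the trailing extension changes)
def haml_name(erbname)
  erbname.sub(/\.html\.erb\z/, ".html.haml")
end

# [erb, haml] pairs under root (recursively) that have no .haml file yet
def pending_conversions(root = ".")
  Dir.glob(File.join(root, "**", "*.html.erb")).sort.map { |erb|
    [erb, haml_name(erb)]
  }.reject { |_, haml| File.exist?(haml) }
end

def convert_all(root = ".")
  pending_conversions(root).each do |erb, haml|
    system(HTML2HAML, erb, haml) or warn "html2haml failed for #{erb}"
  end
end
```

Calling convert_all from the app/views directory converts everything that is still in ERB.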

This site may harm your computer

Today one of our clients’ websites was blacklisted by Google for containing malicious software that downloads and installs itself without the user’s consent. Google displayed “This site may harm your computer” under the site’s listing in the results page.

Analyzing the site’s sources, we found obfuscated JavaScript code inserted near the body and html tags in .html, .php and .tpl files, and a .htaccess file with the following content:

RewriteEngine On
RewriteCond %{HTTP_REFERER} .*google.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} .*aol.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} .*msn.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} .*yahoo.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} .*yandex.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} .*rambler.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} .*ya.*$ [NC]
RewriteRule .* http://real-antispyware.info/0/go.php?sid=2 [R,L]

Hmm, visitors coming from search engines were redirected to real-antispyware.info, while visitors who typed the address directly were not redirected. real-antispyware.info is a scam: it shows a JavaScript animation that fools users into believing their computer is infected and prompts them to download and install a fake antivirus.
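To check other sites on the same server for the same trick, a rough Ruby heuristic can flag .htaccess files that combine Referer conditions mentioning search engines with a redirect to an absolute URL. referer_redirect? and suspicious_htaccess_files are hypothetical helpers, the engine list simply mirrors the rules above, and a match only means the file deserves a manual look. Dir.glob’s ** does not descend into hidden directories.

```ruby
# Engine names taken from the injected rules above.
SEARCH_ENGINES = %w[google aol msn yahoo yandex rambler]

# Heuristic, not a malware scanner: true when the file has a
# RewriteCond on HTTP_REFERER mentioning a search engine AND a
# RewriteRule that redirects to an absolute http(s) URL.
def referer_redirect?(htaccess)
  text = htaccess.gsub("\r", "")  # injected files may have CRLF endings
  conds = text.lines.grep(/^\s*RewriteCond\s+%\{HTTP_REFERER\}/i)
  redirects = text.lines.grep(/^\s*RewriteRule\s+\S+\s+https?:\/\//i)
  return false if conds.empty? || redirects.empty?
  conds.any? { |c| SEARCH_ENGINES.any? { |se| c.downcase.include?(se) } }
end

# All .htaccess files below root that match the heuristic.
def suspicious_htaccess_files(root)
  Dir.glob(File.join(root, "**", ".htaccess")).select do |path|
    referer_redirect?(File.read(path))
  end
end
```

A hotlink-protection rule (a Referer condition with a "-" target) is not flagged, because it doesn’t redirect to another site.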

Analyzing IP addresses in the FTP logs, we found connections from Russia and China that altered the client’s website. Somehow the attackers obtained the user’s FTP password (there are many ways: a weak password, traffic sniffing, a virus, a keylogger, a trojan, …) and modified the website files.

You can use this simple Ruby script to analyze your FTP logs. By default it is configured for a Plesk server and shows the first log line for each IP address that is not ignored (adjust the IGNORE_* constants to fit your needs). You may need to install RubyGems and the geoip gem.

#!/usr/bin/ruby
 
require 'rubygems'
require 'geoip'
require 'zlib'
require 'set'
 
# hide logs from these countries
# Example: RO US
IGNORE_COUNTRIES = %w{RO US}
# free geoip database is not 100% accurate
# we may need to ignore a few ip addresses
IGNORE_IP = %w{127.0.0.1 127.0.0.2}
 
# Plesk FTP transfer logs, including rotated (.gz) ones
files = Dir.glob("/usr/local/psa/var/log/xferlog*")
geoip = GeoIP.new('/var/lib/GeoIP/GeoIP.dat')
 
# two-letter country code of an ip, or "--" when the lookup finds nothing
def ip2country(geoip, ip)
  result = geoip.country(ip)
  (result && result[3]) || "--"
end
 
seen_ips = Set.new
files.each do |filename|
  puts ""
  puts "Processing #{filename} ..."
 
  File.open(filename) do |f|
    input = f
    input = Zlib::GzipReader.new(f) if File.extname(filename) == ".gz"
 
    while line = input.gets do
      # xferlog: 5 date fields, transfer time, then the remote host
      ip = line.split(/\s+/)[6]
      # skip malformed lines and ips already reported
      next if ip.nil? || seen_ips.include?(ip)
      seen_ips << ip
 
      country = ip2country(geoip, ip)
      unless IGNORE_COUNTRIES.include? country.upcase or IGNORE_IP.include? ip
        puts " [#{country} : #{ip}] => #{line}"
      end
    end
  end
end
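The script relies on the xferlog layout described in xferlog(5): a five-field timestamp, the transfer time, the remote host, the file size, the file name, the transfer type, the special action flag, the direction (o outgoing, i incoming, d deleted), the access mode and the user name, followed by further fields. Once you have a suspicious address, listing the uploads in the log shows which files were changed. XferEntry, parse_xferlog and uploads are hypothetical helpers, and the sketch assumes file names without spaces.

```ruby
# Hypothetical record for one xferlog line (subset of the fields).
XferEntry = Struct.new(:time, :host, :bytes, :filename, :direction, :user)

# Split one xferlog line into named fields; nil for short/garbled lines.
# Assumes the standard xferlog(5) layout and no spaces in file names.
def parse_xferlog(line)
  f = line.split(/\s+/)
  return nil if f.size < 14
  XferEntry.new(f[0, 5].join(" "), f[6], f[7].to_i, f[8], f[11], f[13])
end

# Uploads ("i" = incoming) are the transfers that can modify the site.
def uploads(lines)
  lines.map { |l| parse_xferlog(l) }.compact.select { |e| e.direction == "i" }
end
```

For example, uploads(File.readlines(path)).select { |e| e.host == suspicious_ip } lists the files a given address uploaded.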

Steps to follow:

  1. Change FTP password
  2. Upload a clean copy from the backups of the website
  3. Submit the website in the Webmaster’s Tools for reconsideration
  4. Audit your company’s security: computers, firewalls, antivirus programs, software, …

You may also find Google’s Safe Browsing diagnostic tool useful (replace example.com with your domain):

http://www.google.com/safebrowsing/diagnostic?site=http://example.com

How to install RMagick on CentOS 4

RMagick is an interface between the Ruby programming language and the ImageMagick® and GraphicsMagick image processing libraries.

To install RMagick on CentOS 4 you’ll need RMagick version 1, because version 2 requires a newer version of ImageMagick than the one available in the CentOS 4 repositories.

I assume you have already installed RubyGems. If not, read my post Install RubyGems on CentOS 4.

Let’s start by installing the required libraries:

[root@lion ~]# yum install gcc gcc-c++ ImageMagick-devel ghostscript freetype-devel \
                         libjpeg-devel libpng-devel libpng10-devel libwmf-devel libexif-devel libtiff-devel
[...]

Then install the RMagick gem, specifying the version with the -v switch:

[root@lion ~]# gem install rmagick -v 1.15.14 
Building native extensions.  This could take a while...
Successfully installed rmagick-1.15.14
1 gem installed

Install RubyGems on CentOS 4

This post explains how to install RubyGems 1.2 on a server running CentOS 4.

We’ll follow the instructions from the installation chapter of the RubyGems User Guide:

http://www.rubygems.org/read/chapter/3

[root@monster tmp]# yum -y install ruby ruby-devel irb
...
[root@monster tmp]# wget http://rubyforge.org/frs/download.php/38646/rubygems-1.2.0.tgz
...
[root@monster tmp]# tar xvfzp rubygems-1.2.0.tgz
...
[root@monster tmp]# cd rubygems-1.2.0
[root@monster rubygems-1.2.0]# ruby setup.rb 
Expected Ruby version > 1.8.3, was 1.8.1
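setup.rb stops because of a Ruby version check. Such checks have to compare version numbers numerically: as plain strings, "1.8.10" sorts before "1.8.3". A minimal sketch of a numeric comparison; version_at_least? is a hypothetical helper, not RubyGems’ actual code, and it pads missing components with zeros so that "1.8" equals "1.8.0".

```ruby
# Hypothetical helper: true if dotted version string `version`
# is greater than or equal to `minimum`, compared component by component.
def version_at_least?(version, minimum)
  a = version.split(".").map { |part| part.to_i }
  b = minimum.split(".").map { |part| part.to_i }
  len = [a.size, b.size].max
  a += [0] * (len - a.size)  # pad so "1.8" compares like "1.8.0"
  b += [0] * (len - b.size)
  (a <=> b) >= 0
end
```

With it, version_at_least?(RUBY_VERSION, "1.8.3") is false on the stock CentOS 4 Ruby (1.8.1) and true on 1.8.5.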

Oops, we need a newer version of Ruby: CentOS 4 ships with Ruby 1.8.1 by default. To get a newer Ruby we’ll use the CentOS testing repository. To do this, we’ll create a repo file called CentOS-Testing.repo in the /etc/yum.repos.d directory:

CentOS-Testing.repo content

# /etc/yum.repos.d/CentOS-Testing.repo
# packages in testing repository
[testing]
name=CentOS-$releasever - Testing
baseurl=http://dev.centos.org/centos/$releasever/testing/$basearch/
gpgcheck=1
enabled=0
gpgkey=http://dev.centos.org/centos/RPM-GPG-KEY-CentOS-testing

Note the enabled=0 line: the repository stays disabled, and we’ll enable it only when needed with --enablerepo. Install Ruby from the testing repository:

[root@monster ~]# yum --enablerepo=testing install ruby ruby-devel ruby-libs ruby-irb ruby-rdoc 
Loading "fastestmirror" plugin
Setting up Install Process
Setting up repositories
Loading mirror speeds from cached hostfile
Reading repository metadata in from local files
Parsing package install arguments
Resolving Dependencies
...
Dependencies Resolved
 
=============================================================================
 Package                 Arch       Version          Repository        Size 
=============================================================================
Installing:
 ruby-irb                i386       1.8.5-5.el4.centos.1  testing            67 k
 ruby-rdoc               i386       1.8.5-5.el4.centos.1  testing           132 k
Updating:
 ruby                    i386       1.8.5-5.el4.centos.1  testing           272 k
 ruby-devel              i386       1.8.5-5.el4.centos.1  testing           503 k
 ruby-libs               i386       1.8.5-5.el4.centos.1  testing           1.5 M
 
Transaction Summary
=============================================================================
Install      2 Package(s)         
Update       3 Package(s)         
Remove       0 Package(s)         
Total download size: 2.5 M
Is this ok [y/N]: y
Downloading Packages:
(1/5): ruby-rdoc-1.8.5-5. 100% |=========================| 132 kB    00:02     
(2/5): ruby-libs-1.8.5-5. 100% |=========================| 1.5 MB    00:14     
(3/5): ruby-devel-1.8.5-5 100% |=========================| 503 kB    00:04     
(4/5): ruby-1.8.5-5.el4.c 100% |=========================| 272 kB    00:03     
(5/5): ruby-irb-1.8.5-5.e 100% |=========================|  67 kB    00:01     
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
  Updating  : ruby-libs                    ######################### [1/9] 
  Updating  : ruby                         ######################### [2/9] 
  Installing: ruby-irb                     ######################### [3/9] 
  Installing: ruby-rdoc                    ######################### [4/9] 
  Updating  : ruby-devel                   ######################### [5/9] 
  Cleanup   : ruby-libs                    ######################### [6/9]
  Cleanup   : ruby-devel                   ######################### [7/9]
  Cleanup   : ruby                         ######################### [8/9]
  Removing  : irb                          ######################### [9/9]
 
Installed: ruby-irb.i386 0:1.8.5-5.el4.centos.1 ruby-rdoc.i386 0:1.8.5-5.el4.centos.1
Updated: ruby.i386 0:1.8.5-5.el4.centos.1 ruby-devel.i386 0:1.8.5-5.el4.centos.1 ruby-libs.i386 0:1.8.5-5.el4.centos.1
Complete!

That’s better: we now have Ruby 1.8.5 installed on the system. Let’s return to the RubyGems install:

[root@monster rubygems-1.2.0]# ruby setup.rb
...
------------------------------------------------------------------------------
 
RubyGems installed the following executables:
        /usr/bin/gem
 
If `gem` was installed by a previous RubyGems installation, you may need
to remove it by hand.

Voila! We have installed RubyGems 1.2 on the CentOS 4 server.