Number of RSS Readers

One piece of information that I’ve been analyzing in my spare time is the number of readers of this web log. Measuring it can be very tricky, as a number of factors get in the way (for example, people can click your RSS feed and ‘view’ it in their browser, which doesn’t mean that they’re reading it on a regular basis). Regardless, the easiest way to figure out approximately how many readers you have is to add up the subscriber counts that news aggregators report in their user agent strings. Some information on common user agent formats can be found in an excellent write-up on InsideGoogle.

I’ve also pulled together some code, from a Perl application that I’m writing in my spare time, if you’re interested in tracking something like this yourself.

my %rss = (
  "Blog" => ["/index.rdf","/?p=rss","/blog/index.rdf"],
  "Links" => ["/links/index.rdf"],
  "Projects" => ["/projects/index.rdf"]
);

my @rss_names = qw( users subscribers readers );
my %rss_count = ();
my %rss_ip = ();

sub rss {
  my ( $page, $user, $ip ) = @_;

  foreach my $i ( keys %rss ) {
    foreach ( @{ $rss{ $i } } ) {
      if ( $page eq $_ ) {
        # Only count each IP once per feed
        unless ( exists $rss_ip{ $i }{ $ip } ) {
          # One reader, unless the user agent reports a number
          my $count = 1;
          foreach ( @rss_names ) {
            if ( $user =~ /(\d+) $_/i ) {
              $count = $1;
            } elsif ( $user =~ /$_ (\d+)/i ) {
              $count = $1;
            }
          }
          $rss_count{$i} += $count;
          $rss_ip{ $i }{ $ip } = 1;
        }
        # The request was for a feed
        return 0;
      }
    }
  }

  # The request was not for a feed
  return 1;
}

In a nutshell, this is what the code is doing: each RSS feed is analyzed separately, and each feed can be served from multiple URLs. The URLs for the RSS feeds are specified in the first declaration:

my %rss = (
  "Blog" => ["/index.rdf","/?p=rss","/blog/index.rdf"],
  "Links" => ["/links/index.rdf"],
  "Projects" => ["/projects/index.rdf"]
);

(This piece of code is what I use on my weblog.) I especially like being able to map multiple URLs to one RSS feed, since misbehaving news aggregators don’t always follow updated permanent redirects. This way I can make sure that everyone reading the same content is counted together.

The next aspect of RSS tracking lies in figuring out whether the IP of the RSS user is unique. Currently, this is the only way to count users who don’t use a public aggregator and instead pull the feed with a desktop news application. Each new IP counts as one reader, unless its user agent string reports a number.
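To make the counting rule concrete, here is a minimal sketch that separates it from the feed lookup. The reader_count helper, the %seen hash, and the user agent strings are all made up for illustration; real aggregators format their strings in their own ways, so treat the samples as assumptions rather than real log data.

```perl
use strict;
use warnings;

my @rss_names = qw( users subscribers readers );

# Hypothetical helper: how many readers does one request stand for?
sub reader_count {
  my ( $user ) = @_;
  my $count = 1;  # a desktop reader is a single person
  foreach my $name ( @rss_names ) {
    if ( $user =~ /(\d+) $name/i ) {
      $count = $1;
    } elsif ( $user =~ /$name (\d+)/i ) {
      $count = $1;
    }
  }
  return $count;
}

# Made-up requests ( IP, user agent ), for illustration only.
my %seen;
my $total = 0;
foreach my $request (
  [ "192.0.2.1", "ExampleAggregator/1.0 (42 subscribers)" ],
  [ "192.0.2.2", "ExampleDesktopReader/2.0" ],
  [ "192.0.2.2", "ExampleDesktopReader/2.0" ],  # repeat visit, ignored
) {
  my ( $ip, $user ) = @$request;
  next if $seen{ $ip }++;  # each IP is only counted once
  $total += reader_count( $user );
}

print "$total\n";  # prints 43
```

The aggregator contributes the 42 readers it reports, the desktop reader contributes one, and the repeat visit from the same IP adds nothing.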

The main subroutine accepts three arguments: $page takes the URI of the requested page (e.g. /index.html), $user takes the user agent string, and $ip takes the user’s IP. It returns 0 if the request was for one of the feeds and 1 otherwise. The best way to use this subroutine is to iterate over your web server access logs (whatever form they may be in), parse out the three pieces of information described above, and feed them into it.
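As a rough sketch of the parsing step, the following assumes that your logs are in the Apache “combined” format; that is an assumption, so adjust the pattern if yours differ. The parse_log_line helper and the sample line are hypothetical and not part of the application described here.

```perl
use strict;
use warnings;

# Hypothetical helper: pull ( $page, $user, $ip ) out of one line of an
# Apache "combined" format access log. Returns an empty list if the line
# doesn't match the expected layout.
sub parse_log_line {
  my ( $line ) = @_;
  if ( $line =~ m/^(\S+) \S+ \S+ \[[^\]]*\] "\S+ (\S+)[^"]*" \d+ \S+ "[^"]*" "([^"]*)"/ ) {
    return ( $2, $3, $1 );
  }
  return;
}

# A made-up log line, for illustration only.
my $line = '192.0.2.1 - - [10/Jul/2005:12:00:00 -0400] '
         . '"GET /index.rdf HTTP/1.1" 200 5120 "-" '
         . '"ExampleAggregator/1.0 (42 subscribers)"';

my ( $page, $user, $ip ) = parse_log_line( $line );
print "$page\n$user\n$ip\n";
```

In a real run you would read the log line by line (for example with while ( my $line = <$fh> ) { ... }), skip any line that fails to parse, and pass the three values to rss( $page, $user, $ip ).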

Once you’re done feeding in all of the requests from your logs, you have a nice little hash of information that will look something like this:

%rss_count = (
  "Blog" => 155,
  "Links" => 31,
  "Projects" => 45
);
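If you want to turn that hash into something readable, a small report is enough. This is only a sketch: the rss_report helper is a hypothetical name, and the numbers are simply the sample totals shown above.

```perl
use strict;
use warnings;

# Sample totals, copied from the hash above.
my %rss_count = (
  "Blog" => 155,
  "Links" => 31,
  "Projects" => 45
);

# Hypothetical helper: one "name: count" line per feed, largest first
# (ties are broken alphabetically by feed name).
sub rss_report {
  my ( %count ) = @_;
  return map { sprintf( "%s: %d\n", $_, $count{ $_ } ) }
         sort { $count{ $b } <=> $count{ $a } || $a cmp $b }
         keys %count;
}

print rss_report( %rss_count );
```

With the sample totals this lists Blog first, then Projects, then Links.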

Unfortunately, you end up having to take these figures with a grain of salt, considering that users sometimes request a feed but never become regular subscribers. You’ll probably notice that your subscription numbers fluctuate from day to day; this is mostly because different numbers of people read on different days of the week (weekends are very slow reader days).

So play around with this code and have some fun. I’m hoping to release a full stats app that I’ve developed (using the above code) here soon.

Posted: July 10th, 2005

