Enabling Fast Page Caching to Gather Anonymous Page Statistics

jtrudeau - May 14, 2009 - 19:15
Project: Cache Router
Version: 6.x-1.x-dev
Component: Code
Category: feature request
Priority: normal
Assigned: Unassigned
Status: active
Description

I've made a small change to cacherouter.inc which enables the gathering of page view statistics when pages are served to anonymous users using fast page caching. This isn't normally possible, since the statistics module requires a database connection. Instead, I'm storing these views in memcached and then persisting the data to the database during cron execution.

This snippet gets inserted into cacherouter.inc around line 191, right before $page->data is printed and the function returns.

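<?php
// Increment the page view counter for the requested URI.
$counts = array();
if ($cached = cache_get('cached_page_view_counts')) {
  $counts = $cached->data;
}
// Initialize the entry on the first view to avoid an undefined index notice.
$uri = request_uri();
$counts[$uri] = isset($counts[$uri]) ? $counts[$uri] + 1 : 1;
cache_set('cached_page_view_counts', $counts);
?>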

This is the hook_cron() implementation which persists this data in the database. This assumes we're using pathauto or some other path-aliasing mechanism.

<?php
function cacheupdater_cron() {
  // Write out cached page view counts to the database.
  if ($cached = cache_get('cached_page_view_counts')) {
    // Immediately clear the cache entry for page view counts
    // ('cache' is the default bin used by cache_get() above).
    cache_clear_all('cached_page_view_counts', 'cache');
    $counts = $cached->data;
    // Cast to int in case the variable was saved as a string.
    $limit = (int) variable_get('cached_page_updates_per_cron', 1000);
    $processed = 0;

    // Iterate over each page view count and add it to the existing statistics.
    foreach ($counts as $uri => $views) {
      // Resolve the alias (minus the leading slash) to its node/NID source path.
      $pieces = explode('/', drupal_lookup_path('source', substr($uri, 1)));
      if (count($pieces) == 2 && $pieces[0] == 'node' && $nid = $pieces[1]) {
        $timestamp = time();
        if ($counter = db_fetch_object(db_query('SELECT daycount, totalcount FROM {node_counter} WHERE nid = %d', $nid))) {
          db_query('UPDATE {node_counter} SET daycount = %d, totalcount = %d, timestamp = %d WHERE nid = %d', $counter->daycount + $views, $counter->totalcount + $views, $timestamp, $nid);
        }
        else {
          db_query('INSERT INTO {node_counter} (nid, daycount, totalcount, timestamp) VALUES (%d, %d, %d, %d)', $nid, $views, $views, $timestamp);
        }
        $processed++;
      }
      unset($counts[$uri]);

      // Reached the limit, stop processing.
      if ($limit > 0 && $processed >= $limit) {
        break;
      }
    }

    // Merge unprocessed views into any views which have been recorded during this process
    // (this could *potentially* result in some lost page views, but the amount should be trivial).
    if (count($counts) > 0) {
      if ($new_cached = cache_get('cached_page_view_counts')) {
        foreach ($new_cached->data as $key => $value) {
          // A URI may be new since the cache entry was cleared, so guard the index.
          $counts[$key] = isset($counts[$key]) ? $counts[$key] + $value : $value;
        }
      }
      cache_set('cached_page_view_counts', $counts);
    }
  }
}
?>

Steve, this could easily be encapsulated within cacherouter or a sub-module for all those using page fast-caching AND statistics. This got me thinking about the design of a new hook which could provide some dynamic data to pages without the need for a database connection. Let me know if anyone is interested in discussing.

#1

moshe weitzman - May 27, 2009 - 01:58

This is a terrific feature. I actually think core should adopt this pattern when memcache/APC are available. We really need to avoid writes to the stats table on every request. Similarly, I would love to keep stats on cache hits/misses for the core cache.inc. Until then, we swap cache.inc.

#2

yhager - May 27, 2009 - 03:03

The cron run is not the only place you can lose counts. Several web servers might increment the same counter at the same time and overwrite each other's results. Depending on the load on the site and the number of web servers, you might lose more than a trivial amount.
As long as the count is not updated with an atomic test-and-set operation, your counts might be awfully off.
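
One way to get an atomic update might be to keep one memcached counter per URI and let the server do the increment, rather than reading and rewriting a shared array from PHP. The following is only a rough sketch of that idea, not anything cacherouter provides: page_view_increment(), the 'page_views:' key prefix and the FakeMemcached class are made up for illustration. FakeMemcached just mimics the add() and increment() methods of PHP's Memcached class so the example runs without a server.

```php
<?php
// Hypothetical in-memory stand-in that mimics the two Memcached methods used
// below. A real deployment would pass a connected Memcached instance instead.
class FakeMemcached {
  private $store = array();

  // Like Memcached::add(): stores the value only if the key is absent.
  public function add($key, $value) {
    if (array_key_exists($key, $this->store)) {
      return FALSE;
    }
    $this->store[$key] = $value;
    return TRUE;
  }

  // Like Memcached::increment(): returns the new value, or FALSE if the key is missing.
  public function increment($key, $offset = 1) {
    if (!array_key_exists($key, $this->store)) {
      return FALSE;
    }
    $this->store[$key] += $offset;
    return $this->store[$key];
  }
}

// Hypothetical helper, not part of cacherouter: count one view of $uri.
function page_view_increment($mc, $uri) {
  $key = 'page_views:' . md5($uri);
  // add() does nothing when the counter already exists, so a concurrent
  // request on another web server cannot reset it to zero.
  $mc->add($key, 0);
  // With a real server the increment happens inside memcached, so there is
  // no read-modify-write cycle in PHP for two servers to race on.
  return $mc->increment($key, 1);
}

$mc = new FakeMemcached();
page_view_increment($mc, '/about');
$views = page_view_increment($mc, '/about');
```

Note that memcached cannot list its keys, so the cron job would still need some way to know which URIs to read back. This sketch does not address that.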

 
 
