Here is what I want from you: I want you to review the validity of my approach to A/B testing and to recommend your alternate approaches for different operational circumstances. With your feedback (and after an unspecified length of time), I am willing to draft documentation based on our collective wisdom for your review.

I have invested a few hours in understanding Drupal's multi-site capabilities and how they can be used to perform A/B testing of site themes. I looked through the available documentation but found a lack of helpful information on this topic. In other words, we have an opportunity to improve the documentation of multi-site setups and A/B testing procedures.

Although the topic of multi-site setups could undoubtedly benefit from independent improvement, it plays a key role in my approach to A/B testing. So I am including some multi-site tips here.

But first, let me explain what I mean by A/B testing. In A/B testing, we test the performance of A versus B. For example, we may wish to test the effect of a font change on ad click-through rates. So we set up two site configurations and sample (i.e. measure) visitor behavior under both. Then, using statistical techniques (e.g. hypothesis testing or factorial design), we can estimate the true effect of B versus A.

The operational circumstances dictate which approaches to A/B testing are feasible. In my case, I am using a shared LAMP server. With a dedicated Apache process, I would have considered the following:

  • Apache's RewriteMap
  • A front-end proxy like Squid
  • Sub-domains. (I am too cheap to pay my host, who is otherwise excellent.)

But given my limitations, here is how I set up a multi-site Drupal 4.7.0-beta4 installation for A/B testing.

  1. Create a shadow installation of Drupal for configuration B.
    1. Create a directory in Drupal's root directory to hold the shadow installation.
      $ cd /path/to/drupal
      $ mkdir b
    2. Link all but index.php of the shadow installation to the real installation.
      $ cd b
      $ ln -s ../database database
      $ ln -s ../files files
      $ ln -s ../includes includes
      $ ln -s ../misc misc
      $ ln -s ../modules modules
      $ ln -s ../sites sites
      $ ln -s ../themes themes
      $ ln -s ../xmlrpc.php xmlrpc.php
    3. Copy index.php to the shadow installation. (We'll edit it later.)
      $ cp ../index.php .
      
  2. Set up the settings file for configuration B using Drupal's multi-site capabilities.
    1. Create a directory for configuration B's settings file. (Substitute your domain name, of course.)
      $ cd sites
      $ mkdir example.com.b
    2. Copy the default settings file into configuration B's directory.
      $ cp default/settings.php example.com.b
      
    3. Edit configuration B's settings file at your discretion. For example, you could specify an alternate Drupal theme.
  3. Edit the .htaccess files to ensure web visitors view pages under the correct configuration. There are two basic options for this step.
    • Option 1: Randomly assign configuration B to visitors of a certain page. Below is the extra mod_rewrite code for the .htaccess file in the default Drupal installation (configuration A). Place the code immediately after RewriteBase.
        RewriteCond %{REQUEST_METHOD} =GET
        RewriteCond %{REQUEST_URI}    ^/path-of-page-to-test$
        RewriteCond %{TIME_SEC}       >30
        RewriteRule ^(.*)$            /b/$1 [L,QSA]

      Adjust the number 30 to affect what proportion of visitors are randomly directed to configuration B.

    • Option 2: Require configuration B for some pages and configuration A for others. Again, the following code goes immediately after RewriteBase in the default .htaccess.
        RewriteCond %{REQUEST_METHOD} =GET
        RewriteCond %{REQUEST_URI}    ^/(regular-expression-describing-the-pages-for-B)$ [OR]
        RewriteCond %{REQUEST_URI}    ^/(another-regular-expression)$
        RewriteRule ^.*$                       /b/%1 [L]

    In either case, the .htaccess file in the shadow Drupal installation (configuration B) needs to match.

      RewriteCond %{REQUEST_METHOD}    =GET
      RewriteCond %{REQUEST_URI}       !^/b/(themes/|misc/|files/|index\.php).*
      RewriteCond %{REQUEST_URI}       !^/b/regular-expression-describing-the-pages-for-B$
      RewriteCond %{REQUEST_URI}       !^/b/another-regular-expression$
      RewriteCond %{REQUEST_URI}       !^/b/path-of-page-to-test$
      RewriteRule ^(.*)$                        http://example.com/$1   [L,R]
  4. To make sure a visitor (like the Google-bot) does not get stuck in configuration B, we redefine the $base_url immediately before the page is themed. drupal_get_html_head() relies on this variable to set <base href="http://example.com" />, but we cannot redefine it earlier because the multi-site code apparently depends on it as well.

    Edit index.php in the shadow Drupal installation.

      default:
        if (!empty($return)) {
          global $base_url;
          $base_url = 'http://example.com';
    
          print theme('page', $return);
        }
        break;
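
Steps 1 and 2 can be collected into one small shell function. This is only a sketch: make_shadow is a hypothetical helper name, not part of Drupal, and it assumes the directory layout listed in step 1 and an existing sites/default/settings.php to copy.

```shell
#!/bin/sh
# Sketch: build the shadow installation for configuration B (steps 1 and 2).
# make_shadow is a hypothetical helper; it is not shipped with Drupal.
# Usage: make_shadow /path/to/drupal example.com
make_shadow() {
    drupal_root=$1
    domain=$2

    # Step 1.1: directory that holds the shadow installation.
    mkdir -p "$drupal_root/b" || return 1

    # Step 1.2: link everything except index.php back to the real installation.
    for name in database files includes misc modules sites themes xmlrpc.php; do
        # Skip entries that already exist so the function can be re-run.
        [ -e "$drupal_root/b/$name" ] || [ -L "$drupal_root/b/$name" ] ||
            ln -s "../$name" "$drupal_root/b/$name" || return 1
    done

    # Step 1.3: index.php is copied, not linked, because it is edited in step 4.
    [ -f "$drupal_root/b/index.php" ] ||
        cp "$drupal_root/index.php" "$drupal_root/b/index.php" || return 1

    # Step 2: settings directory and settings file for configuration B.
    mkdir -p "$drupal_root/sites/$domain.b" || return 1
    [ -f "$drupal_root/sites/$domain.b/settings.php" ] ||
        cp "$drupal_root/sites/default/settings.php" \
           "$drupal_root/sites/$domain.b/settings.php" || return 1
}
```

Calling make_shadow /path/to/drupal example.com leaves the manual edits (settings.php, .htaccess and index.php) still to be done by hand.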

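To see what a given threshold in Option 1 does to the split, the arithmetic can be sketched in shell. This assumes TIME_SEC takes the 60 zero-padded values 00 to 59 and that the > comparison is strict; share_for_b is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: share of requests that a "RewriteCond %{TIME_SEC} >N" rule would
# send to configuration B, assuming seconds 00..59 are equally likely.
# share_for_b is a hypothetical helper name.
share_for_b() {
    awk -v t="$1" 'BEGIN {
        n = 0
        # Count the seconds that are strictly greater than the threshold.
        for (s = 0; s < 60; s++) if (s > t) n++
        printf "%d/60 = %.1f%%\n", n, 100 * n / 60
    }'
}

share_for_b 30   # prints: 29/60 = 48.3%
share_for_b 29   # prints: 30/60 = 50.0%
```

So >30 sends a little under half of the requests to configuration B, and >29 gives an even split. Because the condition follows the clock, the assignment is made per request rather than per visitor.
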
This approach has some surprising advantages. First, visitors return to configuration A by following any relative URL on a configuration-B page. Second, the redirects are all internal to the web server: the address in the browser bar never changes, so there is no opportunity for broken bookmarks. Finally, because the browser-visible addresses are the same for both configurations A and B, context-sensitive ads should not be disrupted.

After setting this up, I intend to use web analytics to measure the difference in performance between configurations A and B. Then, with statistical hypothesis testing methods, I intend to estimate the true difference and determine whether B is indeed better than A.
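
As a sketch of the hypothesis-testing step, here is a two-proportion z-test on click-through counts. It is one common choice, not necessarily the method the analysis will end up using; z_test is a hypothetical helper name, and the counts in the example are made up for illustration.

```shell
#!/bin/sh
# Sketch: two-proportion z-test (pooled standard error) for click-through rates.
# z_test is a hypothetical helper name.
# Usage: z_test CLICKS_A VISITS_A CLICKS_B VISITS_B
z_test() {
    awk -v xa="$1" -v na="$2" -v xb="$3" -v nb="$4" 'BEGIN {
        pa = xa / na                       # observed rate under A
        pb = xb / nb                       # observed rate under B
        p  = (xa + xb) / (na + nb)         # pooled rate, assuming no difference
        se = sqrt(p * (1 - p) * (1 / na + 1 / nb))
        z  = (pb - pa) / se
        printf "rate A = %.4f, rate B = %.4f, z = %.2f\n", pa, pb, z
    }'
}

# Made-up counts: 50 clicks in 1000 visits for A, 70 in 1000 for B.
z_test 50 1000 70 1000   # prints: rate A = 0.0500, rate B = 0.0700, z = 1.88
```

Under the normal approximation, an absolute z above about 1.96 corresponds to significance at the two-sided 5% level, so the made-up counts above would fall just short.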

But this approach has not been tested. So, once again, here is what I want from you: I want you to review the validity of my approach to A/B testing and to recommend your improvements and alternatives under different operational circumstances. Then we can document our collective expertise for the benefit of each other and the Drupal community.

Nic Ivy

Comments

njivy

Actually, with the $base_url hack, the modifications to .htaccess in the shadow Drupal installation (configuration B) are not necessary. So step 3 is simplified, and this should improve performance.

njivy

Page caching can interfere with random A/B testing of a single page (see Option 1 above). To get around this, I edited bootstrap.inc and common.inc so that the cache key is $_SERVER['REDIRECT_URL'] concatenated with the normal cache key.

sepeck

Status: Active » Closed (fixed)

No activity for a year.

gav240z

Component: Admin Guide » Correction/Clarification

Very interesting ideas. Here are some problems I foresee.

1. How do you ensure that repeat visitors don't see a different configuration each time they return? E.g., a repeat visitor sees version A, returns, and then sees version B.

- Google Website Optimizer sets a cookie on the visitor's machine to determine which variation the user should see.

2. From my understanding, visitors to the site will not see different URIs, so how can you measure the impact through web analytics? I would imagine the split test should send some visitors to version A, e.g. www.example.com, and some to version B, e.g. www.example.com/b/.

3. This might be overkill for someone who just wants to test one page, e.g. an A/B test of a Drupal user registration page.
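
On point 1, one way to keep a returning visitor in the same configuration is to derive the assignment from a stable identifier instead of the clock. The following is only a sketch of that idea, assuming some per-visitor identifier (for example a cookie value) is available; assign_variant and the identifier in the example are hypothetical.

```shell
#!/bin/sh
# Sketch: deterministic ("sticky") assignment of a visitor to A or B.
# assign_variant is a hypothetical helper name.
# Usage: assign_variant VISITOR_ID [PERCENT_FOR_B]
assign_variant() {
    visitor_id=$1
    percent_b=${2:-50}

    # cksum prints "<crc> <byte count>"; keep only the CRC.
    crc=$(printf '%s' "$visitor_id" | cksum | cut -d ' ' -f 1)

    # Map the CRC to a bucket 0..99 and compare it with the share for B.
    if [ $((crc % 100)) -lt "$percent_b" ]; then
        echo B
    else
        echo A
    fi
}

# The same identifier always yields the same answer.
assign_variant visitor-123
assign_variant visitor-123
```

The bucket depends only on the identifier, so no state has to be stored on the server to keep the assignment stable between visits.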

Otherwise some very interesting techniques.