The Economist is now using Drupal 6 to serve the vast majority of content pages on its flagship web site, economist.com. The homepage is Drupal-powered, along with all articles, channels, comments, and more. The Economist evaluated several open source CMSs and proprietary solutions aimed at media publishers. In the end, The Economist chose Drupal for its vibrant community and the ecosystem of modules that the community produces. The Economist will be adding lots of social tools to its site over time, and doing so on its existing platform was too slow and inefficient.

The Economist hired Cyrve to migrate its large and volatile dataset to Drupal. With the sponsorship and encouragement of The Economist, Cyrve open sourced its Migrate module, which is the heart of its migration methodology. The Economist and Cyrve hope this article helps more sites migrate to Drupal.

Before Drupal

  • 20-30 million page views per month and 3-4 million unique visitors per month
  • Over 3 million registered users
  • Posting rate exceeds a comment per minute.
  • Powered by a custom Cold Fusion application and an Oracle database.

Get intimate with the source data

We usually start by reviewing an article web page and identifying where each piece of data is stored in the 'legacy' system. For The Economist, the most interesting challenges were:

  • The legacy schema attempted to impose an object-oriented design on a relational database. There was a central cms_object table, holding all kinds of content, with content-specific data two degrees of separation away (with a cms_relations table in the middle). This meant that joins were quite complex, even for conceptually simple cases.
  • The text content itself was embedded in an NITF object stored in the database, requiring run-time XML parsing to explode it out into Drupal fields.
  • Character sets were a challenge. Inevitably, source data that's supposed to be in UTF-8 (or some other encoding) isn't consistently so, and it took a great deal of trial and error with encoding functions like iconv() to get it right. This is a recurring issue in data migrations.
  • The www.economist.com Drupal site makes heavy use of node reference fields. During a migration, you often need to relate an article to something that does not exist yet in the database (e.g. an article can have several related articles). Migrate module has built-in support for this: it creates a stub node when the referenced item does not yet exist, and the stub gets filled in properly later, when its information is available.
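
To make the character set point concrete, here is a minimal sketch of the kind of normalization helper that trial-and-error tends to converge on. It is not the code used for economist.com: the function name legacy_to_utf8() and the choice of Windows-1252 (CP1252) as the fallback encoding are assumptions made purely for illustration.

```php
<?php
/**
 * Return $text as valid UTF-8.
 *
 * Assumption (for illustration only): any value that is not already valid
 * UTF-8 was written as Windows-1252. Real data may need other fallbacks.
 */
function legacy_to_utf8(string $text, string $fallback = 'CP1252'): string {
  // With the /u modifier PCRE rejects invalid UTF-8, so preg_match()
  // returns 1 only when the whole subject is well-formed.
  if (preg_match('//u', $text) === 1) {
    return $text;
  }
  $converted = iconv($fallback, 'UTF-8', $text);
  if ($converted === FALSE) {
    throw new RuntimeException('Could not convert value from ' . $fallback);
  }
  return $converted;
}

// A Windows-1252 e-acute (byte 0xE9) becomes the two-byte UTF-8 sequence.
echo bin2hex(legacy_to_utf8("caf\xE9")), "\n"; // prints 636166c3a9
```

Checking validity first matters: running iconv() over text that is already UTF-8 would double-encode it, which is one of the more common ways a migration ends up with mangled accents.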

Break up the project into several distinct "migrations"

A migration represents a flow from one set of source data (typically the output of a database query) to one kind of Drupal object. Destinations can include nodes, taxonomy terms, users, profiles, comments, or private messages. Here are some of the migrations at economist.com:

  • Articles
  • Issues (in the sense of a periodical)
  • Newspapers (our different publications)
  • Customers (users)
  • User roles
  • Blog posts

Write code

The include files in the migrate_example module serve as documentation by example. As of now, you want to use version 2.x, which is available for Drupal 6 and Drupal 7. The gist of a migration class is to define a SQL query (or another method of fetching the source data) and to define mappings between source columns and properties of Drupal objects such as $node, $user, $comment, etc. Here is an example migration:

<?php
/**
 * There are four essential components to set up in your constructor:
 *  $this->source - An instance of a class derived from MigrateSource, this
 *    will feed data to the migration.
 *  $this->destination - An instance of a class derived from MigrateDestination,
 *    this will receive data that originated from the source and has been mapped
 *    by the Migration class, and create Drupal objects.
 *  $this->map - An instance of a class derived from MigrateMap, this will keep
 *    track of which source items have been imported and what destination objects
 *    they map to.
 *  Mappings - Use $this->addFieldMapping to tell the Migration class what source
 *    fields correspond to what destination fields, and additional information
 *    associated with the mappings.
 */
class BeerTermMigration extends BasicExampleMigration {
  public function __construct() {
    parent::__construct();
    $this->description = t('Migrate styles from the source database to taxonomy terms');

    // Create a map object for tracking the relationships between source rows
    // and their resulting Drupal objects.
    $this->map = new MigrateSQLMap($this->machineName,
        array(
          'style' => array('type' => 'varchar',
                           'length' => 255,
                           'not null' => TRUE,
                           'description' => 'Topic ID',
                          )
        ),
        MigrateDestinationTerm::getKeySchema()
      );

    // Our fetch query
    $query = db_select('migrate_example_beer_topic', 'met')
             ->fields('met', array('style', 'details', 'style_parent', 'region', 'hoppiness'))
             // This sort assures that parents are saved before children.
             ->orderBy('style_parent', 'ASC');

    // Create a MigrateSource object, which manages retrieving the input data.
    $this->source = new MigrateSourceSQL($this, $query);

    // Set up our destination - terms in the migrate_example_beer_styles vocabulary
    $this->destination = new MigrateDestinationTerm('Migrate Example Beer Styles');

    // Assign mappings TO destination fields FROM source fields.
    $this->addFieldMapping('name', 'style');
    $this->addFieldMapping('description', 'details');

    // Documenting your mappings makes it easier for the whole team to see
    // exactly what the status is when developing a migration process.
    $this->addFieldMapping('parent_name', 'style_parent')
         ->description(t('The incoming style_parent field is the name of the term parent'));

    // Open mapping issues can be assigned priorities (the default is
    // MigrateFieldMapping::ISSUE_PRIORITY_OK). If you're using an issue
    // tracking system, and have defined issuePattern (see ExampleMigration
    // above), you can specify a ticket/issue number in the system on the
    // mapping and migrate_ui will link directly to it.
    $this->addFieldMapping(NULL, 'region')
         ->description('Will a field be added to the vocabulary for this?')
         ->issueGroup(t('Client Issues'))
         ->issuePriority(MigrateFieldMapping::ISSUE_PRIORITY_MEDIUM)
         ->issueNumber(770064);
  }
}
?>

The Economist used Migrate 1 for this project, but we've updated all examples and discussion in this post for Migrate 2.

Massage the data

Without fail, data needs to be cajoled and massaged on its way into Drupal. A simple example is transforming DateTime columns into the Unix timestamps that Drupal expects. Migration classes provide a prepare() method for this sort of transformation:

<?php
public function prepare(stdClass $account, stdClass $row) {
  // Source dates are in ISO format.
  $account->created = strtotime($account->created);
}
?>

The end goal here is that you wind up with a completely native Drupal site, as if you had launched on Drupal from the very beginning. Having an explicit hook for massaging the data encourages that outcome.

Run the migrations over, and over, and over ...

In order to perfect your mappings and transformations, you have to run the migration over and over again. A key benefit of migrate module is that it makes this process fast and effortless. Here is a typical sequence of drush commands where we import and roll back a few times.

drush migrate-import NAME --itemlimit=10
... look at data and web pages. notice and fix problems in code ...
drush migrate-rollback NAME

drush migrate-import NAME --itemlimit=10
... look at data and web pages. notice and fix problems in code ...
drush migrate-rollback NAME

drush migrate-import NAME --itemlimit=10
... looks good, migrate the rest of the data...
drush migrate-import NAME

The rollback commands work so effortlessly because migrate keeps a map between legacy ID and Drupal ID as it imports. With this map, we can delete just the right nodes/users/terms etc. for this migration and no more. Also note that we can cleanly limit the migration to 10 items in this case. This is quite a bit faster than running all 3 million or having to manually clean up after an aborted migration.
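
The idea behind the map can be sketched in a few lines of plain PHP. This is a toy illustration, not the migrate module's code: the SketchIdMap class, the sketch_rollback() function, and the 'cms-101' style IDs are hypothetical names, and an in-memory array stands in for Drupal's node table.

```php
<?php
/**
 * Hypothetical in-memory stand-in for a migration's map table:
 * one row per imported item, legacy ID => Drupal ID.
 */
class SketchIdMap {
  private $rows = array();

  public function record(string $migration, string $sourceId, int $destId): void {
    $this->rows[$migration][$sourceId] = $destId;
  }

  public function destinationIds(string $migration): array {
    return array_values($this->rows[$migration] ?? array());
  }

  public function clear(string $migration): void {
    unset($this->rows[$migration]);
  }
}

/**
 * Delete only the objects that $migration created; return how many were
 * removed. $nodes stands in for the Drupal node table, keyed by nid.
 */
function sketch_rollback(SketchIdMap $map, string $migration, array &$nodes): int {
  $deleted = 0;
  foreach ($map->destinationIds($migration) as $nid) {
    if (isset($nodes[$nid])) {
      unset($nodes[$nid]);
      $deleted++;
    }
  }
  $map->clear($migration);
  return $deleted;
}

// One node was created by hand and is unknown to any migration.
$nodes = array(1 => 'About us (created by hand)');
$map = new SketchIdMap();

// "Import" two articles and remember where each legacy row ended up.
$nodes[2] = 'Article A';
$map->record('articles', 'cms-101', 2);
$nodes[3] = 'Article B';
$map->record('articles', 'cms-102', 3);

// Rolling back touches only the two mapped nodes.
$removed = sketch_rollback($map, 'articles', $nodes);
```

Because the hand-made node never appears in the map, the rollback leaves it alone, which is what makes repeated import and rollback cycles safe on a site that already holds other content.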

An alternative to rolling back and importing is updating in place: drush migrate-import articles --update. We used this when rolling back would have deleted important data (e.g. rolling back a node would have deleted its comments).

Keep stakeholders focused and informed


Also very useful in migrate module are its admin web pages which inform clients and developers about what's mapped and what is not. Further, open issues about any column/field can be assigned to the client or to the migration engineer. These issues can be linked to client's issue tracking system as well (see graphic).

These web pages ease client anxiety during the days before going live with Drupal. Migrating a live site like economist.com to a new platform is like open heart surgery on your business. Cyrve and the migrate module work hard to make this a routine, reliable and repeatable process.

Quality Assurance

The map tables that enable us to roll back effectively are also key to auditing the data. Audit processes can be implemented to make automatic comparisons between raw source data and the resulting Drupal objects, because we know precisely which Drupal object resulted from a given source content item.
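
As a rough sketch of such an audit, assuming we only compare headlines against node titles: the sketch_audit() function, the array shapes, and the sample IDs below are all made up for illustration, and a real audit would read the source table, the map table, and Drupal itself rather than arrays.

```php
<?php
/**
 * Compare each source row with the Drupal object it produced.
 *
 * @param array $sourceRows legacy ID => array('headline' => ...)
 * @param array $idMap      legacy ID => nid (what a map table records)
 * @param array $nodes      nid => array('title' => ...)
 * @return array legacy ID => description of the problem found
 */
function sketch_audit(array $sourceRows, array $idMap, array $nodes): array {
  $problems = array();
  foreach ($sourceRows as $sourceId => $row) {
    if (!isset($idMap[$sourceId])) {
      $problems[$sourceId] = 'not imported';
      continue;
    }
    $nid = $idMap[$sourceId];
    if (!isset($nodes[$nid])) {
      $problems[$sourceId] = 'destination missing';
      continue;
    }
    if ($nodes[$nid]['title'] !== $row['headline']) {
      $problems[$sourceId] = 'title mismatch';
    }
  }
  return $problems;
}

$problems = sketch_audit(
  array(
    'cms-101' => array('headline' => 'Fuel shortages'),
    'cms-102' => array('headline' => 'Budget cuts'),
    'cms-103' => array('headline' => 'Obituary'),
  ),
  array('cms-101' => 2, 'cms-102' => 3),
  array(
    2 => array('title' => 'Fuel shortages'),
    3 => array('title' => 'Budget cut'),
  )
);
// $problems now flags cms-102 (title mismatch) and cms-103 (not imported).
```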

Performance

Migrating a metric ton of data, as with www.economist.com, begs for optimization of the insertion rate. The best tool for finding slowness is xhprof. Devel, drush and xhprof work great together now, as drush reports the URL of your profiling report at the end of each run. Use that report to identify slow code and remove or refactor it. We had to disable token module in order to achieve excellent performance.

Keep up with changes - incremental migration

A large business like The Economist proceeds cautiously with a platform change. In order to mitigate risk for client and for migration engineers, the migrate module supports incremental migrations in addition to "all at once" migrations. An incremental migration imports only the items which have been added or edited since the last time this migration ran. These items are identified by maintaining a "high-water mark" for each migration that comes from a primary key or datetime column on the source data. Migrate module automatically moves this high-water mark as content gets imported. The Economist has made heavy use of this feature.
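
The high-water mark mechanism can be illustrated with a small stand-alone sketch. It is not how the migrate module is implemented: sketch_import_since(), the row layout, and the sample timestamps are assumptions for the example, and it assumes the source carries an ISO-style "modified" column.

```php
<?php
/**
 * Import only rows modified after $highwater; return the new high-water mark.
 *
 * Assumption: 'modified' is a "YYYY-MM-DD HH:MM:SS" string, so plain string
 * comparison matches chronological order.
 */
function sketch_import_since(array $sourceRows, string $highwater, array &$destination): string {
  $newMark = $highwater;
  foreach ($sourceRows as $row) {
    if (strcmp($row['modified'], $highwater) <= 0) {
      continue; // Unchanged since the last run.
    }
    $destination[$row['id']] = $row['title']; // Insert, or update in place.
    if (strcmp($row['modified'], $newMark) > 0) {
      $newMark = $row['modified'];
    }
  }
  return $newMark;
}

$source = array(
  array('id' => 1, 'title' => 'Leader', 'modified' => '2010-10-01 09:00:00'),
  array('id' => 2, 'title' => 'Briefing', 'modified' => '2010-10-01 09:05:00'),
);
$drupal = array();
// First run: an empty mark means everything is imported.
$mark = sketch_import_since($source, '', $drupal);

// The legacy site keeps publishing: one edit and one new article.
$source[0] = array('id' => 1, 'title' => 'Leader (corrected)', 'modified' => '2010-10-01 09:10:00');
$source[] = array('id' => 3, 'title' => 'Obituary', 'modified' => '2010-10-01 09:12:00');

// Second run: only the two changed rows are processed.
$touched = array();
$mark = sketch_import_since($source, $mark, $touched);
```

One caveat of this simple version: a row saved with exactly the same timestamp as the current mark, after the run that set it, would be skipped, so a real implementation needs a column with enough resolution or some overlap between runs.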

Go live

Once incremental migrations were working nicely, The Economist was able to watch its "staging" Drupal site keep up with new content, users, etc. Drupal stays in sync, just five minutes behind the live site. This staging site is a great place for identifying bugs in the site in addition to bugs in the migrated data. The true beauty of this approach comes when we go live with Drupal. All that's required is to move DNS records to point to the Drupal servers instead of Cold Fusion. There is no big bang migration where everyone holds their breath. The Economist had already come to know and love its upcoming Drupal site, and making it live was all party time :).

$ nslookup economist.com
Non-authoritative answer:
Name: economist.com
Address: 64.14.173.20

Comments

Stuff like this is invaluable. Thanks for sharing your experiences!

Ditto!

I look forward to checking out XHProf as an Xdebug alternative.

Oh yeah, well done Economist! A great publication needs a great CMS :)

I agree totally. This is not only good for The Economist it should even give some great exposure for Drupal.

Blogging about online brands on my blog.

Excellent site.
What kind of slider below?
It does not work on IE 8

-----
my blog

Well I think that at last inclined to Drupal, I'm sure it was an excellent choice, the most recent versions are now more stable and more flexible components, great news!

Regards

-----------------
que es el amor

That was a really insightful write-up ... thanks for that. The comment about "open heart surgery" seems to ring true far too often. This is way beyond the scope of my projects, but the pain of migration seems to be universal. IMHO when we see a "major achievement" using Drupal ... such as this major migration of existing content into a Drupalized environment ... I see a "feature wish list" for a future version of Drupal, so that future adopters of Drupal can do similar with a "minor achievement".

All good news.

Tasmanian web developer.

Glad to work on this stuff!

Great post, Moshe.

Cheers

great work.
congrats!

very nice website, I will start examine this website.
PS How did they create this small add (subscribe to newsletter) at the bottom?

I usually never comment stuff like this, but this site is just perfect. Perfect in every pixel. I haven't seen such accuracy for a very long time. It depicts the level. It looks "expensive".

I do not even mention that it looks perfect in IE6 (which means 20% of all corporate visitors). There is the golden rule: "If a site is displayed correctly in IE6, it will be displayed correctly in any other browser".
Just have a look at Search form... Marvelous.
And all the layout is well-organised.

Perfect.

(Just one thing to mention: I would make the main menu a bit more eye-catching, both by colour and by font.)

There is the golden rule: "If a site is displayed correctly in IE6, it will be displayed correctly in any other browser".

I have very much not found this to be the case. I find it much, much easier to code to Firefox and retrofit to IE6/IE7 (IE8 hardly ever needs retrofitting).

And: Stuff that works in IE6 I find often looks awful in Safari/Chrome.

(I suspect you have more control over the design than I do. I get the design and am tasked to implement it, with as little deviation as possible. In this scenario, I often have to do things that require fundamentally different strategies in IE6 and other browsers.)

This is why I prefer to code my own designs - I don't want another person interfering with either side of that process, since I have twelve years experience on both and can completely control it.

I agree with all the comments. Beautiful site! I would like to know how the developers assembled the story display blocks in the center column (starting just below the top slide show): In other words, the configuration:

Headline

teaser with image

headline 2
headline 3

link to section

It doesn't look like they used either views or panels, based on the classes & ids, unless they modified one of the two to provide custom class generation. Any body know how they did this? Custom block? Would love to be able to do this on some of my sites.

Would love to know how the load speed is accomplished. Caching at the edge?

Once again, masterful job on all fronts.

Believe it or not, there is a content type called 'homepage' and it is themed with node-homepage.tpl.php. It is simple as that. The 'Default frontpage' setting is hardcoded to a path like /node/98347. I guess editorial changes that when they want to push a new homepage? Not too familiar with how that process works.

Speed is mostly attributable to Varnish+Pressflow (architected by David Strauss and 4 kitchens).

This site really shows off what Drupal can do. However, I cried at the fact that they forgot to turn on pretty urls. It should have been one of the first things they did

Clean urls are on but pathauto is off. I guess thats what you mean. Yeah, I cry little too. Economist is working on it. It’s a bit more complicated since they serve some pages off old site still.

Oh yeah, pathauto was the word I was looking for. But also still amazing even though some of the stuff is still hosted from the older site. The most impressive thing about it is, like others have said, is the consistent look all the way back through ie6. Although I hate ie6, that shows true professionalism there.

I mean, look at the node #s, for pete's sake! I first realized Economist was on Drupal when I went there to read something and saw the "/node/16910031"* path. My first thought was "Cool, The Economist is on Drupal."

My next thought was "Holy crap, they've got over 15 million nodes."

--
*... where "16910031" is a random number and not the actual node I went to. I don't remember what that was.

The Economist doesn't have 15M nodes. They moved from a pseudo object oriented schema which stored object ids for all sorts of objects. For instance images, nodes, relationships between nodes, folders for stories, workflow and newspaper import details. I don't know the exact number of nodes but every issue is on there since some time in 1997 and they publish 90 or so print stories a week with probably 10-20 other articles every work day.

That is still lots of nodes.

Pathauto is on, but not for all node types. It is turned on for blogs, e.g. http://www.economist.com/blogs/newsbook/2010/10/frances_fuel_shortages, as the load balancing rule for blogs always has blogs/* up front, so those URLs can be correctly load balanced.

This is purely because of load balancing. The Oracle and Drupal site run together. For some URLs you get sent to the Oracle cluster and some get sent to Drupal, The method for figuring out which way to send people is based on the URL. i.e. /node* send to Drupal /printedition send to ColdFusion/Oracle

We did think about having the path be something like /article/ but decided that ultimately when the migration is done it wouldn't stay that way so it is node for now until the rest is ported.

Fantastic work, thank you for sharing.

I've been reading up on Migrate module and it's interesting how many concepts it shares with Feeds. Maybe one day the stars align and we both have the right resources at the right time to consolidate our stacks. In the meantime I'll keep peeking over the fence.

Again: kudos for putting the leash on the migration beast and letting us know how you did it.

Yeah, Mike looked long at Feeds when architecting migrate2. We're getting closer for sure. Would be great to merge one day.

What I like about the migrate module is the flexibility it gives you over the data you import.

Inevitably with a large legacy data set like this, there are a lot of application level data 'fixes' which creep in over time. If you were doing a straight forward data transform, these fixes would need to be done in the Drupal application, or some post processing done on the data.

Migrate lets you examine and alter the data before saving it to Drupal, which means that all of your data migration logic is where it should be, and your Drupal instance has clean sensible data to work with.

I think the massaging the data heading above underplays how useful this feature is.

Good point, Jeremy. I added a couple sentences in that Massage the Data section. Still understated, but better.

I was subscribed to the print edition of The Economist for a year in about 1999 and registered for free access to their website for the duration of my subscription. Somehow (perhaps during an earlier botched migration?) my expiration date field was nulled, and I was able to enjoy free access to all of their premium content for about the next 9 years, until those pesky Cyrve folks got involved.

But of course I do now have free access to the Migrate module, so I've probably won overall.

Congrats on completing such a complicated project, and thanks for the write up.

Good Work...

Glad to see Economist.com embraced drupal, We all are proud of you..

Regards
Sagar

Need help ?
Reach me on skype : sag_13684

Share your Posts, Url, Sites
www.sociopost.com

It is write-ups like these that make me ecstatic about moving to drupal about 5 years ago. Excellent work. the site is AMAZING. good job! I've done a newspaper website myself, so I'm a bit aware of the importance of migration and performance. well done!

Still looks like there is a lot of cold fusion references in the code. Is that the case?

Yes, not all of the site is in Drupal. The previous CMS still powers the majority of editing done on the platform. Moving to Drupal should always be done. Moving everything to Drupal regardless of cost shouldn't. Also going live as soon as you can is more important than finishing everything.

Compelling reason for publishers to migrate to Drupal..

Cheers!

Vikram
eqnova.net migrating soon to Drupal

Vikram Bhat
eQNovaTM
Strategy. Consulting. Outsourcing
Web: http://www.eqnova.net
M: +91-9818840518 O: +91-11-32086452
Twitter: @viknomics Skype: bhatvikram518
GTalk: vikram@eqnova.net

high quality british magazine embracing open source. and choosing drupal!
that's great news!!

http://asheshr.com
Freelance Drupal Developer

We had to disable token module in order to achieve excellent performance.

As the project maintainer, I'd be interested in knowing why exactly, if you can provide more details. I know the API itself may be part of the reason since there's no lazy-generation of only the desired tokens in the D6 API, but something had to be calling token itself.

Senior Drupal Developer for Lullabot | www.davereid.net | certifiedtorock.com/u/53892

Exactly. Pathauto was causing generation of all tokens on each save. So we disabled both pathauto and token.

That site is great, I hope drupal technology will improve it even greater.

An important core hack to support the stubs was that we maintained node IDs from the old system with a quick hack to node_save to support the adding of the new flag as well as passing in the nid.

This meant that migrate could just create the stubs. I am pretty sure Moshe got this into d7 by default?

moshe thanks for this post. we built our news site on drupal and appreciate all the info you provided.

thanks again!

Yes. nodes and users may now be saved with specified IDs in drupal7 so it will be easy to preserve those during migrations.

Will this migrate .pdf files and images too? Also, what can you do if you are migrating from a large site mainly coded in html? Will migrate accept RSS feeds and map the fields in the rss to content type fields?

Also, just to clarify, I am really new to this migration stuff, how much code writing goes into the migration process? Do you have to write code to make it work in the first place? Also, how does this interface with Drush? I read a tutorial on lullabot.com and they didn't mention drush and I didn't see a mention of it on migration's home page, but I would like to know more.

Migrate is a great tool -- many thanks.

I'm curious, what database is the Drupal site running on, and how was data pulled from Oracle during migration?

Economist.com uses MySQL 5.1
Data was pulled from the Oracle database using a Drupal callback that essentially cloned the Oracle tables into a MySQL database row by row (sounds slow but once the big sync was done it was trivial to sync missing rows)

Once the Oracle tables were in MySQL Table Wizard would present them as views for Migrate to work its magic against.

Migrating to Drupal doesn't have to be hard! This method also allows the parts of the site that aren't yet migrated to Drupal, and there are a lot of parts, to be managed in Oracle and the data synced into Drupal constantly.

Thanks for this interesting report! I'll have a look at xhprof.

Hi, can you please tell us something about the debate section and how it can be developed?

Digo

Hi,
is it possible get information about Drupal 6 structure of The Economist?

I would like to know all contributed modules used, if there is only one Database or multidatabase and also if it use multisite with multi subdomain.

Is it possible get information?

Thank you for your help. I would like to learn from The Economist website structure.

There is only one database that holds Drupal data. I wrote a blog post a long time ago detailing what modules it had at the time.

http://stewsnooze.com/content/economistcom-module-list

Very nice web site,good job!

--------------
Lency Ann

This is so important for the future of Drupal. Thank you two for the write-up, the public release of two out-of-this-world modules, and for everything you do in the community. My hat is off to you two.

Its amazing
Nowadays Drupal developers having good demand in s/w market

If only I had read this write-up before I moved my car site over to drupal, the migrate module would have made things a lot easier and less hassle.

Hi,
I am new to drupal and migrate module and want to use migrate module to migrate a website from asp.net to drupal. For this I have analysed the code of the beer and wine examples provided with migrate module and also made changes in the code so as to see any change in the drupal UI. But unfortunately, even after changing the name of the content types in both Beer and Wine example codes, there is no change seen on the drupal UI. Maybe the files named beer.inc and beer.install.inc are not the ones affecting the content types on the drupal UI (correct me if I am wrong). Please help me with the control flow of the migrate module so that one can know the code to be modified to actually see the changes on the drupal UI.

Thanks a lot in advance.

Rishi

very nice website, although require advanced drupal user to produce such website.

All the time I'm learning Drupal on the occasion of the creation of a small regional service. And The Economist is the pinnacle of my wildest dreams:)

This is a very good example for an excellent drupal site - good work guys.