As one of the world's premier cancer centers, Memorial Sloan-Kettering Cancer Center is committed to exceptional patient care, leading-edge research, and superb educational programs.

Memorial Sloan-Kettering Cancer Center's broad mission requires that the www.mskcc.org design and information architecture serve a wide variety of audiences, each interested in very different content. For example, each aspect of MSKCC's mission (treatment, research, and education) has a dedicated landing page. A newly diagnosed cancer patient is generally interested only in information about their specific cancer type, while a postdoctoral student might want to find researchers within a given research program or department. The primary goal of the redesign was to better address the needs of MSKCC's many kinds of users while clearly conveying the institution's mission.

Take a video tour of Memorial Sloan-Kettering Cancer Center’s redesigned website.

MSKCC Homepage Screenshot
Why Drupal was chosen: 

In 2001, after several prototypes and iterations, the Memorial Sloan-Kettering Cancer Center website (www.mskcc.org) launched on my "homegrown" content management system (CMS) called the "Inettool." For the last decade, while maintaining and customizing the Inettool, I came to the realization that I was digging a "CMS hole" where my code and MSKCC's data were gradually being buried and trapped in this custom-built system. This experience led me to conclude that "custom-built software requires everything to be custom built." Using an open source CMS, like Drupal, prevents one from being trapped in a custom-built CMS, because open source code provides pre-existing and tested functionality that can be customized.

The simplest explanation of why I chose Drupal is that "given enough eyeballs, all bugs are shallow": Drupal has a community of engaged participants looking at and contributing code. For me, Drupal's contributed code and open discussions are its biggest strengths; I have not hit any brick walls or black boxes while using Drupal to build the MSKCC.org website.

So, after 2 years of work, Memorial Sloan-Kettering Cancer Center's redesigned website, MSKCC.org, finally launched. Rebuilding MSKCC.org and moving it to Drupal was my biggest project ever and my first large-scale Drupal project, which is why I feel it is important to share my experience...

Describe the project (goals, requirements and outcome): 

About the "Switch"

As mentioned in the introduction, the previous website was built using a custom-built CMS called the "Inettool." The driving force behind the custom-built CMS was the institution's desire for a specialized, customizable website.

I will let you in on a little secret, and a common misconception: "most websites are never really that special; it is the institution and/or business behind the website that is special." In the case of MSKCC.org, the CMS is just a tool used to convey the institution's mission and message, which I personally summarize as "quality, compassionate care."

I successfully convinced MSKCC to switch to Drupal by explaining the ongoing challenges of maintaining their custom-built CMS. One final but key selling point for switching to Drupal was the availability of enterprise support from Acquia.

So MSKCC agreed to adopt Drupal and "the switch" got underway.

The "switch" can be broken down into several steps/decisions, which include:

  • Migration
  • System Architecture
  • Content Management
  • Information Architecture
  • Site Features (aka modules)
  • Templates (aka themes and panels)

First, Some Stats

Below are some general statistics to help describe the scope of the migration and the general system architecture requirements.

Site Stats (for 2011)

  • Visits: 5,567,343
  • Unique visitors: 3,364,927
  • Pageviews: 20,805,773

Drupal Stats

  • Modules: 206 enabled (142 contrib, 64 custom)
  • Users: 55 active users, 17 roles
  • Nodes: 33 content types, 297 fields, 11,297 nodes
  • Books: 139 books, 2,003 book pages
  • Menus: 108 primary links, 26 secondary links
  • Taxonomy: 14 vocabularies, 1,972 terms
  • Views: 112 views

Migration

Conceptualizing a migration of data from the Inettool to Drupal was the first step of the switch process. There were many questions to be asked and answered on how the migration would be accomplished. For instance: What data would be moved? How much data could be cleanly migrated? How much data would require additional post-migration clean-up?

How to migrate an existing website to Drupal can be a fairly easy question to answer, since Drupal has several contributed modules for importing and exporting data. Honestly, I made a 'newbie' mistake (which may also have been a good decision): I wrote a custom migration module from scratch. I saw this as an opportunity to learn PHP and the inner workings of Drupal's API and database structure, knowing that this code would be thrown away after the final migration. Anyone new to Drupal should be willing to throw away code; it is just part of the learning process.

Besides learning Drupal, I had three goals for my migration script, which were:

  • Automated nightly builds so that everyone could review the migrated data as changes were being made.
  • Single-page imports for debugging minor migration issues.
  • Clean migration of 90% of the existing 10,000+ pages, requiring little post-migration clean-up.
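The three goals above can be pictured as one migration routine with two entry points: a full nightly build that reports how many pages migrated cleanly, and a single-page import for debugging. This is a minimal Python sketch, not the author's PHP migration module; the page dictionaries, the placeholder checks in `migrate_page()`, and the 90% `target` parameter are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class MigrationResult:
    """Outcome of migrating one legacy page (hypothetical structure)."""
    page_id: int
    issues: list[str] = field(default_factory=list)

    @property
    def clean(self) -> bool:
        # "Clean" means the page needs no post-migration clean-up.
        return not self.issues


def migrate_page(page: dict) -> MigrationResult:
    """Convert one legacy page, recording anything that needs manual clean-up.

    The two checks below are placeholders for real conversion rules.
    """
    result = MigrationResult(page_id=page["id"])
    if not page.get("title"):
        result.issues.append("missing title")
    if "<font" in page.get("body", "").lower():
        result.issues.append("legacy <font> markup")
    return result


def nightly_build(pages: list[dict], target: float = 0.90) -> tuple[float, bool]:
    """Migrate every page; return the clean rate and whether it meets the target."""
    if not pages:
        return 0.0, False
    results = [migrate_page(page) for page in pages]
    clean_rate = sum(r.clean for r in results) / len(results)
    return clean_rate, clean_rate >= target


def single_page_import(pages: list[dict], page_id: int) -> MigrationResult:
    """Re-run the migration for one page only (raises StopIteration if absent)."""
    page = next(p for p in pages if p["id"] == page_id)
    return migrate_page(page)
```

Running the same code path for both entry points means a bug found while debugging one page is fixed for the next nightly build as well.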

Apart from one or two issues that had to be fixed after the final migration, the data migration was successful, requiring about 3 weeks of post-migration clean-up (though, admittedly, there was a lot of pre-migration clean-up). Most importantly, when it was finally time to migrate the website, everyone on the project was comfortable and ready to move to Drupal.

System Architecture

Screenshot: Custom MSK requirements

Because the project began in 2009, the new website uses Drupal 6. Though the website contains no protected health information (PHI), MSKCC reasonably required that the web servers be hosted internally. The key performance recommendation I made, especially for MSKCC's initial launch on Drupal, was to have no authenticated traffic on the website. By keeping all external users anonymous, every page on the website can be cached by a reverse proxy, so the website can handle a fairly large load.
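The "anonymous traffic only" rule can be expressed as a small decision function. In production this logic lives in the reverse proxy's configuration (Varnish VCL), not in application code, so the Python below is only an illustration. It assumes, as is typical for Drupal, that a logged-in session is identified by a cookie whose name begins with `SESS`, and it treats every request other than GET/HEAD (such as a form POST) as uncacheable.

```python
from http.cookies import SimpleCookie


def is_cacheable(method: str, cookie_header: str = "") -> bool:
    """Return True if a reverse proxy may answer this request from its cache."""
    # Anything other than a plain read (e.g. a form POST) must reach Drupal.
    if method.upper() not in ("GET", "HEAD"):
        return False
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    # Assumption: a Drupal login session is carried in a cookie named SESS...;
    # anonymous visitors have none, so their pages can be shared from cache.
    return not any(name.startswith("SESS") for name in cookies)
```

Because no external visitor ever gets a session cookie, the first branch (form submissions) is effectively the only path that reaches Drupal.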

No one at MSKCC, including myself, had ever launched a large Drupal or LAMP stack website, so Acquia was brought in to do a general Drupal site audit and make server recommendations. The final solution was an F5 load balancer in front of 2 Varnish reverse proxy/web servers, 1 memcache server, and 2 MySQL database servers in a master/slave configuration.

In the end, the server architecture for this website is pretty much the standard setup for a high-performance Drupal website. The website is very responsive and has come nowhere near its maximum load.

Custom server requirements were added to the 'Site status' report using hook_requirements(). These custom requirements check for properly configured firewall rules, internal webservice access, and additional PHP add-ons, like Oracle's OCI8 Database drivers.

Content Management

The MSKCC website is primarily a content- and information-driven site, which is why it was important to focus on the website's content types and navigation system before implementing site features (aka modules). The website has 33 content types, which may seem like a lot, but MSKCC's broad mission (treatment, research, and education) requires more specific content types. For example, doctors, researchers, and staff members each require their own content type, with custom fields and unique node access rules and controls.

Screenshots: Doctor, Researcher, Member

Below are some notable content types:

HTML fragment
An HTML fragment is a small piece of HTML code that is used as global content within the website's blocks, main menus, and/or super footers. HTML fragments are primarily used by web developers to build editable pieces of specialized but customizable content.

View
The view content type provides content administrators with an easy mechanism for building listings of data (aka Views) on the website. The view content type includes several CCK fields that are passed as arguments to a selected view.

Teaser
The teaser content type is a simple call-out, which consists of title, image, description, and a link that redirects to a complete web page. The teaser content type is used to create a specialized call-out for a page whose default teaser is not appropriate.

Information Architecture

Menus
Out of the box, Drupal supports a primary and a secondary menu. These menus are used in the main navigation bars at the top of the website. The primary and secondary menus handle the first 3 tiers of MSKCC.org, and then a combination of taxonomy, books, and Views manages the lower levels of the website's information architecture.

Taxonomy
Drupal's taxonomy system is used to manage MSKCC's hierarchical medical specialties and even simple event categorization. I built a custom taxonomy helper module to generate hierarchical and alphabetical taxonomy term displays for finding a doctor by specialty or department.
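The two displays the taxonomy helper generates, an A-Z index and an indented hierarchy, can be sketched in a few lines of Python. This is not the module's code; the input shapes (a list of term names, and a map of term id to name and parent id, with 0 for top-level terms) are assumptions loosely modeled on how Drupal records a term's parent.

```python
from collections import defaultdict


def alphabetical_index(terms: list[str]) -> dict[str, list[str]]:
    """Group term names under their first letter, sorted case-insensitively."""
    index: dict[str, list[str]] = defaultdict(list)
    for name in sorted(terms, key=str.casefold):
        index[name[0].upper()].append(name)
    return dict(index)


def render_tree(terms: dict[int, tuple[str, int]], parent: int = 0, depth: int = 0) -> list[str]:
    """Return indented lines for a term hierarchy.

    `terms` maps term id -> (name, parent id); parent id 0 means top level.
    """
    lines = []
    children = sorted((name, tid) for tid, (name, pid) in terms.items() if pid == parent)
    for name, tid in children:
        lines.append("  " * depth + name)
        lines.extend(render_tree(terms, tid, depth + 1))
    return lines
```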

Books
Besides having a lot of unique content, the website has many unique sections maintained by different users. The Book module, included in Drupal core, was the best means of breaking down the website's very rich information architecture. A custom 'Book helper' module was created to allow administrators to customize a book's navigation using additional menu features, including disabling menu items and customizing a menu item's title.
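The Book helper's two navigation tweaks, disabling items and overriding titles, amount to filtering and relabeling a tree before it is rendered. Below is a hypothetical Python sketch; `BookLink`, `build_nav()`, and the choice to hide a disabled item's children along with it are assumptions for illustration, not the module's actual behavior.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BookLink:
    """One entry in a book's navigation tree (hypothetical structure)."""
    nid: int
    title: str
    children: list["BookLink"] = field(default_factory=list)


def build_nav(link: BookLink, disabled: set[int], titles: dict[int, str]) -> Optional[BookLink]:
    """Return a copy of the tree with disabled items removed and titles overridden.

    A disabled item is dropped together with its children; the original
    tree is left untouched.
    """
    if link.nid in disabled:
        return None
    children = [
        child
        for child in (build_nav(c, disabled, titles) for c in link.children)
        if child is not None
    ]
    return BookLink(link.nid, titles.get(link.nid, link.title), children)
```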

Screenshots: Book Helper Order Page, Taxonomy Helper Add Nodes Page

Views

I use Views religiously, for anything that is "a list of things." As long as the Views module remains this helpful for generating an SQL query and/or displaying its results, I am going to use it. A custom MSK views module was created to handle all Views-related customization, including altering queries, exposed filters, and additional template preprocessing.

Screenshots: Patient Stories, Videos, Search

Some Lessons Learned...

Follow Drupal's best practices

One of the key factors behind Drupal's healthy community of code contributors is the project's well-defined and enforced best practices. Before switching to Drupal, the only best practice I followed was trying to write clean code. Following Drupal best practices was the easiest way to improve my programming skills and the overall quality of the website's code.

Below are the five Drupal best practices that I adhered to during development of the MSKCC website:

  1. Code standards
    Drupal's code standards are very well documented. The Coder module is extremely helpful in correcting any bad habits and mistakes.
  2. Version control
    Use version control. 'Nuff said.
  3. API documentation
    Generally, developers hate writing documentation! To encourage myself and all future developers on the website to write decent API documentation, we set up a secure api.mskcc.org website using the API module. Seeing missing documentation, or even just grammatical mistakes, published on a website can be a great motivator to make improvements or correct errors.
  4. Issue tracking
    Getting the project team, including myself, to switch from tracking issues by email to using a purpose-built tracking system took considerable effort, but everyone is now happily using Unfuddle to manage issues.
  5. Unit testing
    SimpleTest is now part of Drupal 7 core, and unit testing is the only best practice that I admittedly fell short of implementing. It is something I hope to implement during the upgrade to D8.

Namespace everything.

Originally, I started out namespacing just my modules with msk_*, and soon realized that it helps to namespace every custom object, including Views, Panels, Rules, and even CSS classes. I prefixed all my views with 'msk_', followed by the type of view and then a unique name for the view. For example, the clinical trials view is named 'msk_directory_trials' and the view used for the news feed content pane is named 'msk_content_pane_news_feed'.
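A naming convention is easiest to keep when it is machine-checkable. The sketch below validates view machine names against the `msk_<type>_<name>` pattern described above and splits them into their parts. The list of view types is an assumption: only `directory` and `content_pane` appear in the examples.

```python
import re
from typing import Optional

# Assumed view types; only "directory" and "content_pane" come from the text.
VIEW_TYPES = ("content_pane", "directory")

VIEW_NAME = re.compile(
    r"^msk_(?P<type>" + "|".join(VIEW_TYPES) + r")_(?P<name>[a-z0-9_]+)$"
)


def parse_view_name(machine_name: str) -> Optional[tuple[str, str]]:
    """Split 'msk_<type>_<name>' into (type, name); None if it breaks the convention."""
    match = VIEW_NAME.match(machine_name)
    if match is None:
        return None
    return match["type"], match["name"]
```

A check like this could run over every exported view before a commit, flagging any view that slipped in without the prefix.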

Export everything.

The project does not use the Features module, but it does export everything into code, including Views, Panels, Rules, and ImageCache presets. The website uses the Strongarm module to export almost all of the website's configuration settings (aka variables) into code. I created a Strongarm dump module that allows every system configuration page to be easily exported. When the site is upgraded to D8, it will use the Features module.

Document everything.

I personally use Google Docs to document and share everything. I also keep an organized list of any useful modules and/or Drupal-related blog posts. There is no 100% perfect resource for Drupal, so it is worth tracking discussions about tricks, hacks, and APIs for modules like Views and Panels.

For developer documentation, I made sure to include the recommended README.txt files and API comments with every module and set up a series of README files for coding standards, installation guides, changes and issues with modules, etc., which are stored in SVN and available via a secure help section within the MSKCC website.

Screenshots: Namespaced Exported Views, Exported/Strongarmed Variables (Strongarm Dump Module), MSK README.txt Files

Conclusions

Drupal works... maybe this is too simple a statement for a complex project made up of more than 200 modules, but in the end Drupal accomplished what it was designed to do: build a website. Drupal allowed MSKCC to focus on their website's mission and not the technology behind it. Ultimately, MSKCC's goals were met: the website looks great and the information is easy to find.

Technical specifications

Why these modules/theme/distribution were chosen: 

Contributed Modules

The website uses 100+ contributed modules, which were selected based on their usage stats balanced against their usefulness on the MSKCC website.

In fact, some key challenges were solved by using some of the less-popular contrib modules and features. Some examples are:

Third Party Wrappers
MSKCC has several applications that are built using ASP.NET. To maintain a consistent look and feel, the main website's template must be shared with these applications. The Third Party Wrappers module solved this challenge by allowing the website's template to be wrapped around a "third party application" by creating header and footer snippets that developers can include in their application.

Node order
By default, Drupal's taxonomy system orders a term's nodes by their posted date. Doctors needed to be weighted by job title when listed within their departments and specialties. The Node Order module provided MSKCC with this functionality and was very easy to set up.
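The Node Order module lets editors set this order within each term. As a rough illustration of the intended result (not of how the module works), here is a Python sketch that derives the order from job titles instead of the posted date. The titles and their ranking in `TITLE_WEIGHTS` are hypothetical.

```python
from typing import NamedTuple

# Hypothetical ranking: lower weight is listed first. The real titles and
# their order at MSKCC are not given in the text.
TITLE_WEIGHTS = {
    "Chief": 0,
    "Attending Physician": 1,
    "Assistant Attending Physician": 2,
}


class Doctor(NamedTuple):
    name: str
    title: str
    posted: int  # Unix timestamp: the default ordering key being replaced


def order_for_term(doctors: list[Doctor]) -> list[Doctor]:
    """Sort by title weight, then by name, instead of newest-posted first."""
    unranked = len(TITLE_WEIGHTS)  # titles missing from the table sort last
    return sorted(doctors, key=lambda d: (TITLE_WEIGHTS.get(d.title, unranked), d.name))
```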

Print PDF
The Print module is a very popular Drupal module that allows users to print pages, and even entire books, as PDF documents. MSKCC uses this feature to generate a PDF of an entire cancer overview, such as Lung Cancer, for patients and caregivers. This allows those who would rather read offline to print all of the information about a cancer at once, instead of reading it on a computer screen or printing each page individually.

Custom Modules

There are about 50 small custom modules that were created for the MSKCC.org website. These custom modules consist mostly of glue code, small enhancements to existing modules, and tweaks to improve the user experience. The decision to use small custom modules was inspired by the Unix philosophy: "Write programs that do one thing and do it well."

A few noteworthy custom modules and code snippets are:

The MSK toolbar module is a good example of 'glue code' used to pull together the Print module, Service links, and a custom Subscribe to Feeds module.

The MSK glossary module allows users to look up cancer-related terms within the NCI glossary. The pop-ups are generated using the Beauty Tips module.

The MSK disaster recovery admin settings form is used to track and disable certain aspects of the website in the event that the MSKCC.org website has to be moved to the MSKCC disaster recovery data center.

Screenshots: MSK Toolbar, NCI Glossary Integration, MSK Disaster Recovery Page

See the complete list of custom modules used on the website.

Templates

Screenshot: MSK Comps Report

The website uses the Zen base theme and follows the style guidelines set forth by this clean, well-built theme. I also applied the concept behind Drupal 7's Stark theme by copying and streamlining many CSS files from core and contrib. One easy optimization was to remove all admin-related classes from the main theme: because the website uses a separate admin theme, they would never be used. The site's admin theme is a backport of Drupal 7's Seven admin theme.

Similar to using hook_requirements() to add MSKCC-specific requirements to the 'Site status' report, I created a custom MSK comps report to track the development progress of the website's design templates, along with links to working examples of each template.

Panels

Panels are used to lay out the content on MSKCC's landing pages. Using Panel nodes allows for a node's CCK fields to be easily displayed in panel panes.

Panels, like everything in Drupal, is extensible, so I created some custom panel layouts, styles, and content types to help build MSKCC's 20+ landing pages.

Layouts
MSKCC's custom panel layouts are simply copies of the default layouts included with the Panels module, with a select menu added to set the default panel pane column widths to line up with a 960 grid.
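For reference, here is the arithmetic behind lining pane widths up with a 960 grid, assuming the common 12-column 960 Grid System convention (60px columns separated by 20px gutters, with 10px of margin on each side). The text does not say which column count MSKCC's layouts use, so `total_columns` is a parameter; both helper names are hypothetical.

```python
def span_width(columns: int, total_columns: int = 12, container: int = 960, gutter: int = 20) -> int:
    """Pixel width of a pane spanning `columns` grid columns.

    Each grid unit is one column plus one gutter (80px for 12 columns);
    the last gutter is not part of the pane. The 24-column variant of the
    960 grid uses 10px gutters, so pass gutter=10 for it.
    """
    if not 1 <= columns <= total_columns:
        raise ValueError("columns out of range")
    unit = container // total_columns
    return columns * unit - gutter


def split_options(total_columns: int = 12) -> list[tuple[int, int]]:
    """Two-pane (left, right) width pairs a layout's select menu could offer."""
    return [
        (span_width(n, total_columns), span_width(total_columns - n, total_columns))
        for n in range(1, total_columns)
    ]
```

For example, a 4/8 split gives panes of 300px and 620px, which together with one 20px gutter fill the 940px content width.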

Styles
A custom MSK stylizer was built to allow editors to select predefined custom classes for each panel pane. The Skinr module provides very similar functionality for an entire theme, but the website only required this functionality for panel panes, so it was easier to implement a custom stylizer (copied from the default stylizer included with CTools).

Content Types

Finally, custom (Panel) content types were required to build the landing page slideshows and video players, but whenever feasible I used Views content panes to build custom content types.

Screenshots: Example of MSK Panels Layout (MSK 960 Layout), Example of MSK Panels Stylizer, Example of MSK Panels Content Type
Community contributions: 

Sandbox Modules

While planning and implementing MSKCC's custom modules, I tried to make sure that any reusable functionality was abstracted out into generic modules that could be shared with the Drupal community. Meanwhile, the great Git migration occurred, which changed and improved how the Drupal project and its contributed modules are developed. One of the coolest changes was the addition of developer sandboxes. Sandboxes are basically experimental Drupal.org projects: they are not full-fledged projects, but they give developers a way to share their code. This is exactly what I intended to do.

During Randy Fay's DrupalCon presentation "Git on Drupal.org: It's Easier Than You Think!", I asked, "Should developers just sandbox all their code while working on a Drupal website?" The answer I got was "yes," so I decided to build and share my sandbox. I restructured my 'sites/all/modules' directory to reflect this by adding a 'sandbox' directory next to my 'contrib', 'custom', and 'dev' directories. I would describe this new 'sandbox' directory as holding code that sits somewhere between completely custom code and code that may one day be contributed back to the Drupal community.

Please note: Some of the 'sandbox' modules below have not been (and may never be) uploaded to Drupal.org, because I feel I won't be able to fully support the code or because similar modules are already available on Drupal.org.

Access Control

  • Book author access: Allows a book's main page author to edit and manage all lower level book pages.
  • User access control: Allows a user to grant other users access to update their content.

Development

  • API browser: Makes it easier to navigate API documentation and source code.
  • Content labels: Allows administrators to update the titles and descriptions for a content type and its fields on one page.
  • Content analyze: Adds an analyze (field lengths) tab to the content types - fields admin section.
  • Strongarm dump: Allows module variables to be exported into arrays and objects that can be used by the Strongarm module.
  • System summary: Builds a report to list site statistics and installed modules and themes.

Input Filters

  • Image filter: Displays an image's title attribute as a caption below or next to the image.
  • jQuery UI filter: Converts static HTML to a jQuery UI Accordion or Tabs widget.
  • Menu filter: Inserts a menu's links as a list or dropdown within the body of selected text.
  • Short-hand path filter: A filter that allows for short-hand redirect paths to be entered and replaced within any text.
  • TOC filter: Converts header tags into a linked table of contents.
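As an illustration of what a filter like the TOC filter does, here is a minimal Python sketch: it gives each `<h2>`/`<h3>` an `id` and prepends a linked list of those headers. It is regex-based and assumes simple header tags without attributes; the real module is PHP, and the `toc`/`toc-hN` class names are made up for this example. Duplicate header titles would produce duplicate ids, which a production filter would need to handle.

```python
import re

# Matches <h2>...</h2> or <h3>...</h3>; \1 forces the closing level to match.
HEADER = re.compile(r"<h([23])>(.*?)</h\1>", re.IGNORECASE | re.DOTALL)


def slugify(text: str) -> str:
    """Turn header text into an id usable as a link anchor."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def toc_filter(html: str) -> str:
    """Add ids to <h2>/<h3> tags and prepend a linked table of contents."""
    items = []

    def add_anchor(match: re.Match) -> str:
        level, title = match.group(1), match.group(2)
        anchor = slugify(title)
        items.append(f'<li class="toc-h{level}"><a href="#{anchor}">{title}</a></li>')
        return f'<h{level} id="{anchor}">{title}</h{level}>'

    body = HEADER.sub(add_anchor, html)
    if not items:
        return html  # no headers: leave the text untouched
    return '<ul class="toc">' + "".join(items) + "</ul>" + body
```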

Node

  • Book helper: Improves Drupal's core Book module's functionality.
  • Node parent title: Automatically prepends or appends a parent title to a node's title when it is saved.
  • Node reference back reference: Automatically creates node reference back references for selected content types and fields.
  • Weight reset: Adds a reset button to the weight-based sorting view from the Weight module.
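The idea behind an automatic back reference can be sketched as follows: after a node is saved, every node it points to gains a pointer back, and nodes it no longer points to lose theirs. This Python sketch is not the module's code; the dictionary-of-fields data model and the field names in the usage example are hypothetical stand-ins for CCK node reference fields.

```python
def sync_back_references(nodes: dict[int, dict[str, list[int]]], nid: int,
                         field: str, back_field: str) -> None:
    """After saving node `nid`, mirror its `field` references onto `back_field`.

    `nodes` maps node id -> {field name: [referenced node ids]} and is
    updated in place.
    """
    referenced = set(nodes[nid].get(field, []))
    for other_id, fields in nodes.items():
        if other_id == nid:
            continue
        backs = fields.get(back_field, [])
        if other_id in referenced and nid not in backs:
            # New reference: add the matching back reference.
            fields.setdefault(back_field, []).append(nid)
        elif other_id not in referenced and nid in backs:
            # Reference was removed: drop the stale back reference.
            backs.remove(nid)
```

For instance, saving a doctor node whose `field_trials` lists two trials would add that doctor to each trial's `field_doctors` list.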

Taxonomy

  • Taxonomy helper: Helps improve the presentation of vocabularies and term hierarchies using custom templates and Views.
  • Taxonomy permissions: Adds 'view vocabulary terms permissions' for taxonomy-related pages.

User Experience

  • Add to calendar: Provides 'add to calendar' links for Outlook, Google Calendar, Yahoo! Calendar, and iCal.
  • Inline links: Adds custom and automated inline links to content.
  • Subscribe to feed: Allows users to subscribe to an RSS feed using an RSS/Podcast reader.

Utility

  • Create content: Adds contextual information to 'Create content' menu links.
  • CTools jump menu style: Converts a CTools jump menu into a stylized HTML menu.
  • Global optimizer: Groups and optimizes CSS and JS into global and page-specific files.
  • Flush page cache: Eases the pain when you need to flush Drupal's cache.
  • Menu redirect: Adds the ability to set a menu item to be a redirect which prevents multiple menu items from being in the active menu trail at the same time.
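The grouping idea behind the Global optimizer can be sketched as a simple policy: files used on (nearly) every page go into one global bundle that browsers cache once, and the rest stay page-specific. This is a hypothetical Python illustration of that policy, not the module's PHP code, and the `threshold` parameter is an assumption.

```python
from collections import Counter


def split_assets(pages: dict[str, list[str]],
                 threshold: float = 1.0) -> tuple[list[str], dict[str, list[str]]]:
    """Split CSS/JS files into one global bundle plus per-page leftovers.

    A file joins the global bundle when it is used on at least `threshold`
    (a fraction) of pages. Files keep their first-seen order so the CSS
    cascade and script dependencies are not shuffled.
    """
    if not pages:
        return [], {}
    first_seen = list(dict.fromkeys(f for files in pages.values() for f in files))
    # Count each file at most once per page.
    usage = Counter(f for files in pages.values() for f in set(files))
    global_files = [f for f in first_seen if usage[f] / len(pages) >= threshold]
    shared = set(global_files)
    page_files = {page: [f for f in files if f not in shared] for page, files in pages.items()}
    return global_files, page_files
```

Lowering `threshold` trades a larger global file for fewer page-specific requests.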

Views

  • Views global settings: Allows Views admins to define global settings (e.g., caching) that are shared by all Views.
  • Views URL alias node: Allows node-related Views to be filtered by path aliases.

Webform

  • Webform disable results: Allows editors to disable the saving of Webform submissions.
  • Webform results access control: Allows selected users and roles access to view and edit Webform results.

Custom Modules

  • GSK block: Defines custom blocks for the GSK website.
  • MSK main: Provides shared utility functions for MSK modules.
  • MSK block: Defines custom blocks for MSK websites with some block helper functions.
  • MSK deployment: Deploys the MSK SVN codebase to multiple web servers.
  • MSK input filters: Input filters for MSK-specific content and customizations.
  • MSK form tweaks: Alters system, contrib, and node forms.
  • MSK glossary: Allows users to look up cancer-related terms within the NCI glossary.
  • MSK group: Manages custom MSK groups (aka labs, core facilities, and research programs).
  • MSK group access: Allows the author of a group to edit and manage all group pages.
  • MSK herbs: Custom JSON webservice for MSK herbs.
  • MSK media: Enhanced multimedia content generated from the media_brightcove.module.
  • MSK menu: Adds additional functionality to Drupal's menu system.
  • MSK menu block: Stores all MSK menu blocks in code with additional custom logic.
  • MSK menu breadcrumb: Handles custom breadcrumbs for MSK books, groups, and orphaned nodes.
  • MSK migrate: Stores and displays information about Inettool data migrated to Drupal including originating id, meta data, and redirects.
  • MSK migrate inetdata: Migrates MSKCC inetdata table data to Drupal's webforms.
  • MSK migrate Inettool: Migrates an inettool project's site architecture, content, and resources from the Inettool database to Drupal.
  • MSK migrate PRG: Migrates doctor bios and related directories from the Physician Referral Guide (PRG) database to Drupal.
  • MSK migrate protocols: Migrates MSKCC clinical trials (protocols) to Drupal.
  • MSK node: Enhances and organizes Drupal's core and contributed node-related modules.
  • MSK panels: Stores all MSK panels in code.
  • MSK path: Manages MSK's SEO friendly paths.
  • MSK RSS: Handles RSS and Podcast formatting for nodes and Views.
  • MSK search: Handles customization of MSK (Google Mini) search results.
  • MSK secure pages: Sets which pages are always going to be used in secure mode (SSL).
  • MSK stats: Manages stat tracking tags/codes for MSKCC's DoubleClick and Did-It accounts.
  • MSK theme: Contains re-usable theme and meta data related functions.
  • MSK toolbar: Toolbar block for MSK, includes glossary, print, download, email, and share.
  • MSK trials (aka protocols): Synchronizes trial content type with the MSK PIMS protocol database.
  • MSK user: Adds additional information and functionality to Drupal's user profiles.
  • MSK views: Stores all MSK views in code and enhances Views with exposed filters.
  • MSK webform: Tweaks and adds additional functionality to the Webform module.
  • MSK webform payment: Payment handler for MSK Webform module.
  • MSK workflow: Custom workflow module that integrates revisioning and workflow.
  • MSK wysiwyg: Enhances CKEditor WYSIWYG.
Project team: 

The launch of the new MSKCC.org was a joint effort of four different groups/organizations, which were responsible for design, content, web development, and infrastructure. Magnani, Caruso and Dutton (MCD) designed the new site and reworked the information architecture. The Big Blue House (my company) was responsible for all Drupal development. MSKCC's Department of Information Systems configured, and now administers, the enterprise LAMP server stack. Finally, MSKCC's Department of Public Affairs manages the website day-to-day and is responsible for the high-quality content and beautiful photography, as well as ongoing strategy and optimization.

Sectors: 
Healthcare

Comments

arihant007

Good job guys! I'm working on a similar project & your case study gives me more confidence to move about..! Thanks a ton & wishing you all the very best. :)

thanyawzinmin

I want to have skills like yours, dude.
I am still learning.
But I'm sure I will make it into the Drupal world.

Great site.

bsuttis

Very well written case study, great job too. A beautiful example of how awesome Drupal is.

chintan4u

Really nice work, and very well explained.
Keep it up, dude.

Have you faced any performance-related issues here?

-
Chintan Umarani
Drupal Developer
www.umarani.com

shamio

Why would there be performance problems? As mentioned, this website uses a reverse proxy to cache every page of the website.

jrockowitz

Every page is cached by Varnish; only when someone submits a form does the posted data hit Drupal. This decision was made to avoid any performance issues, and generally all authenticated traffic for MSKCC goes to their patient portal, http://mymskcc.org.

When the website is moved to Drupal 8, we are going to allow authentication and rework our caching strategy to use 'Edge Side Includes' (http://en.wikipedia.org/wiki/Edge_Side_Includes).

Finally, for administrators and editors who are authenticated, we have APC and memcache installed. Also, all editing is done on a dedicated internal server that is optimized for editing, since editing pages generally uses more memory per page request.

ckvergleich

Wow, very good job. I like the design and the usability.

Drupal is awesome!

davidhunter

Thanks a lot for such a detailed description of your process and module implementation.
This is very informative and useful.
How long did this project take, and how many people were involved?

jrockowitz

The entire project was completed over a 2 year period with the Drupal work taking about 9 months. The migration, integration of new features and content, and theming took about 3 months each. The actual Drupal development and theming was done by just me, with one system administrator helping with the LAMP stack, and a consultant helping with Drupal training. Acquia performed 2 site audits during the project's development. Finally, there were 2 web production people helping with QA and content cleanup.

jacobarnold472

Hi, really good job. Drupal is awesome. I have a site, pelleve, which is built in plain HTML. My page rank is good, but if I convert my website to Drupal, will the page rank remain the same, or will I face any issues regarding page rank?

zgos

This is the first time I have seen so many Views configuration files. You must have built a lot of views. How is Views performance on your site?

jrockowitz

Yep, the site has a lot of views due to the number of content types and related sub-displays. Also, there is a custom admin-only view for each content type. The project started out using a very early version of Views 6.x-2.x, which did not allow display ids to be changed or sorted in the UI, and the process of overriding (and I think cloning) a display was clunky. So I went with the rule that if I needed to override more than 2 aspects of the default view (e.g., filters, sorting, fields, display settings), I would create a new view. That said, the ability to organize displays in Views 3.x is so much better that when I upgrade the site to D8, I am going to redo all the views and consolidate them, ideally into one view for each content type.

Honestly, I have not noticed any Views-specific performance issues, but I will admit that when the Views module clears the menu cache (in core), it takes a very long time for the site's menus to be rebuilt.