How to produce a static mirror of a Drupal website?
Note: Only use these techniques on your own sites.
Prepare the Drupal website
Create a custom block and/or post a node to the front page that notes that the site has been archived from Drupal to static HTML. Be sure to include the date of the archiving. Consider including a link to the future versions of the site (e.g. if you are archiving a 2008 event, link to the URL of the next event).
Disable interactive elements that will not work in the static HTML version.
The Disable All Forms module can disable all forms at once. Otherwise, disable or remove the following:
- login block
- who's online block
- anonymous commenting
- links to the search module and/or any search boxes in the header
- comment controls which allow the user to select comment display format
- Disable Ajax requests such as Views pagers.
- Remove Views exposed filters.
- Set comments to read-only on all nodes. This eliminates the "login or register to post comments" link that would otherwise accompany each of your posts. You can do this through phpMyAdmin by running the following SQL command against the node table (a value of 1 means read-only):
update node set comment = '1';
- It can also be a good idea to disable any third-party dynamically generated blocks. Once the site is archived, these blocks are difficult to remove if the third-party services are no longer available.
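The comment update above can also be generated from the command line. The following is a minimal sketch, not the only way to do it. It assumes a Drupal 6 or 7 schema, where 2 in node.comment means read/write and 1 means read-only, and it differs from the command above in that it only touches nodes whose comments are currently open, so nodes with comments disabled stay as they are. The function name and the optional table prefix argument are hypothetical.

```shell
#!/bin/sh
# Print the SQL that switches open comments to read-only.
# $1 is an optional table prefix (hypothetical example: "drupal_").
comment_readonly_sql() {
  prefix=${1:-}
  # Assumed values (Drupal 6/7): 2 = read/write, 1 = read-only.
  printf 'UPDATE %snode SET comment = 1 WHERE comment = 2;\n' "$prefix"
}
```

Back up the database first, then pipe the statement into the MySQL client, for example: comment_readonly_sql | mysql -u drupal_user -p drupal_db (the user and database names here are placeholders).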
Create a static clone
Wget (UNIX, Linux, OSX, ...)
Wget is available on almost any *nix machine and can produce the mirror from the command line. However, wget often fails to convert relative stylesheet URLs correctly on Drupal pages. If that happens, modify your theme template to output hardcoded absolute links to the stylesheets, then try the following command:
wget -q --mirror -p --html-extension -e robots=off --base=./ -k -P ./ http://example.com
wget respects robots.txt by default, so it might skip some of the files in /sites/ or elsewhere. The -e robots=off option in the command above disables this behavior. In wget 1.12 and later, --html-extension is a deprecated name for --adjust-extension.
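For readability, the same kind of mirror can be written with wget's long option names. This is a sketch under a few assumptions: wget 1.12 or later (for --adjust-extension), a hypothetical function name mirror_site, and a hypothetical DRY_RUN variable that prints the command instead of running it. It leaves out -q and --base from the command above so that progress is visible.

```shell
#!/bin/sh
# mirror_site URL OUT_DIR
# Runs wget with long-form options; set DRY_RUN=1 to print the command instead.
mirror_site() {
  site_url=$1
  out_dir=$2
  # --mirror            recursive download with timestamping
  # --page-requisites   also fetch the CSS, images, and scripts each page needs
  # --adjust-extension  save HTML pages with an .html suffix
  # --convert-links     rewrite links so the copy works offline
  # --execute robots=off  ignore robots.txt (only do this on your own site)
  set -- wget \
    --mirror \
    --page-requisites \
    --adjust-extension \
    --convert-links \
    --execute robots=off \
    --directory-prefix="$out_dir" \
    "$site_url"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Calling mirror_site http://example.com ./mirror with DRY_RUN=1 set prints the full command, which is a convenient way to review it before starting a long download.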
HTTrack (UNIX, Windows, and Mac via Homebrew)
The Windows GUI version of HTTrack will produce the mirror with almost no configuration on your part. From the command line, one potential command to use is:
httrack http://2011.example.com -K -w -O . -%v --robots=0 -c1 -%e0
Note that the -K option creates absolute links, which is only useful in some cases, such as hosting a public mirror on the same domain. Otherwise, omit -K to produce relative links.
The -c1 option makes only one request at a time, so the mirror becomes rather slow. HTTrack's default is -c8, so you might consider a value closer to that when archiving your own site.
If you're working from a local installation of Drupal and want to grab ALL of your files in a way that you can just copy them up to a server, try the following command:
httrack http://localhost/ -W -O "~/static_cache" -%v --robots=0
SiteSucker is a Mac GUI option for downloading a site.
HTML Export module
The HTML Export project is a Drupal module that dumps a working HTML version of your site.
Verify that the offline version of your site works
Open the offline version of your site in your browser and confirm that it works. Check that you turned off every interactive element in Drupal that would otherwise confuse visitors to the static site.
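Part of this check can be automated by scanning the mirrored files for leftovers. The sketch below assumes the mirror was saved with .html file names (as the wget and HTTrack commands above do) and that the site used Drupal's standard user/login, user/register, and comment/reply paths. The function names are hypothetical, and a clean result does not replace clicking through the site in a browser.

```shell
#!/bin/sh
# List mirrored HTML files that still contain a <form> element.
find_leftover_forms() {
  find "$1" -type f -name '*.html' -exec grep -li '<form' {} + | sort
}

# List mirrored HTML files that still link to Drupal login, registration,
# or comment-reply pages.
find_leftover_account_links() {
  find "$1" -type f -name '*.html' \
    -exec grep -lE 'href="[^"]*(user/login|user/register|comment/reply)' {} + | sort
}
```

Run both against the mirror directory, for example find_leftover_forms ./mirror. Any file they print still contains an element that should have been disabled in Drupal before mirroring.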
Why create a static site archive?
- Over time, your website may have become essentially static. Because such sites still require security administration, an administrator has to keep patching the site or consider removing it altogether.
- You want to ensure that the site is preserved on Drupal.org infrastructure (without direct cost to you).
- Alternatively, you may want to produce an offline copy for archiving or for convenient reference when you don't have access to the Internet. Before simply removing the site, consider another alternative: maintain the Drupal site inside a firewall, periodically cache its output to static HTML files, and copy those files to public servers.