Import HTML

dman - January 23, 2006 - 12:33

Import the structure of an existing static HTML site into the Drupal CMS as structured nodes!

Allows an admin to define a source directory containing an existing, traditional static HTML website, and to import as much of its content and structure as possible into a Drupal site.
Source files are stripped of their existing chrome and navigation elements before being inserted as nodes.
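
To illustrate what "stripping chrome" means in practice, here is a minimal Python sketch. It is not how the module does it (the module relies on XSLT and HTMLTidy on the PHP side), and it assumes the old pages wrap their real body in a `<div id="content">`, which is a purely hypothetical convention. The names `ContentExtractor` and `strip_chrome` are made up for this example.

```python
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Collect the markup found inside <div id="..."> and ignore the rest."""

    def __init__(self, content_id="content"):
        # Keep entities such as &amp; untouched so the output stays valid markup.
        super().__init__(convert_charrefs=False)
        self.content_id = content_id  # assumed id of the content wrapper
        self.depth = 0                # <div> nesting depth, 0 = outside the wrapper
        self.parts = []               # markup fragments that belong to the body

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "div":
                self.depth += 1
            self.parts.append(self.get_starttag_text())
        elif tag == "div" and dict(attrs).get("id") == self.content_id:
            self.depth = 1  # entering the wrapper; the wrapper tag itself is dropped

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br /> never change the nesting depth.
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:
                return  # closing tag of the wrapper: stop collecting
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth:
            self.parts.append(f"&#{name};")


def strip_chrome(html, content_id="content"):
    """Return only the markup inside the content wrapper of an old page."""
    parser = ContentExtractor(content_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()
```

Headers, navigation bars and footers outside the wrapper are simply never collected, so only the body markup is left to become the node content.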

See import_html_help.htm for a largish overview of import_html features.

  • Maintain old URLs
  • Re-create menu structure
  • Validate & improve markup automatically
  • Import Metadata - old dates, keywords, descriptions
  • Additional custom fields - Import old semantics to multiple CCK fields!
  • Operate over thousands of documents.

Read a case study or a walkthrough

D6 port almost here, but needs encouragement
Update May 2009: ALMOST ready to call it a numbered release.

Drupal 6 version in the pipeline - if wanted enough

If this is useful enough to you, please consider hurrying things along with an encouragement via ChipIn. It's easily $1000 worth of work to get it to a good state. But if enough people together think it's worth $500, then progress will actually happen!
If this can save 10 people $50 worth of time (it really, really will) then we can all break even. In reality, this thing will save anyone between 3 days and 2 weeks of copy-paste when used properly. What's that time worth to you?

(If you object to a hard-working developer suggesting his time is worth more than $0.00 per hour, please just ignore this message.)
Follow progress in the D6 port thread here.

This module has no public face at all - it's purely admin. In fact, once it's done you should turn it off again.

Important: Requirements

Because of the number of settings, this is not just a point-and-go module. You also need:

  • XML/XSLT support on the server. Check the output of phpinfo(); if it mentions either XSL or XSLT anywhere, you're fine.
    PHP4 support is being dropped in the Drupal 6 version.
  • HTMLTidy - either the PHP module or the command-line version.
    Update: there is now an automatic installer for HTMLTidy bundled in for Linux hosts. There are at least three flavours of tidy extensions for PHP, not including the command-line alternative. The version distributed with the PHP5 binaries is the one that has been targeted; the PECL one can be made to work with some tweaks.
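
As a quick way to see whether the command-line flavour of HTMLTidy is available on a host, something like the following Python sketch could be used. It assumes the executable is called `tidy` and is on the PATH. The helper name `find_tidy` is made up for this example, and it says nothing about the PHP extensions, which still need to be checked via phpinfo().

```python
import shutil


def find_tidy(binary="tidy"):
    """Return the full path of the tidy executable, or None if it is not on the PATH."""
    return shutil.which(binary)


if __name__ == "__main__":
    path = find_tidy()
    if path:
        print(f"Command-line tidy found at {path}")
    else:
        print("No command-line tidy found; the PHP tidy module would be needed instead")
```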

See the help document for details. Reading the walkthrough will illustrate what's possible with this.

Recent changes include better control of subdirectories for giant sites. You can now manage the import of thousands of documents without timing out: just do a subsection at a time.
Even more recent changes run large imports as a batch process!
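
The "subsection at a time" idea can be sketched roughly as follows. This is only an illustration in Python, not the module's batch code, and the helper names `group_by_section` and `batches` are made up for the example: files are grouped by top-level directory, and each group is then cut into small chunks so that no single request has to process the whole site.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_section(paths):
    """Group relative file paths by their top-level directory ('.' for the site root)."""
    sections = defaultdict(list)
    for path in paths:
        parts = PurePosixPath(path).parts
        section = parts[0] if len(parts) > 1 else "."
        sections[section].append(path)
    return dict(sections)


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    files = ["index.html", "about/team.html", "about/history.html", "news/2005/jan.html"]
    for section, members in group_by_section(files).items():
        for chunk in batches(members, 50):
            # A real import would process one chunk per request here.
            print(section, chunk)
```

Keeping each chunk small is what avoids the timeout; the chunk size of 50 is an arbitrary figure chosen for the example.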

Releases

Official releases

Version        Date         Size       Status
5.x-1.2        2007-May-01  93.31 KB   Recommended for 5.x

This is currently the recommended release for 5.x.

Development snapshots

Version        Date         Size       Status
6.x-1.x-dev    2009-Jul-01  170.93 KB  Development snapshot
5.x-1.x-dev    2008-Feb-01  98.83 KB   Development snapshot
4.7.x-1.x-dev  2006-Nov-13  68.35 KB   Development snapshot

Development snapshots are automatically regenerated and their contents can frequently change, so they are not recommended for production use.


 
 

Drupal is a registered trademark of Dries Buytaert.