Import HTML

dman - January 23, 2006 - 12:33

Import an old, existing static HTML site, structure included, into the Drupal CMS as structured nodes!

Allows an admin to define the source directory of an existing, traditional static HTML website and import as much of its content and structure as possible into a Drupal site.
Existing chrome and navigation elements are stripped from the source files before they are inserted as nodes.
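The stripping step is configured in the module itself; the following is not its implementation. As a rough illustration of the idea only, here is a minimal Python sketch that keeps the text inside one designated content container and drops navigation-type elements nested in it. The container id (`content`), the list of "chrome" tags, and the names `ContentExtractor` and `extract_content` are all hypothetical choices for this example. It assumes well-formed input (the kind of markup HTMLTidy produces), and for brevity it collects text rather than markup.

```python
from html.parser import HTMLParser

# Hypothetical choices for this sketch, not the module's settings.
CHROME_TAGS = {"nav", "script", "style", "header", "footer"}
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class ContentExtractor(HTMLParser):
    """Collect text inside the element whose id is content_id,
    skipping any CHROME_TAGS nested inside it."""

    def __init__(self, content_id="content"):
        super().__init__()
        self.content_id = content_id
        self.depth = 0       # 0 = outside the container, 1 = the container itself
        self.skip_depth = 0  # > 0 while inside a chrome element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements have no end tag, so they never change depth
        if self.depth == 0:
            if dict(attrs).get("id") == self.content_id:
                self.depth = 1
            return
        self.depth += 1
        if self.skip_depth or tag in CHROME_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        if self.skip_depth:
            self.skip_depth -= 1
        self.depth -= 1

    def handle_data(self, data):
        if self.depth and not self.skip_depth and data.strip():
            self.parts.append(data.strip())


def extract_content(html, content_id="content"):
    """Return the container's text with chrome removed, joined by spaces."""
    parser = ContentExtractor(content_id)
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

For example, a page with a site-wide `<nav>` and `<footer>` around `<div id="content">` yields only the text of that div, minus any `<nav>` nested inside it.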

See import_html_help.htm for a fairly detailed overview of import_html's features.

  • Maintain old URLs
  • Re-create menu structure
  • Validate & improve markup automatically
  • Import metadata: old dates, keywords, descriptions
  • Additional custom fields: import old semantics into multiple CCK fields!
  • Operate over thousands of documents.
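One plausible way to derive a menu structure from a static site is to follow its directory layout. The minimal Python sketch below assumes that each directory becomes a menu item, that a directory's `index.html`/`index.htm` stands for the directory itself, and that every other page becomes a child item of its directory. This convention, and the names `INDEX_NAMES` and `build_menu_tree`, are hypothetical and are not taken from the module.

```python
from pathlib import PurePosixPath

# Assumed directory-index filenames (hypothetical convention).
INDEX_NAMES = {"index.html", "index.htm"}


def build_menu_tree(rel_paths):
    """Nest pages by directory: each directory becomes a menu item, and each
    non-index page becomes a child item (keyed by filename stem) of its directory."""
    tree = {}
    for rel in rel_paths:
        p = PurePosixPath(rel)
        node = tree
        for part in p.parent.parts:  # walk or create one level per directory
            node = node.setdefault(part, {})
        if p.name not in INDEX_NAMES:
            node.setdefault(p.stem, {})  # index pages are represented by their directory
    return tree
```

For the "maintain old URLs" half, the simplest approach is to reuse each file's relative path (for example `about/team.html`) unchanged as the node's URL alias, so existing links and bookmarks keep resolving.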

Read a case study or a walkthrough

This module has no public face at all; it's purely an admin tool. In fact, once the import is done you should turn it off again.

Important: Requirements

Because of the number of settings, this is not a simple point-and-click module. You also need:

  • XML/XSLT support on the server. Check your phpinfo() output; if it mentions XSL or XSLT anywhere, you're fine.
    PHP4 support is being dropped in the Drupal 6 version.
  • HTMLTidy, either as the PHP extension or as the command-line version.
    Update: an automatic installer for HTMLTidy is now bundled for Linux hosts. There are at least three flavours of Tidy extension for PHP, not counting the command-line alternative. The binary-distributed PHP5 version is the one targeted; the PECL one can be made to work with some tweaks.
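If you have shell access, a quick pre-flight check is to look for the `tidy` binary on the PATH and to ask the PHP command-line interpreter which modules it loads (`php -m` prints one module name per line; the XSL extension appears as `xsl` in PHP5 and `xslt` in PHP4). The minimal Python sketch below does both. It assumes the CLI `php` shares the web server's configuration, which is not always the case, so `phpinfo()` served by the web server remains the authoritative check. The function name `check_requirements` and the report keys are hypothetical.

```python
import shutil
import subprocess


def check_requirements():
    """Report whether HTML Tidy and PHP's XSL/Tidy extensions look available.

    Each value is True/False, or None when the php CLI is not on PATH.
    """
    report = {"tidy_cli": shutil.which("tidy") is not None}
    php = shutil.which("php")
    if php is None:
        report["php_xsl"] = report["php_tidy"] = None  # cannot tell
        return report
    # `php -m` lists compiled-in and loaded modules, one per line.
    result = subprocess.run([php, "-m"], capture_output=True, text=True)
    modules = {line.strip().lower() for line in result.stdout.splitlines()}
    report["php_xsl"] = bool(modules & {"xsl", "xslt"})
    report["php_tidy"] = "tidy" in modules
    return report


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {ok}")
```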

See the help document for details. The walkthrough illustrates what's possible.

Recent changes give better control over subdirectories for giant sites: you can now import thousands of documents without timing out by doing one subsection at a time.
... and even MORE recent changes run large imports as a batch process!
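Both techniques come down to splitting the file list into small pieces so that no single request runs into PHP's execution time limit. The minimal Python sketch below illustrates only the idea, not the module's code: it picks one subsection by directory, then hands its files over in fixed-size batches. The names `select_subsection`, `in_batches`, `run_import`, the `import_one` callback, and the default `batch_size` are hypothetical. In Drupal 6 itself, this kind of work is normally handed to the core Batch API, which spreads the batches over successive HTTP requests.

```python
from itertools import islice
from pathlib import PurePosixPath


def select_subsection(rel_paths, subdir):
    """Keep only the files that live somewhere under subdir (e.g. "products")."""
    prefix = PurePosixPath(subdir)
    return [p for p in rel_paths if prefix in PurePosixPath(p).parents]


def in_batches(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def run_import(rel_paths, import_one, subdir, batch_size=25):
    """Import one subsection batch by batch; return the number of files handled."""
    done = 0
    for batch in in_batches(select_subsection(rel_paths, subdir), batch_size):
        # In a web context each batch would run in its own request,
        # so no single request has to process the whole site.
        for path in batch:
            import_one(path)
        done += len(batch)
    return done
```

Keeping the batch size small trades a little overhead per request for a much lower risk of any one request timing out.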

Releases

Official releases

  • 6.x-1.0 (2009-Oct-06, 169.99 KB): currently the recommended release for 6.x.
  • 5.x-1.2 (2007-May-01, 93.31 KB): currently the recommended release for 5.x.

Development snapshots

Development snapshots are automatically regenerated and their contents can frequently change, so they are not recommended for production use.

  • 6.x-1.x-dev (2009-Oct-07, 169.99 KB)
  • 5.x-1.x-dev (2008-Feb-01, 98.83 KB)
  • 4.7.x-1.x-dev (2006-Nov-13, 68.35 KB)