I like this module very much and I am wondering if you are developing it for Drupal 7.

Comments

jtherkel’s picture

+1 The Import HTML module helped me quickly create a functional demo of how we might move to Drupal 6. The demo convinced everyone that Drupal is viable, but we want to plan for a D7 launch, and I do not see many similar options for D7.

Migrate module has a development version for D7, and it comes highly recommended, but it requires getting your content into a DB. For various reasons, it's easier to rip some sites from HTML than their legacy CMS DB. It seems I could cobble something together, maybe using QueryPath and hand-writing PHP to insert into a DB.

Does someone have plans to upgrade Import HTML to D7? I can contribute documentation and testing, but I'm not a hardcore PHP developer.

dman’s picture

Issue tags: +drupal7

It certainly will be going forward to Drupal 7, and some internal restructuring has already happened in the current dev version to prepare for that.
However, actual development towards a release will happen when I have a project that requires it and is committed to D7. For now, the big jobs I have lined up are not looking for cutting-edge development on an unreleased platform, so that's not being paid for. On the other hand, I've got work that does involve extending the D6 branch. So I'm not keen to branch until later, as I can't add features to a D7 branch that I'm not developing with or testing.
If a d7 sponsored job comes up, then work on that can be looked at, but it won't be until the new year.

In other news, I'm looking for crossover with some of the other modules in this area, and moving towards some commonality with Feeds, Exportables, Deploy and more. This is still an active project; D7 will just happen when it gets paid for.

jtherkel’s picture

I'm attempting to hack together a workable D7 version of Import HTML in my dev environment. I'm using the output of the Coder Upgrade module, and I cannot get past this bug. I have googled this to death, but I'm a novice PHP developer. Here's the error and the relevant code.

Fatal error: Cannot use string offset as an array in /path_to_my_d7_root/sites/all/modules/import_html/import_html_ui.inc on line 1017

// Initialize multistep
// [...]  
if (empty($form_state['storage']['step'])) {
  // we are coming in without a step, so default to step 1
  $form_state['storage']['step'] = 1;
}

Any suggestions or helpful links?

TIA,
John

dman’s picture

There will be a lot of changes needed to get it D7 compatible. The multistep form handling (among many other things) will need to be revamped a lot.
Very hard. :-(

byrond’s picture

I had been trying the same thing as #3 and ran into the same problem. It looked to me like that function was being called twice when rendering the form. The first time through, the proper parameters get passed. The second time through, they are different, which throws the error. I assumed I would have to learn how forms work in D7 to go forward. We will be looking into other possibilities for importing content, but so far we can't find any general consensus on this process, or even on the task of moving nodes from one system to another. If I get back to working on converting this module, I'll be sure to share.
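
For anyone hitting the same fatal error: PHP raises "Cannot use string offset as an array" when code writes to a nested key on a value that is actually a string, so at line 1017 either $form_state or $form_state['storage'] is presumably not an array on that second call. One known D6-to-D7 difference is that form builder functions receive ($form, &$form_state) in D7 rather than $form_state first; whether that is the cause here is only a guess. Below is a minimal defensive sketch. The helper name example_ensure_step() is hypothetical and not part of import_html, and it only stops the crash rather than fixing whatever passes the wrong parameters.

```php
<?php
/**
 * Hypothetical helper (not part of import_html): make sure
 * $form_state['storage']['step'] can be written to safely.
 */
function example_ensure_step($form_state) {
  // If a caller handed us a string (or anything that is not an array),
  // start from a clean array instead of indexing into it.
  if (!is_array($form_state)) {
    $form_state = array();
  }
  // Same check one level down: 'storage' must be an array too.
  if (!isset($form_state['storage']) || !is_array($form_state['storage'])) {
    $form_state['storage'] = array();
  }
  // We are coming in without a step, so default to step 1.
  if (empty($form_state['storage']['step'])) {
    $form_state['storage']['step'] = 1;
  }
  return $form_state;
}
```

Logging the type of the incoming value (for example with gettype()) before the guard would show which call is passing the unexpected parameters.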

glottus’s picture

Subscribing. This module has helped me create several sites in the past and would be invaluable to have in D7. The ability to add items to the menu, preserve URLs and rewrite links in the imported code are all features I cannot find elsewhere, even if I can manage to get the HTML into a database.

I have used the D6 version of this to import HTML, then upgraded that site to D7 to get the appropriate menu links and such, but I'm looking to do that in a second batch for the same site, and am not sure that the Migrate module will help me handle those menu links and legacy paths... at least not without a major learning curve.

stella’s picture

subscribe

mgifford’s picture

What are the plans for D7? Is anyone able to contribute to this module's development? The Coder module can help move things ahead by producing a patch that addresses many of the upgrade issues.

Tom Ash’s picture

+1 & Subscribe

mgifford’s picture

Title: Drupal 7 » Bring Import HTML to Drupal 7

Changing title, but also recommending folks look at http://drupal.org/project/migrate

johnbarclay’s picture

Feeds is also a good tool for importing content once or synchronizing it continuously. It is basically a pluggable chain of fetcher (file, directory, LDAP, HTTP), parser (XML, QueryPath, CSV, etc.) and processor (node, vocabulary, user, etc.). It includes mapping functionality to map parsed fields to target fields.
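
To make the fetcher / parser / processor idea concrete, here is a rough sketch of such a chain in plain PHP. None of this is the Feeds module's actual API: every function name below is hypothetical, the "fetcher" just reads from an in-memory array, and the regex-based parser is far too crude for real-world HTML. It only illustrates how the stages and the field mapping fit together.

```php
<?php
// Illustrative stages only; these are NOT the Feeds module's classes.

// Fetcher: return the raw documents (here, from an in-memory array).
function example_fetch(array $sources) {
  return array_values($sources);
}

// Parser: turn one raw HTML string into named source fields.
function example_parse($html) {
  $title = preg_match('#<title>(.*?)</title>#is', $html, $m) ? trim($m[1]) : '';
  $body = preg_match('#<body[^>]*>(.*?)</body>#is', $html, $m) ? trim($m[1]) : '';
  return array('title' => $title, 'body' => $body);
}

// Processor: copy parsed fields onto target fields using the mapping.
function example_process(array $item, array $mapping) {
  $target = array();
  foreach ($mapping as $source_field => $target_field) {
    $target[$target_field] = isset($item[$source_field]) ? $item[$source_field] : NULL;
  }
  return $target;
}

// Run the whole chain: fetch, then parse and process each document.
function example_import(array $sources, array $mapping) {
  $results = array();
  foreach (example_fetch($sources) as $raw) {
    $results[] = example_process(example_parse($raw), $mapping);
  }
  return $results;
}
```

Because each stage only depends on the output of the previous one, any of them could be swapped (a directory fetcher, a CSV parser, a user processor) without touching the others, which is the flexibility described above.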

dman’s picture

The D7 roadmap I have is a lot closer to Feeds in many respects.
I couldn't fit my use cases to migrate.module. Feeds is the most flexible for how I see a modular import system working.

stephen ollman’s picture

DMAN, we have a number of very large scale projects coming up that are looking to migrate tens of thousands of pages into Drupal 7. It would be good if you were able to contact me to discuss your availability to assist in these projects we have, to further develop this module for D7. Happy to discuss rates and time frame off-line.

dman’s picture

Version: 6.x-1.x-dev » 7.x-0.x-dev

Thanks Stephen.
D7 branch has been made and stabilized at
http://drupal.org/node/1467774
DO NOT USE yet as it's just frameworks, and the actual serialization of data needs to begin now. But things are in motion.

Til Peter Noske’s picture

Greetings to all of you.

Thanks a lot to dman for writing the Import_Html module and for your friendly response months ago.
At that time I moved about 8000 static .htm pages to D6. They contain German umlauts.
There were problems.
So if you (#13 Stephen) do something similar, you have to convert your data to
Unicode (UTF-8) first, or something like that.
Maybe everybody knows this except me.
After that, and some days of trial and error, I got them into the Drupal 6 DB, umlauts included.
And now I have upgraded to D7.12. Very fine.

@Dman: A hint about Unicode/UTF-8 in your otherwise valuable instructions could be useful.
If I just haven't seen it, please forget it.

Looking forward to Import_Html coming to D7,
so I can try it directly.

Wishing all of you an excellent time.
Cordially,
Peter

dman’s picture

UTF-8 etc. has always been a problem for me with XML transforms. I tried to work around most of that in various ways for earlier PHP (4) versions, but some inconsistencies certainly remained. With better support in PHP 5 I can probably discard some of those conversions.
I'll try to throw some special characters into my test cases, thanks for the warning Peter!
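
As a rough illustration of the pre-conversion Peter describes, the sketch below converts a raw file's contents to UTF-8 before import. It assumes the legacy files are ISO-8859-1 (Latin-1), which is only a guess for old German pages, and it needs PHP's mbstring extension. The function name example_to_utf8() is made up for this example and is not part of import_html.

```php
<?php
/**
 * Hypothetical helper: return $raw as UTF-8.
 * Assumption: anything that is not already valid UTF-8 is in $from
 * (ISO-8859-1 by default); adjust for the real source encoding.
 */
function example_to_utf8($raw, $from = 'ISO-8859-1') {
  // Already valid UTF-8? Leave it alone to avoid double-encoding.
  if (mb_check_encoding($raw, 'UTF-8')) {
    return $raw;
  }
  return mb_convert_encoding($raw, 'UTF-8', $from);
}
```

The validity check is a heuristic: a few Latin-1 byte sequences happen to be valid UTF-8 as well, so for a large batch it is safer to know the source encoding up front than to rely on detection.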

that0n3guy’s picture

I thought I would share my D7 work around. I have a pretty simple html site from a company intranet. I wanted to import it to D7 so I was going to use this in D6, then update to d7, then export nodes to my real site... But I wanted to do it locally and didn't feel like messing w/ the htmltidy and xml requirement. So this is what I did:

- I installed wordpress 3.1
- used http://wordpress.org/extend/plugins/import-html-pages/ to import my html site (very easy to do)
- (optional) use http://wordpress.org/extend/plugins/search-and-replace/ to change some linking structures (I could probably use this later: http://drupal.org/project/scanner)
- Use http://drupal.org/project/wordpress_migrate (could probably be done w/ feeds too)
- (optional) maybe use http://drupal.org/project/scanner to make some more tweaks
- (optional) I will probably make a quick module using querypath (http://drupal.org/project/querypath) to run through and blow away some tables as well.

All I needed was a basic install of wordpress. It's funny, Drupal is very powerful, but sometimes I find myself abusing wordpress to do Drupal's bidding. Muahahaha (evil laugh).

oktay’s picture

Isn't there already a dev version for Drupal 7? Is the above note "DO NOT USE" still applicable? This issue, especially since it is still in ACTIVE state, is confusing.

dman’s picture

At the date "March 4, 2012" above, the status was "do not use" when the dev branch first started.

The available release is dated 2012-Oct-29

Some improvements happened between those dates.

The difference between a dev branch and a stable release is pretty much:
Someone has tried out the dev branch and said it works for them.

If nobody outside of my test machine has tried it, it's not going to be tagged stable.

That aside, I've not encountered any problems with it since October.
But I've not worked with it since October either.

GAtherton’s picture

This module could be really useful for our old site migration from a php/mysql platform - specifically as it seems to rewrite links embedded in content to accommodate the new node structure??

Unfortunately, in our hands we cannot get it to use the PHP htmltidy extension or an uploaded executable (on a remote server).

When I try to import a single file using default settings and simplehtml template I get PDOException: in dblog_watchdog() (line 154 of /home2/bsmmorg/public_html/nacpatients.org.uk/modules/dblog/dblog.module)

Shall I give up on d7, install d6 locally and try an import (which I have had working in the past but it did not rewrite links at that time) on a more stable earlier build of import_html?

Thanks

Graham

dman’s picture

All versions will be requiring htmltidy, so it won't help to try a different version.

In the future I may be trading htmltidy for a library like querypath and Guzzle, but for now, it's 'tidy'

I can't tell what might be happening from that debug message, but I can say that the current code has been working for a while, and should be rewriting links fine too, at least for me. I'd like to know where you might be having trouble.
If you increase the debug level (in advanced settings) we could tell what step it's having trouble with.

GAtherton’s picture

I am working on this - will let you know when tested further

Thanks

dman’s picture

Issue summary: View changes
Status: Active » Closed (fixed)