Comments

dman

I don't personally have time to give this a go yet. Real world, real $.
I've been meaning to rewrite the file selection screen into the form API (it's still doing it the 4.something hard-coded way), and that's a bit messy.
OTOH, I have a better API for import-export I've been working on for a complementary 'static mirror' module - which does the opposite of import_html.

Try offering me a commission next month and something may be possible.
Not this week however!

dman

I'm actually now keen to try this. The time is right and I'd like to help us move forward to 6.x as I see things are really dragging re module migration. I'd love to see us maintainers start picking up the slack.

BUT

I'm personally in a big-time contract. Although time can be stolen here and there, it's not sensible for me to be pulling time towards this. I don't personally need it this week, and it'd take several dedicated days.
If anyone is interested in raising a bounty for this, I'll get it done, but we are talking 1000 EUR to even make it a go. Plus custom tweaks. Sorry, but that's the current economics of my hours. Just throwing it out there.

OTOH, I welcome anyone else joining the project and just damn doing it, any time, or for any rate they feel like undercutting my quote at :-). I'm not posting to solicit a job, I'm posting to solicit co-operation.

I'll give CVS access and kudos to anyone who can get the current directory select/import form behavior working natively under D6 :-). It's really still D4 style (pre-FAPI).

sam6

subscribing

Flying Drupalist

Subscribing. Hopefully someone will get some time, sometime.

kenorb

Any other similar modules for D6.x?

wundo

I'm working on migrating Import HTML 5.x-1.2 to D6. I didn't touch the FAPI mess, but it's sort of working here.

dman

HEAD has a complete rewrite of the FAPI into proper multistep forms, including proper FAPI theming of the tree view that was hard-coded back in Drupal 4, and the parsing routine all broken out into more semantic and modular data absorption methods. It's for D5, but uses D6 programming style.
I got distracted at some point when trying to abstract it so that import routines could be automated and run on-the-fly by a spider instead of through the UI.
But the HEAD code structure is the direction I'm headed with that.

adam_b

subscribe

silentway

Subscribing. This would really help attract new Drupal users, and let them jump straight to D6. There are tons of sites out there that have been sitting out the first eras of CMS.

kenorb

Any dumps?

cjdavis

subscribe

dman

There is a dump - it's on a dev machine I am away from at the moment. Yeah, it could be remoted into but there was a power cut while I was away and ... well, anyway. Next week if all goes well.

hikarateboy

subscribe

ilfelice

subscribe

Ian Douglas

This would be a great service to the Drupal community. If I had the money it would be already in your pocket. All I can offer is my sincere appreciation for your efforts. Good luck, and thanks.

cjdavis

Have you made it home yet? This module would be huge for people new to Drupal.

mdowsett

subscribing

WhiplashInfo

Subscribing!

dman

OK,
I am in a tight budget position right now and need to be spending time on paying work.
But it's been queued in the pipeline for a while now and should be done, so I can steal enough time away from real work to get this up for HALF PRICE.
If this is useful enough to you, please consider hurrying things along with an encouragement via ChipIn. It's easily $1000 worth of work to get it to a good state. But if enough people together think it's worth $500, I'll dedicate a few days to it.

(if you object to a hard-working developer suggesting his time is worth more than $0.00 per hour, please just ignore this message)
.dan.

dman

Priority: Normal » Critical

Thanks to Zelavi for getting the ball rolling with this bounty drive!

As a step towards making things really happen at my end, I've now spent half a day merging the various dev branches into something workable, refactoring the code hugely (as planned but not quite executed until now), creating a Drupal-6 branch to move forward on, and also spending (way too much) time getting all the code up to 100% compliance with deadwood and coder.module standards. (It still doesn't actually work again yet.) [DO NOT USE]

Everyone else out there, let me know this is something worth spending more time on! I want it to happen, but I'm frankly 6 grand in overdue debt today. Too much free time lost helping in the forums I guess.
I'm not holding any code hostage, and would honestly love it if someone joined in to take some of the weight off this (very worthwhile) project. But cash incentive is also very much appreciated! And WILL make things happen.
.dan.

keva

You're very welcome. Thanks for all your hard work and regular assistance to Drupal users.

This module is potentially a HUGE time-saver.

iamnoskcaj

Looks interesting -- could be very useful...

Subscribing

MPLS Marketing

Priority: Critical » Normal

I imagine this module is still looking for some love. My company would be interested in sponsoring this module for Drupal 6 if we could iron out some details. Please contact me via PM.

activebiz

Dan,

I'm strapped for cash myself this month, but can contribute next month. Nothing huge, but some.
I'm very interested in this module; as someone mentioned, it's a HUGE time saver.
I am stuck with a rather large static website I need to convert, and right now I stare in horror at that table-based HTML site, thinking of all the copying and pasting.

I know there must be many out there in the same boat.

I'm not good enough to help with coding myself, but I can help with a financial contribution as of May 1st.

I hope this gets moving soon, because when version 7 comes out, we may be in the same boat again.

I for sure would be eternally grateful for a 6.x version of this module :)

silentway

Hi everyone-

I just thought I'd let you all know that the ChipIn support nudge really works. And to give a shout-out to dman, who dug right in and moved this forward after my small ChipIn donation. Try it!

http://coders.chipin.com/import-html-conversion-to-drupal6

The next related topic might be the HTMLTidy module for D6... seems it might not be ready for D6 as advertised?

http://drupal.org/project/issues/htmltidy?status=All

http://drupal.org/project/htmltidy

dman

Yeah, I got a nudge, and spent a few more hours (well, all night really) pulling the UI and file selector into D6 multistep forms.
The process will now use Drupal 'batch processing' so that I can handle hundreds of files without timeout.
And lots of code tidy-up. I'm removing the PHP4 XML support - it's just not worth the extra library I had to use. I've also been able to replace a few utility functions with ones that were added in Drupal 6.
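To illustrate the general idea behind batch processing (this is not the module's actual code): each request does a fixed slice of the work and records where it stopped, so no single request runs long enough to hit a timeout. Drupal's Batch API hands operations a `$context` array with `sandbox` and `finished` entries; the Python below only mimics that shape, and `batch_step`, `run_batch`, `import_one` and `items_per_step` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple


def batch_step(files: List[str], context: Dict, import_one: Callable[[str], None],
               items_per_step: int = 10) -> None:
    """One request's worth of work: import the next slice of files."""
    sandbox = context.setdefault("sandbox", {"position": 0})
    results = context.setdefault("results", [])
    start = sandbox["position"]
    end = min(start + items_per_step, len(files))
    for path in files[start:end]:
        import_one(path)
        results.append(path)
    sandbox["position"] = end
    # 1.0 means done; a smaller fraction tells the driver to schedule another step.
    context["finished"] = end / len(files) if files else 1.0


def run_batch(files: List[str], import_one: Callable[[str], None],
              items_per_step: int = 10) -> Tuple[List[str], int]:
    """Stand-in for the batch driver, which would issue one HTTP request per step."""
    if items_per_step < 1:
        raise ValueError("items_per_step must be at least 1")
    context: Dict = {}
    steps = 0
    while True:
        batch_step(files, context, import_one, items_per_step)
        steps += 1
        if context["finished"] >= 1.0:
            return context["results"], steps
```

With 25 files and 10 per step, the driver would make three passes; the slice size is the knob you'd tune against the server's time limit.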

Still no code worth actually running - the 'import' process itself is next. The good news is that I know that part inside out, and it does NOT depend much on D6 API changes. It was the UI/FAPI upgrade that required learning, trial and error.

... having been up all night, I'm now hours late for my real job... could be my responses to chipin donations are disproportionate :-}

.dan.

dman

Version: 5.x-1.x-dev » master
Assigned: Unassigned » dman

FYI to all subscribed.
I've got a little momentum on this and some inquiries (though not heaps of commitment) sponsorship-wise.
I did another half day over the weekend, sorted out the menu creation a bit (totally rewritten in core for D6) and smoothed out assorted other bits.
The UI and import process now works again, but only with hand-holding and needs a lot of revision to start picking up actual data and handling problems. CCK support to be revisited soon, but I'll be getting the wizard and UI stable first, then we could actually call it an alpha.

Do feel free to either Chip In Now or wait around for a few more weeks for me to get spare time and inspiration.

dman

Priority: Normal » Critical
Attachment: 1.3 MB

OK, I want to get this done ...
After another few half days there is a DEV DRUPAL-6 in CVS. (Bits were in CVS for the last fortnight, but even less stable)

I've now been able to try out the menu builder, the semantic metadata extractor, and yes, CCK works as before! CCK W00T!
I've run through a handful of test cases from different sites, and it's all going like it should.

SilentWay asked nicely, so I checked out an import on his site - which came out like I hoped (from my end anyway). Here's a bit of documentation if you want to read about it.
(Also remember to see the built in help docs - in the help/ folder or via advanced help and the Drupal UI)

There are still some flaky bits - mostly with the UI and the multistep. It's not polished or totally wizard-like yet, but can perform all the expected jobs when in the hands of a careful driver!
If you grab the dev version, it may at times require the devel.module if I left some debug in there.

Contributions are still being solicited to put this into a stable, shiny release. This may include looking at your site as a test case and tuning up an extraction template just for your special needs. Be in quick.

eluhrs

I just want to let all the people tracking this thread know that I chipped in some money to help Dan finish this module. Actually, my main point is to encourage others to do the same. It's a good module and will save any user tons of time. Please consider contributing a little money, via ChipIn, if you are able.

dman

Attachment: 13.57 KB

Yes, Thanks Eric for your feedback through testing, and especially the tangible contribution.

OK folks, I'm starting to feel OK about the current DEV version in CVS.
I revised the batch operations to allow for chained, recursive folder listings, as the UI couldn't really stand tens of thousands of checkboxes.
A few days ago I ran CCK through its paces, and damned if it didn't work!
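A rough sketch of what "chained, recursive folder listings" can look like, assuming each batch operation lists just one folder and queues its subfolders for later operations, instead of rendering every file as a checkbox up front. It's Python rather than the module's PHP, and `scan_tree` / `list_one_folder` are made-up names for illustration.

```python
import os
from collections import deque
from typing import Deque, List

HTML_SUFFIXES = (".html", ".htm")


def list_one_folder(queue: Deque[str], found: List[str]) -> None:
    """One batch operation: list a single folder and queue its subfolders."""
    folder = queue.popleft()
    with os.scandir(folder) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if entry.is_dir(follow_symlinks=False):
                queue.append(entry.path)  # chained: a later operation lists it
            elif entry.name.lower().endswith(HTML_SUFFIXES):
                found.append(entry.path)


def scan_tree(root: str) -> List[str]:
    """Drive the chain until no folders are left to list."""
    queue: Deque[str] = deque([root])
    found: List[str] = []
    while queue:  # in Drupal each pass could be its own batch request
        list_one_folder(queue, found)
    return found
```

Each call touches one directory, so even a huge tree never needs a single request (or a single form) that holds everything at once.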

So I'll say you can have a go at that thing if you want to cut yourself on the bleeding edge. The Multistep UI is a bit flaky (use the next/prev buttons, not the browser ones or refresh), but the engine is solid once it's actually tuned up.

See the chapter on naming your fields to fit with CCK in the big help document

Feedback and assistance with docs (even if it's just to say "this bit isn't clear") are also welcome.

We are, well, almost halfway to budget :-} ... and I seriously neglected my social life this weekend for this...

dman

Version: master » 6.x-1.x-dev

OK folks, I'm on a last little sprint to get this ready to go.
We've worked through a test case (with some much-appreciated support & patience from Eric) that has churned through a few thousand pages of archive.
There are still some scaling issues - my explanatory notes were just too damn verbose and slowing things down - and I'll be moving the log entries into the watchdog rather than the screen messages.
But I think we've got the UI and selectors working as needed.

If anyone wants to join the party, you can try out the dev version. Both feedback and incentives would be appreciated. You'll get some help tuning it for your use-case too.
Remember to test it on a handful of pages before just clicking 'everything' and do it on a backed-up test site so you can roll back. Deleting hundreds of pages that are not quite right is boring. Better to snapshot your DB before pressing the big button.

jgarbe

I'm going to be using this to move my hyperfiction projects over to Drupal... they amount to a couple hundred pages. My case is extremely simple as they are already reduced to labeled divs, but as soon as I figure out how to not rewrite URLs I'll move it over and test out the new version.

You sir, are a godsend.

jgarbe

Priority: Critical » Normal

Actually I have a question if anyone can point me in the right direction. If I'm trying to import a bunch of html flat files as children of a book (pages in a book) where would I look to accomplish that?

Thanks!

timoti

Just wondering where this is at. I'm happy to make a small $$ contribution, but the ChipIn has closed. In the meantime, thanks for the initiative on this.

dman

If anyone wants to join the party, you can try out the dev version. Both feedback and incentives would be appreciated. You'll get some help tuning it for your use-case too.

The -dev version has been working OK in a couple of trials. If a few more folk say it's doing the job for them I can make it a 1.0 release.
It works in the cases I've tried, but may need tweaking for cases I haven't.
Please give it a go and tell me what's not right for you.

rhuntley

Do I post detailed feedback here, or is there a way to send you private mail? I've been trying it but have a few issues it seems.

...Rob

dman

@rhuntley - I don't mind if you open a new issue in the queue here.
This thread is about done

mwoodwar

I left this in the old queue, but came across this one later... sorry if it duplicates. Anyway, I managed to get XSLT and Tidy installed on the server, but got the following error and have no idea where to look. Any suggestions?

Fatal error: Call to a member function cleanrepair() on a non-object in /home/cfbusine/public_html/sites/all/modules/import_html/coders_php_library/tidy-functions.inc on line 94

dman

Leave this question in its own thread

asb

Hi,

I played a bit with the -dev release for D6, but it still seems to have those strange problems with non-ASCII characters; e.g. "Ausrüstung" ends up as a numeric character reference such as "Ausr&#252;stung". That is fine as long as you allow "full html" as the input format; if you're running something like MediaWiki markup with the PEAR Wiki Filter, that means hours of fun fixing. Also, Drupal seems to be unable to interpret those encodings in the title.

However, even if "Import HTML" currently is of limited use for non-English users, the D6 version feels a bit more robust and solid than the D5 version. Could I somehow convince you to fix these encoding issues, or is this something you don't want to touch?

Thanks & greetings,
-asb

dman

#452536: Problems working with non-english languages
Short answer: I just don't know how to deal with non-ASCII characters through multiple XSL transformations. They seem to get resolved at every step, and thus produce non-UTF-8 input for the next step. Encoding them numerically was the only way I could manage to get 'correct' XHTML output that would display in a browser - even though that's almost impossible to edit in plaintext.
Technical XSL suggestions for a solution in the linked thread please.
It could be that PHP XML handling has improved in the last few years, so maybe the solutions I failed to get working the first time now exist or are properly supported. We are PHP5 only now...
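For anyone who wants to experiment with this, here is a hedged sketch of the numeric-encoding workaround and its reverse. It's Python, not XSL, and assumes the decode step would run once on the final output after all transformations; references to ASCII characters such as `&#60;` are deliberately left alone so the markup isn't broken. The function names are made up.

```python
import re

# Matches decimal (&#252;) and hexadecimal (&#xFC;) character references.
_NUMERIC_REF = re.compile(r"&#([xX][0-9A-Fa-f]+|[0-9]+);")


def to_numeric_refs(text: str) -> str:
    """Escape every non-ASCII character as &#NNN; so ASCII-only steps can't mangle it."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def from_numeric_refs(text: str) -> str:
    """Turn references to non-ASCII characters back into real UTF-8 text."""
    def replace(match: "re.Match[str]") -> str:
        code = match.group(1)
        value = int(code[1:], 16) if code[0] in "xX" else int(code)
        if value < 128 or value > 0x10FFFF:
            return match.group(0)  # keep &#60; etc. so markup is not broken
        return chr(value)
    return _NUMERIC_REF.sub(replace, text)
```

Decoding only at the very end would give editors plain "Ausrüstung" in the body and title while keeping the intermediate XSL steps ASCII-safe.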

jsgammato

Attachment: 66.68 KB

This is an awesome module!
My use case: I generate all my company's technical documentation, including online help.
(I generate my source content in Adobe FrameMaker, then convert it to HTML help with Mif2Go. Along the way I break each chapter into individual HTML pages at H2s, so each procedure gets a page of its own.)
I want to take my HTML Help and import each individual procedure into a Drupal book page on my unofficial TechPubs intranet. The goal is for our Support guys to be able to see the latest released help for our product, and to be able to add comments to individual procedures that they can all see while fielding a support call.
After just a day of messing about, this thing works almost perfectly. There are a few little things that I am sure I'll be able to figure out.
But there is one thing I could use some guidance on, and one thing that looks like a bug.
Guidance: I can manually separate my HTML pages into "chapters" and then import them fine as book pages. Is there a way to ALSO auto-create a book structure if I can build some meta-tags into the content? For example, my appliance guide has 8 chapters that come in properly, but I want them to map to Drupal book pages starting with an Appliance Guide top page, then 8 child pages for the "chapters", and then more child pages for the individual HTML pages.
Bug: The generated book page is too wide (I guess) for my Garland layout; it gets pushed to the bottom of the page, and left-justified. See attached file.
(IE 6, WinXP)

dman

Provision for making many pages out of one doc has been built in - but not widely used, tested, or even documented up front.
Basically, you need to mess with the import template in such a way that it will produce:

<xml>

    <xt:document>
      <html>
        <head>
          <meta name="path" value="chapter1/page1" />
        </head>
        <body>.....</body>
      </html>
    </xt:document>

    <xt:document>
      <html>
        <head>
          <meta name="path" value="chapter1/page2" />
        </head>
        <body>.....</body>
      </html>
    </xt:document>

</xml>
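To check what a modified template actually produces, a small Python sketch can split such output into pages keyed by their `path` meta value. The snippet above doesn't declare the `xt:` namespace, so the URI below is a made-up placeholder, and the wrapper element is called `root` here; substitute whatever your template really emits.

```python
import xml.etree.ElementTree as ET
from typing import Dict

XT_NS = "urn:example:xt"  # hypothetical namespace URI for the xt: prefix


def split_documents(xml_text: str) -> Dict[str, ET.Element]:
    """Map each xt:document's meta path to its <body> element, in document order."""
    root = ET.fromstring(xml_text)
    pages: Dict[str, ET.Element] = {}
    for doc in root.iter(f"{{{XT_NS}}}document"):
        meta = doc.find("html/head/meta[@name='path']")
        body = doc.find("html/body")
        if meta is None or body is None:
            continue  # skip fragments the template did not label
        pages[meta.get("value")] = body
    return pages
```

If a page you expected is missing from the result, the template most likely didn't emit a `meta name="path"` for it.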

As for 'book' - I don't use that much myself, but just had a look at its data structure. I'm not sure of a way to make child pages attach to their parents unless I know the parent's ID. We'd have to do string matching of some sort, which can be unreliable, especially with autogenerated content.
If solved, it can be plugged in OK, but the algorithm to find the appropriate parent would have to be worked on. It would look a lot like the code that currently knits pages into the menu tree. (I think book.module is now a special version of menu.module?) The real problem is that each insertion is independent of the others, and not every page has its parent available at all times.
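One way the out-of-order problem could be handled, sketched in Python with a hypothetical `BookParentResolver` (nothing like this exists in the module): derive the parent from the path prefix, and when a child arrives before its parent, park it until the parent is imported.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class BookParentResolver:
    """Hypothetical helper: link imported pages to their parents by path prefix."""

    def __init__(self) -> None:
        self.ids: Dict[str, int] = {}  # path -> node id already imported
        self.waiting: Dict[str, List[int]] = defaultdict(list)  # parent path -> orphan ids

    @staticmethod
    def parent_path(path: str) -> Optional[str]:
        head, sep, _ = path.strip("/").rpartition("/")
        return head if sep else None

    def add(self, path: str, nid: int) -> Tuple[Optional[int], List[int]]:
        """Register a page; return (its parent's id or None, orphan ids it can now adopt)."""
        path = path.strip("/")
        self.ids[path] = nid
        parent = self.parent_path(path)
        parent_id = self.ids.get(parent) if parent else None
        if parent and parent_id is None:
            self.waiting[parent].append(nid)  # parent not imported yet; fix up later
        adopted = self.waiting.pop(path, [])
        return parent_id, adopted
```

Pages whose parent never shows up stay in `waiting`, which is the point where you'd fall back to wiring the book by hand.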

Your width issue is most likely an effect of the HTML including a [div style="width:800px"] or something. Can't tell without view-source. You'll need to massage that out in the XSL.
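If it helps to see the intent before writing the XSL, here is the same cleanup expressed with Python's ElementTree instead (a swapped-in technique, not the module's XSL). It drops `width` attributes and inline `width`/`min-width` rules, and assumes the page has already been tidied into well-formed XML.

```python
import xml.etree.ElementTree as ET

FIXED_WIDTH_PROPS = ("width", "min-width")


def strip_fixed_widths(root: ET.Element) -> None:
    """Remove width attributes and inline width rules that can push content past the theme."""
    for el in root.iter():
        el.attrib.pop("width", None)
        style = el.get("style")
        if style is None:
            continue
        kept = [decl.strip() for decl in style.split(";")
                if decl.strip()
                and decl.split(":", 1)[0].strip().lower() not in FIXED_WIDTH_PROPS]
        if kept:
            el.set("style", "; ".join(kept))
        else:
            del el.attrib["style"]  # nothing left worth keeping
```

Other inline rules (colours, fonts) survive untouched, so only the layout-breaking widths go.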

Please open another issue for book support if you think it's worth following up. It's not on my timeline at the moment...

jsgammato

Done

(I expect I can do the wiring by hand for now, as a proof of concept, and then I wouldn't need it under pressure until Q1 of 2010.)

dman

Status: Active » Closed (fixed)