Posted by roderik on February 18, 2010 at 10:29pm
| Project: | Import HTML |
| Version: | 6.x-1.x-dev |
| Component: | Code |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | needs review |
Issue Summary
I don't know if you consider this fit for inclusion, but if not... maybe someone will benefit from the patch anyway.
I need to import HTML that has such badly/illegally placed start/end tags that HTML Tidy 'fixed' it in the wrong way. (^%$$%$ MS Frontpage)
So I made a custom script to fix that; it modifies the HTML string before this module feeds it to HTML Tidy.
The following patch adds a facility (and config option) for running such a script. I called it a 'pre-tidy command'.
| Attachment | Size |
|---|---|
| import_html_pretidy.diff | 6 KB |
Comments
#1
#2
Yeah, I've already found the need to put in one or two pre-tidy fix-ups, mostly to do with international characters, non-UTF8, and ... some other MS weirdness I can't recall.
It would be good to abstract this into a clearer config option. I was thinking a series of regular expressions to run, but I guess lower level code could be the thing also. Ah, Drupal hooks! They are MADE to be preprocessors.
I'll see how we can work this or something similar in.
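To make the "series of regular expressions" idea concrete, here is a minimal sketch in Python (the language of the script mentioned later in this thread), not the module's actual code. The rule list, the name `pretidy`, and the two example patterns are assumptions chosen for illustration only; real fix-ups would depend on the pages being imported.

```python
import re

# Hypothetical fix-up rules: (compiled pattern, replacement) pairs,
# applied in order before the HTML is handed to HTML Tidy.
PRETIDY_RULES = [
    # Drop MS Office namespace tags such as <o:p> and </o:p>.
    (re.compile(r"</?o:p\s*>", re.IGNORECASE), ""),
    # Replace non-breaking-space entities with plain spaces.
    (re.compile(r"&nbsp;", re.IGNORECASE), " "),
]


def pretidy(html, rules=PRETIDY_RULES):
    """Run every rule over the HTML string and return the result."""
    for pattern, replacement in rules:
        html = pattern.sub(replacement, html)
    return html
```

Keeping the rules in a plain list means a config option could add or reorder them without touching the function itself.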
#3
I'm similarly coming to the conclusion that the pages I need to import are too broken for the parser or tidy to deal with. I'm going to look into a preprocessor as well.
#4
Just FYI, in case anyone cares why I wanted to write my own script (rather than regexps or the like): Python Script To Tidy Up Ugly MS Frontpage HTML
(Not much directly to do with the patch...)
#5
Updated patch.
tempnam($_ENV['TEMP'], "htm") seems to have stopped working on my server, for some reason I don't care to find out. (I had taken this construct from tidy_functions.inc. Now replacing it with Drupal's temp dir.)
@dman/#2: a hook sounds awesome, to enable both regexps and a command-line script like mine (which I didn't want to code in a PHP hook, though I might have done that if I had experience with a BeautifulSoup-like PHP library)
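For anyone wanting to try the general idea outside Drupal, the following is a rough Python sketch of what a 'pre-tidy command' step might look like: write the HTML to a temporary file, run an external command on it, and use whatever the command prints. It is not the code in the patch. The function name `run_pretidy`, the convention that the command receives the file path as its last argument and writes the fixed HTML to stdout, and the `DEMO_COMMAND` stand-in are all assumptions.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for a real pre-tidy script (assumption): upper-cases the file.
DEMO_COMMAND = [
    sys.executable, "-c",
    "import sys; sys.stdout.write(open(sys.argv[1]).read().upper())",
]


def run_pretidy(html, command, tmp_dir=None):
    """Pass html through an external command via a temporary file.

    tmp_dir defaults to the system temp directory rather than relying
    on an environment variable that may not be set.
    """
    tmp_dir = tmp_dir or tempfile.gettempdir()
    handle = tempfile.NamedTemporaryFile(
        mode="w", encoding="utf-8", suffix=".htm", dir=tmp_dir, delete=False
    )
    try:
        handle.write(html)
        handle.close()
        # Assumed convention: the file path is appended as the last argument.
        result = subprocess.run(
            command + [handle.name], capture_output=True, check=True
        )
        return result.stdout.decode("utf-8")
    finally:
        handle.close()
        os.unlink(handle.name)
```

Calling `run_pretidy("<p>hi</p>", DEMO_COMMAND)` runs the stand-in command; a real script would be substituted in the same position.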