| Project: | Import HTML |
| Version: | 6.x-1.x-dev |
| Component: | Code |
| Category: | feature request |
| Priority: | major |
| Assigned: | dman |
| Status: | active |
Issue Summary
I am importing a site that makes frequent use of abbreviations in its file names and paths. It is highly desirable that these abbreviations appear in the menus and paths automatically. This is not currently possible: the abbreviations use the period character, and in the imported files the paths are truncated at the first period. In the file "rewrite_href_and_src", where the paths are rewritten, I can see this is a known issue because of this comment:
<!-- currently broken if path starts with (or contains?) "." -->
I understand this is due to the limitations of the string functions available in the XSLTProcessor class.
But wouldn't it be better to stop doing this in XSL and do it in PHP instead, so that the same function used for links in the menus, _import_html_calc_path(), can be used? If this function were used for both purposes (menu and content), linking in the content would be far more reliable.
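To illustrate the difference, here is a minimal sketch. It assumes the truncation comes from something equivalent to XPath's substring-before() applied to the period. Both function names and the sample path are made up for illustration; this is not the module's code, and it does not reproduce what _import_html_calc_path() does.

```php
<?php
// Hypothetical illustration only; not taken from the Import HTML module.

// Mimics substring-before($path, '.'): cuts at the FIRST period.
function example_strip_at_first_period($path) {
  $pos = strpos($path, '.');
  return ($pos === FALSE) ? $path : substr($path, 0, $pos);
}

// Strips only a trailing file extension, leaving earlier periods intact.
function example_strip_final_extension($path) {
  return preg_replace('/\.[A-Za-z0-9]+$/', '', $path);
}

$path = 'docs/govt.depts/min.of.ed.htm';
echo example_strip_at_first_period($path), "\n";  // docs/govt
echo example_strip_final_extension($path), "\n";  // docs/govt.depts/min.of.ed
```

Note that the second function would still eat a trailing abbreviation on a path that has no file extension at all, so a real implementation would probably need a list of known extensions.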
Comments
#1
With the benefit of hindsight, I think you may have a good suggestion there.
At the time the XSL templates were developed, it seemed (to me) like XSL could solve all our problems. If you've looked at that rewrite file, you'll have some idea of the complexity of the things we need to do.
However ... although I still believe XSL is several times better than string processes (regular expressions, search & replace) for accurately managing XHTML safely, it's several times worse at doing string processes.
Since that process was first written all those years ago, I've found that a better compromise is to use XML for what it is good at, but keep using PHP for string manipulations.
I would be open to replacing the rewrite_links XSL with a more PHP-style function, though (as you can tell by the complexity that arose from so many different site architectures and exceptions) it would take some testing.
My roadmap for that would be a process I've used elsewhere - still use XML to read in the document and identify the href and src attributes - but use string processes to rewrite them, followed by XML set_attribute to put the new values back.
But skip XSL for that particular phase :-}
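A rough sketch of that process, using only the standard PHP5 DOM classes. The function names and the rewrite rule itself are placeholders made up for this example; the real rules would be far more involved, and would presumably call something like _import_html_calc_path().

```php
<?php
// Hypothetical placeholder for the real path logic.
function example_rewrite_link($url) {
  // Leave absolute URLs, protocol-relative URLs and in-page anchors alone.
  if (preg_match('/^([a-z][a-z0-9+.-]*:|#|\/\/)/i', $url)) {
    return $url;
  }
  // Example rule only: drop a trailing .htm/.html, keep all other periods.
  return preg_replace('/\.html?$/i', '', $url);
}

// XML finds the attributes, PHP string functions rewrite them,
// and setAttribute() puts the new values back.
function example_rewrite_attributes(DOMDocument $doc) {
  $xpath = new DOMXPath($doc);
  foreach (array('href', 'src') as $attr) {
    foreach ($xpath->query('//*[@' . $attr . ']') as $element) {
      $old = $element->getAttribute($attr);
      $element->setAttribute($attr, example_rewrite_link($old));
    }
  }
  return $doc;
}

$doc = new DOMDocument();
$doc->loadXML('<div><a href="govt.depts/min.of.ed.htm">Min. of Ed.</a></div>');
example_rewrite_attributes($doc);
echo $doc->saveXML($doc->documentElement), "\n";
```

The markup is never touched as a string, so the XHTML stays well-formed, while the path handling gets the full set of PHP string functions.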
#2
PS.
At the time the rewrite XSL was written, XML and XPath support within PHP4 was abysmal, namespaces were unsupported, and trying to do a half-XML, half-PHP solution was not an option. It was either all string-matches (which I no longer trust not to break XHTML) or all XSL - which was harder than it should have been, but at least was stable.
Now that I've deprecated PHP4 in favour of PHP5, a new, less insane path can be taken.
#3
I believe I have found a way to solve this with minimal change to your existing code, for PHP versions 5.0.4 and greater:
http://www.php.net/manual/en/xsltprocessor.registerphpfunctions.php
Now we can have regular expressions for processing strings.
This offers a quick solution until it can all be done in PHP + XML.
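As a minimal, stand-alone sketch of the idea (not tested against the module's own templates): the stylesheet below copies a document unchanged except for href attributes, which it hands to a PHP callback through php:function(). The callback name example_fix_path and its rule are assumptions made up for this example, and the xsl extension must be enabled.

```php
<?php
// Hypothetical callback; the real path rules would live here.
function example_fix_path($path) {
  return preg_replace('/\.html?$/i', '', $path);
}

$xsl = <<<XSL
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:php="http://php.net/xsl"
    exclude-result-prefixes="php">
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <!-- Identity template: copy everything unchanged by default. -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <!-- Hand each href to PHP instead of using XPath string functions. -->
  <xsl:template match="@href">
    <xsl:attribute name="href">
      <xsl:value-of select="php:function('example_fix_path', string(.))"/>
    </xsl:attribute>
  </xsl:template>
</xsl:stylesheet>
XSL;

$style = new DOMDocument();
$style->loadXML($xsl);
$source = new DOMDocument();
$source->loadXML('<p><a href="min.of.ed.htm">Min. of Ed.</a></p>');

$proc = new XSLTProcessor();
// Only expose the one callback to the stylesheet.
$proc->registerPHPFunctions('example_fix_path');
$proc->importStylesheet($style);
$result = $proc->transformToXML($source);
echo $result;
```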
#4
I like it. I'm tagging this for attention. Would mean a scary rewrite of scary things.
I'm currently at the end of a project and don't want to break tested stuff, so not just yet.
... but after that I'll look at refactoring the URL rewrites into PHP rather than XSL.
I like this approach, so I'd be interested in trying it out. Thanks for the pointer.
#5
I have actually tried to do it this way. I could not make it work. So I had to do it using DOMDocument class methods.
#6
Yes, I'd been planning this since before you brought up this exact approach.
I'm actually happiest plotting a way forward by discarding rewrite_href_and_src.xsl entirely, and placing all the rewrites in PHP as a loop like this:
<?php
// Find every element that carries an href attribute.
$links = xml_query($datadoc, './/*[@href]');
foreach ($links as $link) {
  $rel_link = xml_getAttribute($link, 'href');
  /* do magic: compute $fixed_link from $rel_link */
  $link->setAttribute('href', $fixed_link);
}
?>
Should have done this from the start.