| Project: | Import HTML |
| Version: | 6.x-1.x-dev |
| Component: | Code |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | active |
Issue Summary
I understand that something similar happened in 2006, but I don't know if this is the same problem.
I have my site at /var/www/mysite and I want to import just the "articles" directory of it. My pages are in files like 6R12.html or 11U2.html.
Ideally I want the URLs to be http://mysite.com/articles/11U2.html so that they match the old site, but the system imports them as http://mysite.com/articles//11U2.html (or, to be more accurate, http://mysite.com/articles/%2F11U2.html).
Here is a fragment from an import:
Importing 'articles//11U2.html'
Found a value for 'subtitle' to save as a CCK field value
Found a value for 'factuality' to save as a CCK field value
Found a value for 'author' to save as a CCK field value
Found a value for 'pggnumber' to save as a CCK field value
Found a value for 'pggindex' to save as a CCK field value
Found a value for 'pggdate' to save as a CCK field value
This document (known internally as 'node/963' ) should now be accessible via aliases as both 'articles//11U2' and 'articles//11U2.html'
Is this a problem in my configuration, a bug, or does it always do this?
I appreciate that I could set up a mod_rewrite rule to change "/%2F" into just "/", but that would be a really ugly workaround.
PS (I got CCK field import working by applying a PHP 5.3 patch. THANKS again for this module)
Comments
#1
WORKAROUND
OK, I should have tested this first.
I was specifying mysitedir and subdir separately so that the import only processed my subdir. If I skip the subdir and specify just the parent dir, it still descends into the subdir and imports the files, but gets the aliases right.
So - a bug, but a minor one
#2
Sounds like an ongoing inconsistency I've hit a few times.
Generally it's just an issue of ensuring that the path to the import directory and the URL alias pattern it produces either both do or both don't have a trailing slash.
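To illustrate the trailing-slash point, here is a minimal sketch in Python (not the module's own PHP code). The helper name `join_alias` is hypothetical; it just shows one way to glue an alias prefix and a relative file path together so that exactly one slash separates them, whichever side supplied it.

```python
def join_alias(prefix: str, relative: str) -> str:
    """Join an alias prefix and a relative path with exactly one slash.

    Hypothetical helper for illustration only; it is not part of Import HTML.
    """
    prefix = prefix.rstrip("/")      # drop any trailing slash on the prefix
    relative = relative.lstrip("/")  # drop any leading slash on the file path
    if not prefix:
        return relative
    if not relative:
        return prefix
    return f"{prefix}/{relative}"


if __name__ == "__main__":
    # Both spellings of the prefix produce the same alias.
    print(join_alias("articles/", "/11U2.html"))
    print(join_alias("articles", "11U2.html"))
```

Stripping both sides before joining means the caller no longer has to remember whether the import directory or the alias pattern was supposed to carry the slash.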
There was some validation that tried to correct it for you, can't recall exactly how it did it.
In any case it probably can/should be tidied up with a post-process fix. Double slashes in an alias are always a mistake.
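A post-process fix of that kind could look roughly like the following Python sketch (again, an illustration rather than the module's PHP code; `collapse_slashes` is a hypothetical name). It assumes it is given a site-relative alias such as 'articles//11U2.html', not a full URL, since collapsing slashes would also damage the "//" after "http:".

```python
import re


def collapse_slashes(alias: str) -> str:
    """Collapse any run of two or more slashes in a relative alias to one.

    Hypothetical helper for illustration; intended for site-relative
    aliases only, not for absolute URLs that contain "://".
    """
    return re.sub(r"/{2,}", "/", alias)


if __name__ == "__main__":
    print(collapse_slashes("articles//11U2.html"))
```

Running this once over each generated alias just before it is saved would catch the double slash regardless of which parameter introduced it.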
Yeah, some extra massaging of the parameters could be done. The current validation doesn't cover all the ways in which someone could accidentally tell it to do the wrong thing. URL rewrites are scary; I think I may start outsourcing to PEAR::URL to do the calculations for me from now on. There are too many ways to get it wrong with my current regexps and string gluing.