Remove leading and trailing separators from strings and from around slashes
aaronshaf - October 27, 2008 - 01:52
| Project: | Pathauto |
| Version: | 6.x-2.x-dev |
| Component: | Code |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | needs review |
Description
For example, "A Response From John Doe" yields "-response-john-doe" instead of "response-john-doe"

#1
Marked #333518: remove hyphen from beginning of a token _after_ removal of "words to remove" as a duplicate.
I think this is a regression that we fixed somewhere and has now come back, but maybe not...
Maybe it only impacts the "*path" and "*alias" tokens?
#2
It's still a problem.
#3
How is this coming along? It's still a huge problem for a very large site I'm working on.
#4
When I work on things I assign the issue myself using the "Assigned" field. So, I'm not working on this. Perhaps you could fix it and provide a patch?
#5
Marked #346514: Filtered Words cause improper url as a duplicate.
#6
I've managed to track this issue down to using terms in aliases. It does not affect other tokens.
A cheap hack around it is to use the following in pathauto_cleanstring() right after replacing multiple separators with one:
//Replace all occurrences of /[separator] with a single forward slash$output = preg_replace("/\/$seppattern+/", "/", $output);
This is not a fix but a way around until a proper fix is found.
#7
@nonsie - why do you say this is not a proper fix?
EDIT: not a proper fix...
#8
Marked #462148: Seperator remains around slashes after stop words removed as a duplicate. It has a similar preg_replace solution in http://drupal.org/files/issues/pathauto-stopword-slash_1.patch
#9
Would the best solution not be to remove the separator from both the beginning and end of the string so we're also then not left with "sentences that begin with the" turning into "sentences-that-begin-with-".
#10
Yes, and since the separator is only one character, we should use the awesomeness of trim($string, $separator) instead of preg.
#11
So, list of scenarios that need to be covered:
#12
davereid: Given the variable can be controlled directly using variable_set() or $conf[] (in settings.php) to be anything the developer wants, we can't rely on just using the trim() function.
#13
#14
The current CVS already has the following line:
<?php// Trim any leading or trailing separators (note the need to
$output = preg_replace("/^$seppattern+|$seppattern+$/", '', $output);
?>
so half of our problem is already solved.
(note: the comment was chopped off by whoever committed the change)
I've rerolled and attached johnpitcairn's 6.x-1.x patch from #462148: Seperator remains around slashes after stop words removed.
#15
johnpitcairn's patch rerolled for v6.x-2.x.
#16
Committed to 6.x-1.x and 6.x-2.x
http://drupal.org/cvs?commit=277280
http://drupal.org/cvs?commit=277284
Thanks johnpitcairn, DamienMcKenna!
#17
How about some tests for this as well, eh?
#18
It seems like this introduced a new problem: #612232: Unknown modifier '|' in pathauto.inc on line 206.. Any chance someone can look into it?
#19
Could someone please provide an example 'pathauto_separator' value that causes this to happen?
#20
Maybe just try always escaping the separator?
Index: /home/dave/Projects/www/drupal6dev/sites/all/modules/pathauto/pathauto.inc
===================================================================
RCS file: /cvs/drupal-contrib/contributions/modules/pathauto/pathauto.inc,v
retrieving revision 1.54
diff -u -p -r1.54 pathauto.inc
--- pathauto.inc 22 Oct 2009 00:11:35 -0000 1.54
+++ pathauto.inc 23 Oct 2009 04:01:33 -0000
@@ -237,12 +237,9 @@ function pathauto_cleanstring($string, $
// In preparation for pattern matching,
// escape the separator if and only if it is not alphanumeric.
if (isset($separator)) {
- if (preg_match('/^[^'. PREG_CLASS_ALNUM .']+$/uD', $separator)) {
- $seppattern = $separator;
- }
- else {
- $seppattern = '\\'. $separator;
- }
+ // Escape the separator.
+ $seppattern = preg_quote($separator, '/');
+
// Trim any leading or trailing separators.
$output = preg_replace("/^$seppattern+|$seppattern+$/", '', $output);
#21
Marked #612786: warning: preg_replace() [function.preg-replace]: Unknown modifier '|' error as a duplicate of this.
#22
Helpful information from those of you experiencing this issue:
1. What kind of path patterns are you using for the items causing this issue?
2. What was the result generated alias from the items that experienced this issue?
3. What separator are you using (from Pathauto's general settings)?
#23
Dave,
Here are my answers for the problem I described in http://drupal.org/node/612786 (which I now see is a duplicate of some issues above!):
1. I only have three path patterns specified for the site: the default path pattern is "[title-raw]," one content type has "sometext[nid]," and another content type has "sometext[field_cckfield-raw]." The others are left blank. The error is occurring on the editing or creation of all content types, however.
2. URL aliases are not being created for any content types that result in this error (which is all of them!); content is created with the standard "node/nid" format
3. I am using nothing as the separator
Hope that helps and let me know if I can provide any further helpful information.
#24
Ok I can confirm this bug when using an empty separator. Patches for both 2.x and 1.x that checks if strlen($separator) so that separators like '0' can work, although that would be weird. Also, the convert-to-lower-case was inside the $separator block, so if there was an empty separator, that wouldn't be executed, so I moved it outside that block.
#25
Thanks for your investigation.
The changes seem good to me. Lowering priority for this, and I don't plan to make a new dot release just for this feature.
#26
Patches without the stupid comment this time.
#27
#28
Thanks for the quick fix!
The patch has eliminated this error, but a new problem has surfaced: auto aliasing is not working now for my content type that has the path "sometext[nid]". Could this be related to the patch, or should I investigate other possible problems and file a new issue if necessary?
#29
Since my "sometext[nid]" automatic path worked fine before this patch and stopped working after I applied this patch, I am going to assume it has something to do with it...
#30
Worked fine for me on the latest 6.x-2.x. Can you provide more details as to why it didn't work?
#31
Well, I'm not sure why it didn't work. To clarify, the patch in #26 DID work to eliminate the preg_replace() error, but now my "sometext[nid]" auto path does not work for some reason. This path DID work before I upgraded from 6.x-1.1 to 6.x-1.2.
#32
I've done a little more testing and tried reverting back to 6.x-1.1, and my "sometext[nid]" path is still not working, so I suspect it has nothing to do with this patch. I have a custom module doing some hook_nodeapi work on the content type in question, so maybe that (or a conflict with another contrib module) is the problem.
#33
Subscribing
#34
I was getting same error and Pathauto's replacement patterns not working but the patch worked. Thanks!