I have relatively recently updated my Drupal 5.7 site to use Pathauto 5.x-2.1 and Token 5.x-1.10. Following this update I am now seeing an issue where strings in the "Strings to Remove" field of "General settings" in the Pathauto admin page for my site are not being removed.
The relevant elements of my configuration are:
General settings
Set to default values - specifically the word "the" is in the list of "Strings to Remove"
Node path settings
"Default path pattern" set to "[termpath-raw]/[title-raw]"
Category path settings
"Default path pattern" set to "[vocab-raw]/[catpath-raw]"
"Pattern for all Category paths" set to "[catpath-raw]"
I have a top-level term in the default Category vocabulary called "About The School" which previously had an alias of "about-school" as expected i.e. the "the" is being stripped from the alias. If I now save this term the alias is being updated to "about-the-school" i.e. the "the" is NOT being stripped from the alias any more.
Likewise, aliases for content assigned to this term are no longer having listed words removed from their aliases. For example, a page with title "About The Web Site" is now being given an alias of "about-the-school/about-the-web-site" instead of "about-school/about-web-site" as previously generated, and as expected.
Please help! Is this an issue or something relating to my possible misunderstanding of how the "-raw" tokens work? Many thanks.
| Comment | File | Size | Author |
|---|---|---|---|
| #36 | 231292_36_pathauto_words_to_replace_first_last.d6.patch | 729 bytes | Freso |
| #35 | 231292_35_pathauto_words_to_replace_first_last.patch | 1.11 KB | greggles |
| #32 | 231292_32_mb_ereg_replacing.patch | 877 bytes | greggles |
| #29 | 231292_29_mb_ereg_replacing.patch | 1.03 KB | greggles |
Comments
Comment #1
gregglesI followed these same steps and was unable to repeat the behavior. Can you try on a brand new site and see if the problem persists?
Note that this textfield should have all the words together separated by commas (a,an,the,whatever) without any spaces or anything.
Comment #2
darrenscott commentedThanks for the quick reply.
If you have tried these steps with no issues on a different site I'm sure it will work as expected on a new clean site. What I'd like to understand though is what it is about my particular site that was previously working absolutely fine with an earlier version of Pathauto which is causing it to experience problems after upgrading to this version of Pathauto.
As a couple of extra tests I've tried changing other settings, such as the separator character and these changes take effect when updating terms as expected. I've also tried replacing the "Strings to Remove" value with just "the,The" (I know case sensitivity shouldn't matter but just wanted to cover all bases) and this still doesn't result in the "The" of "About The School" being removed.
Can you think of any other steps I can try to get to the bottom of what is causing this on my site?
Also, is it worth trying the latest 5.x-2.x-dev version? (this is a live site though!)
Finally, as you've proved this version works on a different site would you like me to change the issue category to "support request" as it's most likely not a bug in the module?
Comment #3
gregglesI'm not sure of many more debugging steps to do aside from adding print statements into the actual code to see what is going on.
Comment #4
davegan commentedI can verify that I'm also experiencing this bug... is it possible that aggressive caching could have fouled this up?
Comment #5
davegan commentedok, I've locked down the behavior...
the issue occurs when you have capital words in the node title.
For instance:
"Here is a node title"
Will create the url "here-node-title".
However
"Here Is A Node Title"
Will create the url "here-is-a-node-title".
It's an easy bug to miss, as the second option is an improper title, according to the English language. The reason I caught it on my site is that the removal word "The" is at the start of many of my titles.
I've fixed the problem on my end for now with a couple of strtolower() calls in pathauto_cleanstring, but it would be good to see a module developer fix this up in the next version.
Comment #6
darrenscott commentedThanks for confirming I wasn't alone in experiencing this issue davegan - and good work tracking down the exact cause.
Would you mind posting exactly the changes you've made to pathauto_cleanstring to get this to work for you? Is it worth proposing these fixes as a patch?
Do any of the module developers know what changes in this version have broken this and are there plans to fix the module any time soon? Would be great to get an official fix in for this and a new version released...
Comment #7
greggles@davegan - can you please provide a patch based on your changes?
Now that there is a simple repeatable test case this should be easy to fix. Thanks.
Comment #8
darrenscott commentedExcellent news - many thanks.
Comment #9
gregglesI did some more testing with this and found this feature to be enormously inconsistent. Words in my "words to remove" list were sometimes left in even if they were lowercase. I'm starting to think this needs to be reworked a bit...
Comment #10
davegan commentedI could provide a patch, but I suspected there could be more problems that wouldn't be addressed with my quick fix. I'm a little tight for time at the moment to test and debug completely.
Previous versions of Pathauto seemed to be fine - perhaps the key would be going back and figuring out which change introduced the bug?
Comment #11
darrenscott commentedThanks davegan. From what greggles said earlier, it sounds like there could be more problems lurking with this...
Do you know if it's safe to go back to the previous version of the pathauto module? I don't think I've ever had to downgrade a module before! Would there be any special steps required to downgrade?
Greggles - do you have any update on any reworking or official patch for this?
Comment #12
greggles@darrenscott which previous version worked for you?
Comment #13
darrenscott commentedI can't be 100% sure but I think it was the previous version, 5.x-2.0 as I remember all modules I use being up-to-date until the point this latest version 5.x-2.1 was released.
Comment #14
gregglesWell, if you can pin down a version where it worked correctly and the new 5.x-2.1 where it doesn't it will be easy to figure out what changes were made that broke this which will make it enormously simple to fix :-)
Comment #15
darrenscott commentedAs I said, I think I was using 5.x-2.0 previously. Unfortunately I don't have time right now to confirm one way or the other but I will try to do so tomorrow.
I wonder if anyone else experiencing these problems can be more specific on versions - @davegan - how about you?
Assuming it was 5.x-2.0, can you see any obvious changes in 5.x-2.1 which would cause the problems?
Oh, BTW - thanks again for taking the time to look into this issue :-)
Comment #16
moonray commentedSame issues (subscribing)
Comment #17
moonray commentedI believe I might have found the cause of the problem: the mb_ereg_replace function.
This is from a comment on PHP.net:
Luckily there is a mb_eregi_replace function.
Comment #18
moonray commentedOne more thing about this, now that it works...
A path like "blogs/The Team" gets cleaned up to "blogs/-team". It would be nice to get rid of the separator at the beginning.
Comment #19
moonray commentedLooks like the separator remaining at the beginning only happens with a pattern like "blogs/[catpath-raw]".
Comment #20
darrenscott commented@moonray - when you say "now that it works...", is that by replacing references to mb_ereg_replace with mb_eregi_replace in your local pathauto.module?
Further to the original problems I reported, I've also been experiencing other strange problems with pathauto relating to the updating of existing aliases. I don't want to go too off-topic but am interested if anyone else has experienced the following:
I create a term for a vocabulary e.g. ABC
An alias of abc gets created for this term.
I then update the term from ABC to ABCD.
The term is updated correctly, but the alias appears to remain as abc.
I then update the term again from ABCD to ABCDE.
The term is updated correctly, but the alias gets updated to abcd! i.e. one update behind
I've tried this with both 5.x-2.1 and 5.x-1.0.
Has anyone else experienced this?
Comment #21
greggles@darrenscott - please, please open a new issue for a new problem.
@everyone else - let's stay focused. thanks.
Comment #22
darrenscott commented@greggles - sure, no problem, I'll raise another issue
Back to the original issue... do you have any update? Are we any closer to having a fix for it? Does @moonray's observations about mb_ereg_replace help?
Comment #23
gregglesI don't have time to do much development now, so I haven't tested it. I'm relying on Pathauto's users to develop patches and then tell me which patches make sense or not.
Comment #24
darrenscott commentedAs per moonray's post #17 above I've just applied this same change to my installation of pathauto 5.x-2.1 and can confirm the problem I initially reported appears to be resolved by it.
To summarise, I replaced the two references to mb_ereg_replace in pathauto.inc with mb_eregi_replace instead. (For the actual call to this function I used the same parameters as before but I guess the "/i" suffix in the first param is now redundant)
@greggles - would you have time to create a patch based on this change and test yourself? Thanks.
Comment #25
gregglesAs I said in #23 of this issue, no, I don't have time nor priority on these issues to roll nor test patches now. I will in a month. If you can roll and test a patch that helps me.
Comment #26
Anonymous (not verified) commentedsubscribing
I edited the function to use mb_eregi. Yet ...
With subsequent words to remove, the function will only remove the last listed in the settings.
Example:
Words to remove are: "dr,lic,phil,prof"
Page title is: "Prof. Dr. Albert Einstein"
Results in URL: "dr-albert-einstein"
Changing the order of words in the preferences to "prof,dr,lic,phil" will result in the URL being "prof-albert-einstein"
Comment #27
goldstone commentedWe also experienced the same issue/resolution identified in #17. Wherein certain words are not removed as per the PHP note. In addition - there is a separate issue described in #26 in which some of the words on the list are not being removed (besides those that are not removed due to the first issue). With a few tests I found that the first and last words in the list were not being removed. I didn't continue testing to see what other patterns would result in words not being removed nor under what conditions.
Example:
Words to remove: "a,the,of,test"
Page title: a test of the emergency broadcast system
Resulting URL Alias: a-test-emergency-broadcast-system
Example:
Words to remove: "a,a,the,of,test,test"
Page title: a test of the emergency broadcast system
Resulting URL Alias: emergency-broadcast-system
In the second example I doubled up on the first and last words as a work-around. I purposefully left the title lower case to separate the issues.
Comment #28
Leeteq commentedSubscribing.
Comment #29
gregglesOk, could folks please test this out? I have a feeling that most of the problems are caused by 1) subtle differences in php version/configuration/compilation 2) the fact that pathauto will attempt to use preg_replace sometimes and mb_ereg_replace in others which may be further affected by those subtle differences.
@george.goldstone and @xeophin - note that your experiences are similar, but completely the opposite. So...more fun!
Comment #30
WeRockYourWeb.com commentedJust tried the patch on 5.x-2.2 and it gave the message "Hunk #1 failed at 161 / 1 out of 1 hunk failed"
Duplicating the first term in the list of strings to remove solved the issue - ie. using "a,a,..." helped remove lower case a's. Just for kicks, I tried "a,A,a,..." and now it removes both lower and uppercase a's. It appears that 1) it ignores the first item in the list of strings to remove, and 2) it's case sensitive?
Comment #31
jenlampton@asterix - the version of this issue is 5.x-2.1, so that's where the patch should be applied, not 5.x-2.2
I am suffering from the same problem (in versions 2.1 and 2.2), words with capitals not being removed, as well as lower-case a (the first word in the list) not being replaced.
I've downgraded my site to 5.x-2.0 and wanted to confirm that all the above problems seem not to be problems in this version.
@greggles, I'm not sure how much help that is, but maybe it's something?
Jen
Comment #32
greggles@jenlampton - yes, that gives a great clue. It seems that this bug was introduced in #169250: transliteration failed with non-latin characters which should have used mb_eregi_replace instead.
There seems to be a related but separate issue about the first and last words in the list either not working or being the only ones that do work.
In general, patches should be against the tip of the branch, so for 5.x that is the tip of DRUPAL-5--2 but in this case it fell out of date with all the recent wonderful cleanup work of Freso. Updated patch attached.
Comment #33
gregglesI fixed the first part of this - that it was broken for capital words.
Now, to figure out what is wrong with the regex for the first/last words in the group.
Comment #34
gregglesSo, it turns out that mb_eregi_replace seems to take a different pattern style than preg_replace. I didn't see anything on the docs page to suggest this (http://ar2.php.net/preg_replace vs. http://ar2.php.net/preg_replace) but...at least in my tests this seems to work.
Since regex tends to be platform specific I'd love it if a couple of people could try this out. The change is very simple even if you are not comfortable applying patches: just change
Old:
New:
Comment #35
gregglesSomehow I forgot to attach the patch...
@george.goldstone, @xeophin, @jenlampton, can any of you test this new fix for the first/last words in the box?
Comment #36
Freso commentedTested and works on 6.x. A few comments though. :)
preg_replace("/$ignore_re/i", '', $output);still proper, or should this be fixed as well? (For cases where PHP isn't built with multibyte character support.)Comment #37
Freso commentedOh, and for platform info: I tested on Gentoo Linux (kernel: 2.6.24-gentoo-r8), using "PHP 5.2.6RC4-pl0-gentoo with Suhosin-Patch 0.9.6.2".
Comment #38
gregglesYour patch seems reasonable to me. AFAIK the preg_replace side works relatively well (it was the more tested way of doing it - the mb_ereg_replace was added more recently, I think between 5.x-2.1 and 5.x-2.2).
Comment #39
Freso commentedAlright, since we need to test this on multiple platforms (this being operating systems and PHP versions), I decided that I might as well go ahead and write how to test it.
Comment #40
moonray commentedAs I'd mentioned previously, this patch fixes the main issue for me (though, I wasn't having the first/last term not being replaced issue).
drupal 5.7, php 5.2.1, mac os x
Comment #41
Freso commentedOoh. I didn't notice it didn't catch the last word either. Well, I guess I'll have to test again (and update my test description). Oh well. :)
@moonray: The issue with capital letters' fix is already checked in, the patch is now only dealing with the first/last term replacement. If you haven't tested this, please do.
Comment #42
Freso commentedAlrighty! Committed to HEAD/6.x-1.x. Now we just need a back-port for 5.x-2.x. :)
Comment #43
Freso commentedAs the patch from #36 applies cleanly to 5.x-2.x as well, I've just committed it.
One more issue closer to 6.x-1.1 release! Now's an excellent time to go review #208860: Add per-language patterns for multilingual node types! :D
Comment #44
Anonymous (not verified) commentedAutomatically closed -- issue fixed for two weeks with no activity.