Posted by spherman on June 11, 2010 at 2:39pm
| Project: | Search and Replace Scanner |
| Version: | 6.x-1.0 |
| Component: | Code |
| Category: | bug report |
| Priority: | critical |
| Assigned: | Unassigned |
| Status: | postponed (maintainer needs more info) |
Issue Summary
Thanks for contributing this module, but it caused a headache for me.
When it ran, it replaced the text OK, but for each node it touched, it removed the URL alias that I had assigned.
This makes it unusable.
Comments
#1
Thanks for the bug report. I'll test this out. Can anyone else confirm that doing a search-and-replace removes aliases?
#2
@spherman: Yes, Scanner touches every node it edits, so pathauto will generate new aliases - according to the current pathauto definitions and the current node title (btw, nodes touched by Scanner are also queued for re-indexing by Drupal core's search). That "touching" of nodes often causes unexpected results, e.g. when pathauto uses tokens or the pathauto definitions have changed in the meantime. Similar effects occur when the URL alias was edited manually but somehow didn't keep its status (manual/automatic); there are some issues here, but I'm not sure if they're related to Scanner or Drupal aliasing. However, I haven't noticed aliases being removed (meaning: deleted from the database without being replaced).
How do you generate your aliases? Are you using pathauto or something similar? Can you provide an example procedure to reproduce this phenomenon?
I think this needs more information, changing status.
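To make the mechanism described above easier to discuss, here is a minimal toy model in Python. It is not Scanner or pathauto code: `Node`, `pattern_alias`, `save_node`, the `manual_alias` flag and the `content/[title]` style pattern are all hypothetical names chosen for illustration. It only shows how re-saving a node during a bulk search-and-replace could overwrite a hand-written alias when the manual/automatic status is not respected.

```python
import re
from dataclasses import dataclass
from typing import Optional


def pattern_alias(title: str) -> str:
    """Build an alias from the title (assumed 'content/[title]' style pattern)."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"content/{slug}"


@dataclass
class Node:
    title: str
    body: str
    alias: Optional[str] = None
    manual_alias: bool = False  # hypothetical flag: alias was typed in by hand


def save_node(node: Node, respect_manual: bool) -> None:
    """Toy node save: regenerate the alias unless a manual alias is respected."""
    if respect_manual and node.manual_alias:
        return  # keep the hand-written alias untouched
    node.alias = pattern_alias(node.title)
    node.manual_alias = False  # the node is back under automatic aliasing


def search_and_replace(node: Node, search: str, replace: str,
                       respect_manual: bool) -> None:
    """Replace text in the body and save the node if anything changed."""
    if search in node.body:
        node.body = node.body.replace(search, replace)
        save_node(node, respect_manual)
```

In this model, a node with the manual alias `about` and the title "About Our Company" ends up at `content/about-our-company` after a replace with `respect_manual=False`, which matches the symptom reported here: the alias is not deleted, it is replaced by an automatic one.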
#3
Confirmed (and in my opinion this is a critical bug that quite simply means that for me, and anyone who has any non-automatic URL aliases on their site, the module is unusable). I tested a search and replace on a string contained in 1 page, to which I had assigned a manual url alias. It caused aliasing to be automatic for that page again and assigned a new url alias using pathauto. This "feature" has the potential to break an entire site's navigation.
#4
@alanpeart: "This "feature" has the potential to break an entire site's navigation." - Indeed, I just encountered this on a site with 40,000 nodes, where almost every URL alias changed, causing lots of internal links to break and killing almost every URL indexed by Google. Obviously this is more than annoying.
However, I'm still absolutely not convinced that the 'scanner' module does something wrong here, and I doubt that this module even has code to remove aliases, like this issue's title suggests. I believe that this is one of the pitfalls of using pathauto. It simply becomes obvious because of the sheer amount of changed aliases when running large bulk operations.
My observations about this:
I believe that this issue lies a lot deeper. E.g., most of us try to catch changed aliases with the 'path redirect' module. This module is supposed to handle exactly cases like this - lots of aliases are being changed, but old aliases are to remain working (with something like an HTTP response code 301 that redirects to the new alias). However, bulk operations - be it with 'scanner' or VBO - are not caught by path redirect. So obviously this issue is much too complex to quickly point fingers.
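For illustration only, the kind of bookkeeping a bulk operation would need so that old aliases keep working can be sketched in Python as follows. This is not how 'path redirect' is implemented; `record_alias_change` and `resolve` are hypothetical helpers, and the redirect table is just an in-memory dict mapping an old alias to its current target.

```python
from typing import Dict, Optional, Tuple


def record_alias_change(redirects: Dict[str, str],
                        old: Optional[str], new: str) -> None:
    """Remember that `old` should now redirect to `new`."""
    if old is None or old == new:
        return  # nothing changed, nothing to record
    # Repoint earlier redirects that targeted the old alias, to avoid chains.
    for src, dst in list(redirects.items()):
        if dst == old:
            redirects[src] = new
    redirects[old] = new
    # If an alias reverts to an earlier value, drop the stale redirect
    # so the live alias does not redirect away from itself.
    redirects.pop(new, None)


def resolve(redirects: Dict[str, str], path: str) -> Tuple[int, str]:
    """Return (status, path): 301 with the new alias, or 200 for a live path."""
    if path in redirects:
        return (301, redirects[path])
    return (200, path)
```

A bulk operation would call `record_alias_change` once per touched node, with the alias before and after the save. Whether such a hook is actually invoked during 'scanner' or VBO runs is exactly the open question in this comment.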
To make this situation even worse than it already is, development on 'pathauto' has been stale for years. The architecturally broken legacy branch 1.x is still the only release tagged as "supported" and "recommended". Most of us are still running 6.x-1.5 from 2010-Oct-07 (I'm running this on the 40k node site as well). The current branch 2.x fixes lots of problems of the legacy version, but introduces numerous new issues and has thus been living in development limbo for years without a stable release (I had to downgrade several sites back to the 1.x branch because of severe issues with 2.x).
Judging from my experience with 'pathauto' and 'scanner' over several years on more than a dozen D6 sites, I'd bet that 'scanner' simply triggers a misbehaviour of pathauto. Please feel free to prove me wrong by documenting a procedure where 'scanner' removes URL aliases without having 'pathauto' enabled ;)
PS: And btw, elevating this issue to "critical" will probably not result in anyone listening, as the 'scanner' module has been de facto abandoned for several months.