Add Wildcards to Redirect Paths

jjeff - October 11, 2007 - 02:37
Project:Path Redirect
Version:6.x-1.x-dev
Component:Code
Category:feature request
Priority:normal
Assigned:Unassigned
Status:needs work
Description

This patch adds support to enter a wildcard into the from path.

For instance, if you have pathauto set up to create image nodes at photos/[author-username]/[node-title], then you might want to create a redirect from:
photos/<*>
which would redirect to a View displaying a tab at
user/<*>/photos

I've chosen to use <*> as the wildcard because * by itself is a valid URL character, however < and > are not. Therefore, we're guaranteed that this will never conflict with a valid path.

Some points for discussion:

Right now we're limited to one wildcard and it has to be a distinct path segment (delimited by /). For instance, you can't redirect from user<*> or our_old_legacy/url/<*>.html. The reason for this is below.

For performance reasons, I've opted NOT to use the SQL "LIKE" selector to match the path. My feeling is that this would be too heavy a query to impose on every page load (including those with page caching enabled). However, this point is open for debate. Additionally, we'd need to do the LIKE query in reverse, because the wildcards are in the database and we're matching against the known path value. However, if we could get it to work (and we felt it wasn't a performance problem), we could gain much more complex selectors.

Right now, the query is built by building an array of the possible path matches (substituting the wildcard in each segment position) and then using the IN selector to do an exact match on the (indexed) path column. So for instance, visiting node/3/edit builds this query:

SELECT path, redirect, query, fragment, type FROM path_redirect WHERE path IN ('node/3/edit','<*>/3/edit','node/<*>/edit','node/3/<*>') LIMIT 1
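The list of candidates in that IN clause can be sketched as follows. This is an illustration in Python rather than the patch's own PHP, and the name `candidate_paths` is hypothetical; it only mirrors the rule described above (a single wildcard that occupies one whole path segment).

```python
WILDCARD = "<*>"

def candidate_paths(path):
    """Return the exact path plus one variant per segment with that segment wildcarded."""
    segments = path.split("/")
    candidates = [path]
    for i in range(len(segments)):
        # Replace only segment i with the wildcard; all other segments stay literal.
        variant = segments[:i] + [WILDCARD] + segments[i + 1:]
        candidates.append("/".join(variant))
    return candidates

print(candidate_paths("node/3/edit"))
# ['node/3/edit', '<*>/3/edit', 'node/<*>/edit', 'node/3/<*>']
```

The number of candidates grows with the number of segments (n + 1 for n segments), so the query stays an exact match on the indexed column.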

We're okay with "IN" on PostgreSQL, right? And LIMIT?

Another nice bit is that you can use <*> not only in the normal to path, but you can also use it in the query or the fragment. So you could redirect user/<*> to userlist#<*> or userlist?user=<*>. Makes for some interesting possibilities.
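A minimal sketch of that substitution, in Python for illustration only: `match_wildcard` and `build_target` are hypothetical names, not functions from the patch, and the sketch assumes the from pattern contains exactly one wildcard segment.

```python
WILDCARD = "<*>"

def match_wildcard(pattern, path):
    """Return the segment captured by the wildcard, or None if the pattern does not match."""
    pattern_segments = pattern.split("/")
    path_segments = path.split("/")
    if len(pattern_segments) != len(path_segments):
        return None
    captured = None
    for expected, actual in zip(pattern_segments, path_segments):
        if expected == WILDCARD:
            captured = actual
        elif expected != actual:
            return None
    return captured

def build_target(redirect, query, fragment, value):
    """Substitute the captured value into the to path, the query and the fragment."""
    url = redirect.replace(WILDCARD, value)
    if query:
        url += "?" + query.replace(WILDCARD, value)
    if fragment:
        url += "#" + fragment.replace(WILDCARD, value)
    return url

value = match_wildcard("user/<*>", "user/jjeff")
print(build_target("userlist", "user=<*>", "", value))  # userlist?user=jjeff
print(build_target("userlist", "", "<*>", value))       # userlist#jjeff
```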

AttachmentSize
path_redirect-wildcard.patch 6.01 KB

#1

HorsePunchKid - October 11, 2007 - 16:23

Subscribing; very cool. I double-checked, and this worked fine in Postgres 8.1.9:

SELECT nid, title FROM {node} WHERE nid IN (3,1,4) LIMIT 2

#2

hawkdrupal - October 13, 2007 - 18:33

This improvement is MUCH-NEEDED.

But...

I applied the patch. But I can't save a redirection with <*> in the destination field. This error is displayed: "The redirect to path does not appear valid."

Checking the code, it seems to be lacking a way to allow this non-standard URL content (much as it allows ?).

#3

jjeff - October 16, 2007 - 12:09

Oop. Yes. I have a fix for this. Will upload soon.

-j

#4

deviantintegral - October 18, 2007 - 21:54

With a site I'm working on, I encountered a situation where we essentially need mod_rewrite with user permissions. My solution for this problem was to use preg_replace and to allow the user to put in a regular expression. This works really well because it allowed us to do more complex patterns such as:

Source URL -> Destination URL
categories/services/(.+) -> $1 // Allows us to redirect taxonomy pages to the service page, so we don't have to change any taxonomy URL's
(.+/courses/.+)/20[0-1][0-9]-[0-9][0-9]-[0-9][0-9] -> $1 // Redirects users from a specific course date to the course info page.

This presumably gets around the issues you mentioned above (every page load is just a SELECT src, dst...) and the heavy work is done in PHP's preg_replace.
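The idea can be sketched like this. The sketch uses Python's `re` module instead of PHP's preg_replace, so the `$1` back-reference becomes `\1`; the `RULES` list and the `resolve` function are hypothetical, and anchoring the pattern to the whole path with `fullmatch` is an assumption, not something the attached module is known to do.

```python
import re

# (source pattern, destination template) pairs, as they might be loaded with a plain SELECT.
RULES = [
    (r"categories/services/(.+)", r"\1"),
    (r"(.+/courses/.+)/20[0-1][0-9]-[0-9][0-9]-[0-9][0-9]", r"\1"),
]

def resolve(path):
    """Return the destination for the first rule that matches the whole path, else None."""
    for pattern, template in RULES:
        match = re.fullmatch(pattern, path)
        if match:
            # expand() fills the back-references in the template from the match.
            return match.expand(template)
    return None

print(resolve("categories/services/hosting"))          # hosting
print(resolve("training/courses/php-101/2008-03-14"))  # training/courses/php-101
```

Every rule is tried in PHP (here Python) on each request, so the cost moves from the database to the application and grows with the number of rules.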

The biggest problem with this approach is that ATM it's possible for an admin to break things by putting in a regex which won't compile. I suppose that could be fixed if PHP4 supports try{} catch{}, but I need to find that out.
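One way to guard against that is to test-compile the expression when the admin saves the redirect and reject it with a form error. The sketch below shows the idea in Python, where a bad pattern raises `re.error`; `validate_pattern` is a hypothetical helper, and how the equivalent check would be written in the module's PHP is left open here.

```python
import re

def validate_pattern(pattern):
    """Return None if the pattern compiles, otherwise a message suitable for a form error."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return "Invalid regular expression: %s" % exc
    return None

print(validate_pattern("categories/services/(.+)"))  # None
print(validate_pattern("categories/services/(.+") is not None)  # True (unbalanced parenthesis)
```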

Anyways, I've attached my module for reference in case you find anything that's useful. "As is" it works without conflicting with path_redirect, but I'd much rather integrate this functionality here :)

Thanks,
--Andrew

AttachmentSize
url_redirect.info 100 bytes

#5

deviantintegral - October 18, 2007 - 21:57

Bleh. I can't upload a tgz or upload multiple files apparently. Here's the rest of the module.

Or not. I apparently can't upload a .module either. Here's a link to the archive: http://www.cs-club.org/~andrew/files/url_redirect-10182007.tar.gz

#6

cubbtech - November 6, 2007 - 22:31

I think the attached patch fixes this, but it's my first attempt, so go easy on me :)

Patch is against the 11.05.07 5.x-1.x-dev version.

AttachmentSize
path_redirect_wildcard_02.patch 6.17 KB

#7

HorsePunchKid - November 29, 2007 - 16:26
Version:5.x-1.1-beta1» 5.x-1.x-dev
Status:needs review» needs work

I'll reroll this against the latest 5.x dev version, hopefully incorporating hawkdrupal's related request.

#8

casey - December 7, 2007 - 10:43

I suggest storing wildcards in the database as spaces, since their character code (32) is the lowest of the printable characters. Exclamation marks (character code 33) are, according to [RFC1738](http://www.faqs.org/rfcs/rfc1738.html), allowed in URLs. Spaces aren't allowed and will be encoded as %20 or +. This means we can use spaces for wildcards without problems. When using "ORDER BY path DESC" in queries, wildcards will come last.

And when adding a weight to redirects you also can override this.
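The ordering argument can be checked with a small sketch. It is written in Python, which compares strings by code point; whether a given database collation orders a space the same way is an assumption that would need testing. `order_candidates` is a hypothetical name.

```python
def order_candidates(paths):
    """Sort descending so that literal segments come before space wildcards."""
    return sorted(paths, reverse=True)

# A single space stands in for the wildcard segment.
candidates = ["node/ /edit", " /3/edit", "node/3/edit", "node/3/ "]
print(order_candidates(candidates))
# ['node/3/edit', 'node/3/ ', 'node/ /edit', ' /3/edit']
```

With this ordering the exact path wins, and among wildcard rows the one whose wildcard sits furthest to the right comes first.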

#9

LUTi - December 17, 2007 - 08:36

I've installed the latest 5.x-1.x-dev version (December 16th), and there are no visible differences in the admin interface. Wildcards are also not accepted.

I've checked the code - it seems path_redirect-5.x-1.x_i18n_15.patch (which, as I understand, should also resolve the Add Wildcards to Redirect Paths issue I am mainly concerned about) was not applied. I've applied it to this latest version, but there are still no noticeable changes (paths are displayed as before, although some differences are announced - to see URLs as saved in the database; if I enter an asterisk - either as * or as <*> - it is not accepted, and only the error message is displayed...).

I am also quite confused now about which patch shall resolve which issue (there are 3 or 4 "active" now). Maybe it would be good to more clearly define which issues will be resolved by which approach - or each issue separately with a patch provided there, or all together (if like mod_rewrite will be used to resolve also the Add Wildcards to Redirect Paths and/or Obscuring true paths of redirects is confusing and buggy issue(s))? Just for us to know which issues to follow looking for a solution needed (I find now mainly references to patches attached to some other issues, but from there more or less at the end only references to some other issues etc.).

So, about my problem - is there any code (patch or a new development version) ready that enables the use of wildcards? If yes, what exactly should be used (or done to enable it)? If it should already work, how exactly do I enter the wildcard character? (I would suggest putting this info into the "Enter a Drupal path or path alias to redirect" text displayed when adding or editing a redirect.)

#10

HorsePunchKid - December 21, 2007 - 03:06
Status:needs work» needs review

Thanks for keeping this issue active, LUTi! Here is a patch against the very latest dev version, which as of this posting probably isn't quite available for download yet. (They're generated every 12 hours, I believe, so you should be able to get it soon!)

I haven't tested the patch extensively, but it appears to basically work. The only significant change I made from the previous patch was to have the "test" link in the admin interface still work by having it replace the wildcard with the string test.

AttachmentSize
path_redirect-5.x-1.x_wild_10.patch 5.72 KB

#11

HorsePunchKid - December 21, 2007 - 03:14

For what it's worth, my thinking is that 5.x-1.1 should come out very soon, but without the wildcard feature. I'm eager to get 1.1 out because Postgres is fixed in dev but still broken in beta1.

I have some changes I need to port to 6.x dev, too; I'm sure we'll have a stable release ready by the time D6 is out!

#12

LUTi - December 21, 2007 - 09:04

HorsePunchKid, thank you for the patch.

I've downloaded the latest version of module (1.3.2.16 2007/12/21) from CVS first, and applied the patch to it. When I try to use the wildcard <*> in "From:" only, but not in "To:" as well (to redirect all partially saved paths to a single page, informing the visitor that such URL doesn't exist...), such a redirect can not be saved - I get the message "The redirect to path does not appear valid." only.

So, I would suggest:
1. To allow many URLs to be redirected to a single URL only (= use of a wildcard in "From:" field only)
2. To make wildcards work as in filesystem paths (to respect other characters in a path segment and to have a single character wildcard as well)

I need a second feature for partially cut URLs (paths as http://.../image123.jpg, http://.../image234.jpg, http://.../image345.jpg etc. wrongly saved by some robot(s) as http://.../image123, http://.../image234, http://.../image345 etc.), which I would like to redirect to a single page with a message (mainly to prevent filling my logs with errors, but possibly also to inform the administrator over there about his errors). Of course I have to keep links to images - so in my example, I would need to redirect http://.../image???, but not http://.../*.

Or, to introduce another field with exceptions - redirect http://.../*, but not http://.../*.jpg and http://.../*.png and http://.../*.gif (or, at least http://.../*.*). Maybe to simply use regular expressions would be the best, but probably there would be quite a lot of users not familiar enough with them (I am the first one like that), so there should be a good help with typical examples practically a must.
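Both variants (filesystem-style wildcards, and a broad pattern with a list of exceptions) can be sketched with Python's standard `fnmatch` module. This is only an illustration of the requested behaviour: `should_redirect` is a hypothetical function, nothing like it exists in the module, and note that in `fnmatch` both `*` and `?` also match the `/` character.

```python
from fnmatch import fnmatchcase

def should_redirect(path, pattern, exceptions=()):
    """True if path matches pattern and matches none of the exception patterns."""
    if not fnmatchcase(path, pattern):
        return False
    return not any(fnmatchcase(path, exc) for exc in exceptions)

# "?" matches exactly one character, so the truncated URLs match but the real images do not.
print(should_redirect("image123", "image???"))      # True
print(should_redirect("image123.jpg", "image???"))  # False

# Alternative: redirect everything except known image extensions.
image_types = ("*.jpg", "*.png", "*.gif")
print(should_redirect("image123", "*", image_types))      # True
print(should_redirect("image123.jpg", "*", image_types))  # False
```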

I am aware I ask quite a lot (with the 2nd suggestion), but in my opinion wildcards would expand the possibilities (usability) of this module really significantly.

In any case, could you implement at least the 1st suggestion soon, please.

#13

yngens - January 3, 2008 - 04:01

i tried to apply patch #10 against today's dev version and it gives:

patching file path_redirect.module
Hunk #3 FAILED at 212.
Hunk #4 succeeded at 214 (offset -11 lines).
Hunk #5 FAILED at 298.
2 out of 5 hunks FAILED -- saving rejects to file path_redirect.module.rej
[root@host path_redirect]#

#14

HorsePunchKid - January 3, 2008 - 04:43
Status:needs review» needs work

Thank you for the reminder; I'll try to get this patch updated to apply to the 1.1 release within the next couple of days. If anybody else would like to help with this feature, have at it!

#15

HorsePunchKid - January 21, 2008 - 02:15
Status:needs work» needs review

Here is some grease for the squeaky wheel. :)

This should apply cleanly to the latest dev version and stands a good chance of applying to the 1.1 release, too. Get it while it's hot!

AttachmentSize
path_redirect-5.x-1.x_wild_15.patch 6.05 KB

#16

yngens - January 21, 2008 - 06:52

patch worked this time. but, unfortunately, did not solve my problem described also on http://drupal.org/node/195853#comment-677007

current patch makes it possible to redirect paths with only one single wildcard in the url. for example this works:

Drupal URL: 'articles/<*>'
Redirect path: 'http://oldversion.mysite.com/articles/<*>'

only in case if wildcard <*> replaces a single word.

But it does not work, if you try to replace path with several sections in between symbols '/'. For example, this kind of addresses from my pre-drupal site are not possible to redirect: http://oldversion.mysite.com/articles/2006/06/16/greenline

I also tried:

Drupal URL: 'articles/<*>/<*>/<*>/<*>/<*>'
Redirect path: 'http://oldversion.mysite.com/<*>/<*>/<*>/<*>/<*>'

Unfortunately, it did not work.

I am looking for a way to redirect thousands of old URLs on my site like: http://www.mysite.com/articles/[year]/[month]/[day]/[short_title] to http://old.mysite.com/articles/[year]/[month]/[day]/[short_title]
Can you please advise what the best way is?

#17

HorsePunchKid - January 21, 2008 - 07:04

It sounds like you need mod_rewrite. I don't think this module will ever duplicate all of the functionality of mod_rewrite; I'm not sure what the point would be.

#18

NikLP - February 7, 2008 - 13:21

+1 subscribing, and waiting until this thread pans out into something simpler for my tiny mind to read :)

#19

Dimm - February 13, 2008 - 09:56

+1

#20

Stefan Vaduva - April 10, 2008 - 11:21

Any news about this patch? When will it be applied? Has it been applied already?

Thanks

#21

alexbiota - May 28, 2008 - 16:28

+1

#22

Xano - June 7, 2008 - 16:06

+1. I'd very very very much like to see this feature make it into the D6 version as well.

#23

cubbtech - June 16, 2008 - 18:13
Version:5.x-1.x-dev» 5.x-1.2
Category:feature request» bug report

Not sure if I should file this as a separate bug report, but this feature is not working for me in the 5.x-1.2 version (it has worked for me with various patches in the past). I can confirm that the wildcard code is present in path_redirect.module, but when I enter:

foo/<*>

in the from field and

bar/<*>

in the to field, I get the error message:

The redirect to path does not appear valid.

#24

Xano - June 16, 2008 - 19:53

I suggest using '%' for wildcards rather than '<*>' for more consistency with hook_menu().

#25

chromix - September 24, 2008 - 13:39
Category:bug report» feature request

I'd like to resurrect this issue. I could really use this feature on a site I'm working on, but I'd rather not downgrade to the old version to use this patch. Can someone give a status update? Is this feature going to be included in a future version?

#26

geodaniel - October 30, 2008 - 02:43

Marked #291504: Redirect wildcard and #312708: "Templated" redirection as a dup of this issue. Existing code from #306475: Regular Expression Path Matching may also be useful.

#27

drein - January 12, 2009 - 08:46

please.....
Could someone post the latest working module for drupal5 with wildcards support? it is good also if it works partially.
I attempted to patch the 5.1.2 but code is changed, obviously.
Thanks in advance.

#28

gmenzel - January 12, 2009 - 14:59

subscribing

#29

fumbling - January 29, 2009 - 05:10

Is there a patch for D6?

#30

Dave Reid - January 29, 2009 - 06:31
Version:5.x-1.2» 6.x-1.x-dev
Assigned to:jjeff» Anonymous
Status:needs review» needs work

Not yet. If this gets in, it will be accepted for 6.x-1.x only. I don't use Drupal 5 myself anymore so I don't have motivation to fix anything in 5.x besides bug fixes. If someone provides a patch that backports the change from 6.x-1.x, I'll review it.

#31

fp - February 27, 2009 - 22:16

Hey -

I needed a quick fix for the D6 version and - doh - I didn't look at the issue queue before patching it myself against 6.x-1.0-beta1.

So, here's what I have done. I hope it doesn't confuse the readers of this thread.

I promise to have a look at merging/working in the code/ideas expressed above shortly...

Cheers
fp

AttachmentSize
path_redirect-182512-6.x-1.x.patch 5.99 KB

#32

k74 - March 27, 2009 - 13:31

There is a patch for the beta 3??

#33

fp - March 27, 2009 - 22:26

beta 1

#34

kjv1611 - April 21, 2009 - 12:27

This thought would be great for me as well. Though it's not a big deal for me yet, so not worth patching my copy, but I'm excited to see that this may be in future releases! Yay!

#35

design.er - April 24, 2009 - 15:57

Hey, this is great.

Is it possible to implement token support?
Something like redirect from user/profile/[realname] to user/[realname] would be great.
Case-study: I use core.profile for birthday.module functionality and content profile because of cck fields, taxonomy etc. I merged both in one content profile and would like to redirect automatically from core.profile to content profile.
That would be great!

Regards,
Stefan

#36

ferrangil - May 27, 2009 - 15:27

Subscribing.

#37

nicholas.alipaz - May 29, 2009 - 18:06

Just subscribing and adding my two cents.

I think this would be great to have as a feature. I would prefer regular expressions over wildcards personally. That mirrors a bit better the typical way of doing it with the .htaccess. Additionally allowing users to specify weights in some situations may be useful. That way someone could override one particular rewrite being done through a regular expression with one that is done through a written out path.

#38

comicsonline - June 8, 2009 - 06:17

I'm VERY interested in this functionality as well. I currently run Drupal 6.12 but I'm coming over from an old PHPNuke site. According to my site traffic stats, I'm losing a ton of traffic when people click on links to URLs that were valid on the PHPNuke site. I moved to Drupal when my old ISP suddenly made changes that broke my old site.

Anyway, I've put up the corpse of my old site as well, but instead of:
http://www.whatever

it's now located at:
http://old.whatever

To give a more specific example, I want to redirect incoming requests for:
http://www.comicsonline.com/modules.php?[wildcard]

to:
http://old.comicsonline.com/modules.php?[wildcard]

so that:
http://www.comicsonline.com/modules.php?name=News&file=article&sid=1205
and
http://www.comicsonline.com/modules.php?name=Forums&file=viewtopic&t=60

redirect to:
http://old.comicsonline.com/modules.php?name=News&file=article&sid=1205
and
http://old.comicsonline.com/modules.php?name=Forums&file=viewtopic&t=60

...respectively. Will fp's module at http://drupal.org/files/issues/path_redirect-182512-6.x-1.x.patch do this or will I need to do more to it or wait for that functionality to be implemented?
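For clarity, the mapping being asked for (same path and query string, different host) looks like this when written out. This is a Python sketch of the desired behaviour only; `to_old_site` is a hypothetical name, and it says nothing about whether the linked patch supports it.

```python
from urllib.parse import urlsplit, urlunsplit

def to_old_site(url, old_host="old.comicsonline.com"):
    """Swap the host and keep scheme, path, query string and fragment unchanged."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, old_host, parts.path, parts.query, parts.fragment))

print(to_old_site("http://www.comicsonline.com/modules.php?name=News&file=article&sid=1205"))
# http://old.comicsonline.com/modules.php?name=News&file=article&sid=1205
```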

#39

NikLP - June 8, 2009 - 10:59

I'm half-guessing, but from first glance it looks like you should be able to do this with mod_rewrite rules in htaccess - indeed it may be more appropriate to do it that way. Find a rewrite expert and get a second opinion.

#40

zorroposada - July 23, 2009 - 00:27

I created a patched module based on latest dev version.

Files patched:
path_redirect.install
path_redirect.module
path_redirect.admin.inc

See attached:

AttachmentSize
path_redirect_wildcard.zip 42.44 KB

#41

vt.dave - July 29, 2009 - 20:38

I 2nd the Perl style regular expressions!! That would provide me total control over the URI's coming into drupal as well as prevent me from mucking with the .htaccess file.

Thanks!

#42

ronn abueg - October 1, 2009 - 23:39

Here is a patch that accepts wildcards with minimal changes to the module by utilizing REGEXP (mysql).

If the admin adds the redirect, from=drupal* and to=http://drupal.org*, a user will be directed like so...
q=drupal/node/182512 to http://drupal.org/node/182512
q=drupalnode to http://drupal.orgnode
q=drupal to http://drupal.org

If the admin adds the redirect, from=drupal* and to=http://drupal.org, a user will be directed like so...
q=drupal/node/182512 to http://drupal.org
q=drupalnode to http://drupal.org
q=drupal to http://drupal.org

If the admin adds the redirect, from=drupal and to=http://drupal.org, then it is the default behavior.
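The three cases above can be restated as a small sketch. It is written in Python and models only the behaviour described in this comment (a trailing * as a prefix wildcard), not the SQL REGEXP implementation in the patch; the function name `redirect` is hypothetical.

```python
def redirect(path, source, target):
    """Apply one redirect rule; return the destination, or None if the rule does not match."""
    if source.endswith("*"):
        prefix = source[:-1]
        if not path.startswith(prefix):
            return None
        rest = path[len(prefix):]
        # A trailing * in the target carries the unmatched remainder over.
        if target.endswith("*"):
            return target[:-1] + rest
        return target
    # No wildcard: default behaviour, exact match only.
    return target if path == source else None

print(redirect("drupal/node/182512", "drupal*", "http://drupal.org*"))  # http://drupal.org/node/182512
print(redirect("drupalnode", "drupal*", "http://drupal.org*"))          # http://drupal.orgnode
print(redirect("drupal/node/182512", "drupal*", "http://drupal.org"))   # http://drupal.org
```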

Tested in Drupal 6.14 and is in a live site with no problem.

We initially have the redirections in the .htaccess as rewrite rules (see below), but it was getting a pain to manage, etc.

RewriteRule ^drupal(.*)$ http://drupal.org$1  [R=302,L]

AttachmentSize
path_redirect_regex.patch 824 bytes

#43

LUTi - October 2, 2009 - 10:34

ronn,

thanks for sharing the code.

Unfortunately, it doesn't seem to work for my case, where I would need to redirect from:
http://<mywebsite>/catalogue/current*
to
http://<mywebsite>/catalogue/2009/english*
where * shall cover also everything after a ? as for example URL:
http://<mywebsite>/catalogue/current?item=100&colour=1
(should be redirected to: http://<mywebsite>/catalogue/2009/english?item=100&colour=1)

I've tried to put an asterisk into the To field or into the ? field (destination), without success.

#44

ronn abueg - October 2, 2009 - 17:28

I did not take into account the query strings, only paths - it did not even execute the changes because of the check for query strings.

Anyway, this patch should work to do the above, but you may have to revert path_redirect.module back before applying this patch.

Then, you just set the following url redirect:

from=catalogue/current*
to=http://site.com/catalogue/2009/english*
?=
#=

Let me know if you still have problems.

AttachmentSize
path_redirect_regex.1.patch 1.46 KB

#45

ronn abueg - October 2, 2009 - 20:42

The attached patch has an important tweak in the regexp query. So please use this one instead of the previous two.

AttachmentSize
path_redirect_regex.2.patch 1.48 KB

#46

LUTi - October 5, 2009 - 08:35

Sorry to report, that (after applying #45), it still doesn't quite work.

The first issue I am noticing is that instead of (URLs are entered as that):
from=catalogue/current*
to=catalogue/2009/english*

the following is saved in Drupal (URLs are listed like that later):
from=catalogue/current*
to=catalogue/2009/english%2A

(but, don't know if it has anything to do with the issue below).

And, while URL: http://site.com/catalogue/current seems to work (redirected to http://site.com/catalogue/2009/english),
as soon as there should be replacements, I get "Page not found" ERROR instead; probably because URL:
http://site.com/catalogue/current?item=100

is converted into:
http://site.com/catalogue/2009/english%3Fitem%3D100

instead of:
http://site.com/catalogue/2009/english?item=100
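One way output like this can arise is when the query string is URL-encoded as if it were part of the path. The sketch below reproduces the symptom with Python's `urllib.parse.quote`; it is an illustration only, and Python's encoder is not the function Drupal uses.

```python
from urllib.parse import quote

path, query = "catalogue/2009/english", "item=100"

# Encoding the query as part of the path escapes the "?" and "=" delimiters.
wrong = quote(path + "?" + query, safe="/")
# Encoding only the path and appending the query afterwards keeps them intact.
right = quote(path, safe="/") + "?" + query

print(wrong)       # catalogue/2009/english%3Fitem%3D100
print(right)       # catalogue/2009/english?item=100
print(quote("*"))  # %2A, the same escape seen in the saved "to" value
```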

#47

ronn abueg - October 5, 2009 - 18:31

The query (item=100&...) was being passed to Drupal's drupal_goto() function incorrectly. Sorry about that.

This patch should fix that.

AttachmentSize
path_redirect_regex.3.patch 1.85 KB

#48

LUTi - October 6, 2009 - 06:50

Thanks, ronn. Works like a charm now.

#49

LUTi - October 6, 2009 - 07:08

ronn,

maybe a couple of "cosmetic" issues should be fixed before consider your patch as just great (at least from my point of view, I've really needed that functionality):

1. In the list of redirects, source "From" URL is listed OK (catalogue/current*) while "To" URL still contains %2A instead of asterisk (catalogue/2009/english%2A); not only that it doesn't look nice and is a bit of confusing, it should probably be fixed also as a basis for issue Nr. 2 (below)

2. Clicking to a "From" (as well as "To"...) URL in the list of redirects doesn't work well (as URL contains asterisk, and such page of course doesn't exist...) - probably it would be a good idea to simply cut away the asterisk from the URL if there is nothing after it (right to it)?

Everything is functional also as it is now, but that would make this solution really complete.

At the same moment, I suggest to include this patch (functionality) in the new development version (and, the next final version, of course).

#50

ronn abueg - October 6, 2009 - 23:54

Thanks for the comment. I was thinking of updating the admin panel a bit too but didn't have time.

Anyway, here is the patch to tweak the admin panel the way you suggested; the patch also includes path_redirect_regex.3.patch changes in it as well.

AttachmentSize
path_redirect_regex.4.patch 3.07 KB

#51

LUTi - October 7, 2009 - 06:45

I'm glad to confirm that it seems to work perfectly now. Thanks again, ronn.

#52

nick johnson - October 22, 2009 - 20:25
Status:needs work» needs review

Working for me so far, as well, on 6.14.

Testing on pre-prod site.

Great job!

#53

mannkind - October 27, 2009 - 00:03

Love the patch from #50. Hoping this gets integrated into the module itself.

#54

Dave Reid - October 27, 2009 - 00:39
Status:needs review» needs work

Sorry, that query does not validate as SQL-99 and does not work on PostgreSQL.
http://developer.mimer.com/validator/index.htm

#55

LUTi - October 28, 2009 - 13:10

Which query, Dave? I can't find any query in the patch from #50...

#56

Dave Reid - October 28, 2009 - 13:53

@LUTi: There's not a direct query since it's actually performed in path_redirect_load(), but the query that is generated by the patch contains (path like '%%*' AND '%s' REGEXP CONCAT('^', path))) which is not cross-DB compatible.
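To make the semantics of that clause concrete, here is a sketch of what it computes, using Python's `re` module as a stand-in for MySQL's REGEXP (the two dialects differ in details, but `^` and the `*` quantifier mean the same thing in both). `rule_matches` is a hypothetical name. One consequence of this reading is that the stored path is interpreted as a regular expression, so the trailing `*` acts as a quantifier on the preceding character rather than as a free-standing wildcard.

```python
import re

def rule_matches(stored_path, requested_path):
    """Mimic: path LIKE '%*' AND requested_path REGEXP CONCAT('^', path)."""
    if not stored_path.endswith("*"):
        return False
    return re.search("^" + stored_path, requested_path) is not None

print(rule_matches("drupal*", "drupal/node/182512"))  # True
print(rule_matches("drupal*", "drupa"))               # True: "l*" also matches zero "l"s
print(rule_matches("drupal*", "node/1"))              # False
```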

#57

stacysimpson - November 2, 2009 - 20:55

Subscribing

#58

mfer - November 5, 2009 - 12:58

I believe the specific issue is the 'REGEXP CONCAT'. Concat is not supported in Postgresql and pattern matching works differently. To keep this style there will need to be a check against the global $db_type to see if it's mysql/mysqli or postgresql and create the appropriate queries.

@Dave Reid - Do you know what the correct part of the query for postgresql would be?

#59

mfer - November 5, 2009 - 13:07

I think this needs something like:

switch ($GLOBALS['db_type']) {
  case 'pgsql':
    // Postgresql where clause goes here.
    break;

  default:
    $where[] = "(path = '%s' OR (path like '%%*' AND '%s' REGEXP CONCAT('^',path)))";
    break;
}

#60

alanburke - November 9, 2009 - 10:23

Subscribe - useful feature

 
 
