Let me start by saying that I have enjoyed Drupal immensely, and want to be able to give back to the project in some way. With that in mind, I was wondering if anyone would be interested in my next "bright" idea.

I want the ability to substitute *parts* of the URL.

The path module is a great start on this, but it falls short in that aliases must match exactly. Pathauto is nice, but while you may have reached the page by going to www.example.com/my_vacation_photos, when you want to edit the page, the link goes to www.example.com/node/476/edit.

For my site, I wanted to use the word "member" instead of "user" for the user pages, but I did not want to hack the user module to do so, and having hundreds of aliases from pathauto did not seem right, either.

That being said, I now have a modified path.inc file that does just that. Only one function needed to be changed, and I actually think it simplifies the code.

Now, if your website is in Italian, instead of the word "add" in the URL, you can use "aggiungere", or whatever else you want. Instead of "node" appearing everywhere, now you can use "data" or "info" or something else meaningful to your particular website.

Now, my question to the developers is: Is this something that you are interested in?

Comments

court-jus

I'm French, and yes, I would really like to be able to change the words used in the URL as you describe!

coreyp_1

Sorry, but I'm horrible at making patch files, so I'm just going to post the function here. As always, please back up your files before you change anything.

In includes/path.inc, replace drupal_lookup_path() with this:

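function drupal_lookup_path($action, $path = '') {
  // Parallel arrays of {url_alias}.src and {url_alias}.dst, plus reversed
  // copies of both for 'source' lookups.
  static $mapsrc = array();
  static $mapdst = array();
  static $revmapsrc = array();
  static $revmapdst = array();
  static $count = NULL;

  // Count the aliases once per request; with none defined, skip all work.
  if ($count === NULL) {
    $count = db_result(db_query('SELECT COUNT(pid) FROM {url_alias}'));
  }

  if ($action == 'wipe') {
    $mapsrc = array();
    $mapdst = array();
  }
  elseif ($count > 0 && $path != '') {
    // Load every alias once per request, longest source first, so specific
    // aliases ("node/46") are substituted before general ones ("node").
    if ($mapsrc == array()) {
      $query = "SELECT src, dst FROM {url_alias} ORDER BY CHAR_LENGTH(src) DESC";
      $resource = db_query($query);
      while ($result = db_fetch_array($resource)){
        $mapsrc[] = $result['src'];
        $mapdst[] = $result['dst'];
      }
      $revmapsrc = array_reverse($mapsrc, true);
      $revmapdst = array_reverse($mapdst, true);
    }
    if ($action == 'alias') {
      // Replace every system-path substring with its alias.
      return str_replace($mapsrc, $mapdst, $path);
    }
    elseif ($action == 'source') {
      // Undo the substitution, applying the pairs in reverse order.
      return str_replace($revmapdst, $revmapsrc, $path);
    }
  }

  return FALSE;
}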

This worked fine in all my tests, but please let me know if you have any issues with it.
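For anyone who wants to poke at the substitution logic outside Drupal, here is a minimal in-memory sketch of what the function above does. The hard-coded $rows array stands in for the {url_alias} table (an assumption; the real function reads it with db_query()), and the helper names are hypothetical.

```php
<?php
// Hypothetical stand-in for {url_alias}: each row is [src (system path), dst (alias)].
$rows = [
    ['node', 'content'],
    ['user', 'member'],
];

// Longest source first, mirroring ORDER BY CHAR_LENGTH(src) DESC.
function sort_by_src_length_desc(array $rows): array {
    usort($rows, fn(array $a, array $b): int => strlen($b[0]) <=> strlen($a[0]));
    return $rows;
}

// 'alias' rewrites a system path into its public form; 'source' undoes it.
function lookup_partial(string $action, string $path, array $rows): string {
    $rows = sort_by_src_length_desc($rows);
    $src = array_column($rows, 0);
    $dst = array_column($rows, 1);
    if ($action === 'alias') {
        // str_replace() applies each pair in turn to the running result.
        return str_replace($src, $dst, $path);
    }
    // Reverse order for the way back, like the array_reverse() calls above.
    return str_replace(array_reverse($dst), array_reverse($src), $path);
}
```

Because matching is plain substring replacement, one row rewrites every URL that contains its source text, e.g. "node/35/edit" becomes "content/35/edit".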

Please note that it is up to you to avoid naming conflicts. Think of it as a list of Drupal "reserved words": don't make anything translate to "comment", "reply", "add", or any other word that normally occurs in a path, or you may get unexpected substitutions.
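Here is a sketch of such a conflict, assuming someone aliases a hypothetical "blog" path to the reserved word "add" (the helper names are made up for illustration). Outgoing links look fine, but the incoming request for the node-creation form gets rewritten on its way back in.

```php
<?php
// One hypothetical alias row: system path "blog" shown to visitors as "add".
$rows = [['blog', 'add']];

// Outgoing direction: system path -> public URL.
function to_alias(string $path, array $rows): string {
    foreach ($rows as [$src, $dst]) {
        $path = str_replace($src, $dst, $path);
    }
    return $path;
}

// Incoming direction: public URL -> system path.
function to_source(string $path, array $rows): string {
    foreach (array_reverse($rows) as [$src, $dst]) {
        $path = str_replace($dst, $src, $path);
    }
    return $path;
}
```

With this row, a request for "node/add" is turned into "node/blog", so the real add form is no longer reachable.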

I have never used i18n, so I don't know if the two could be incorporated.

Good luck, and feedback (good and bad) is appreciated!

- Corey

nathandigriz

Could you give a bit of instruction on how you are making the changes? If I want to use "content" rather than "node", where would I put the string: in the path.module settings, or in pathauto? I'm confused as to how this would work :(

coreyp_1

Once the function is replaced, all you have to do is add a path alias under "admin/path/add" (administer >> url aliases >> add alias). In your case, the system path will be "node", and the alternative path will be "content".

You actually don't need pathauto for this to work. Pathauto automatically creates aliases based on the node title (for example), so if you have a blog post entitled "bad day", you can access it through www.example.com/bad_day. The nice thing about my change is that, when you are editing this post, the URL will still use the pathauto alias (i.e., you will see www.example.com/bad_day/edit instead of the system path www.example.com/node/46/edit). This is not possible under the current system, but easily possible with my modification.
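The edit-page trick depends on the ORDER BY CHAR_LENGTH(src) DESC in the query. A small sketch (hypothetical helpers, hard-coded rows instead of the database) shows what happens with both a pathauto alias and a general "node" alias, sorted each way:

```php
<?php
// Hypothetical rows: a general "node" alias plus a pathauto-style alias.
$rows = [
    ['node', 'content'],
    ['node/46', 'bad_day'],
];

// Apply each [src, dst] pair in the given order.
function apply_in_order(string $path, array $rows): string {
    foreach ($rows as [$src, $dst]) {
        $path = str_replace($src, $dst, $path);
    }
    return $path;
}

// Order rows by source length; $longestFirst mirrors ORDER BY ... DESC.
function order_rows(array $rows, bool $longestFirst): array {
    usort($rows, fn(array $a, array $b): int => $longestFirst
        ? strlen($b[0]) <=> strlen($a[0])
        : strlen($a[0]) <=> strlen($b[0]));
    return $rows;
}
```

Longest-first turns "node/46/edit" into "bad_day/edit". Shortest-first lets the general "node" alias consume the text that "node/46" needed, giving "content/46/edit", so the pathauto alias never fires.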

Before, I disliked Pathauto because of the inconsistencies in the links (when editing, etc.). Now, I love it, because it works well for every situation.

Hope this helps.

- Corey

coreyp_1

The Acidfree module uses the word "contents" in some of its URLs, so aliasing "node" to "content" would create an ambiguity that could hurt the Acidfree module.

A quick fix is to alias "node/" to "content/" instead of "node" to "content". Adding the slash keeps URLs containing "contents" from being changed.
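A sketch of that ambiguity, assuming a hypothetical Acidfree-style URL "album/5/contents" (the real Acidfree paths may differ). Only the incoming ('source') direction is shown, since that is where "content" gets turned back into "node":

```php
<?php
// Incoming direction: replace each alias (dst) with its system path (src).
function to_source(string $path, array $rows): string {
    foreach ($rows as [$src, $dst]) {
        $path = str_replace($dst, $src, $path);
    }
    return $path;
}

$withoutSlash = [['node', 'content']];
$withSlash = [['node/', 'content/']];
```

Without the slash, "album/5/contents" comes back as "album/5/nodes"; with it, the URL is left alone while "content/5" still maps to "node/5". The trade-off is that the bare path "node" (no trailing slash) is no longer aliased.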

- Corey

t3r0

Hi,

Here's a patch against the latest HEAD from CVS...

Index: includes/path.inc
===================================================================
RCS file: /cvs/drupal/drupal/includes/path.inc,v
retrieving revision 1.4
diff -u -r1.4 path.inc
--- includes/path.inc	24 Apr 2006 19:25:37 -0000	1.4
+++ includes/path.inc	3 Jun 2006 10:31:33 -0000
@@ -39,7 +39,10 @@
  *   found.
  */
 function drupal_lookup_path($action, $path = '') {
-  static $map = array();
+  static $mapsrc = array();
+  static $mapdst = array();
+  static $revmapsrc = array();
+  static $revmapdst = array();
   static $count = NULL;
 
   if ($count === NULL) {
@@ -47,31 +50,25 @@
   }
 
   if ($action == 'wipe') {
-    $map = array();
+    $mapsrc = array();
+    $mapdst = array();
   }
   elseif ($count > 0 && $path != '') {
-    if ($action == 'alias') {
-      if (isset($map[$path])) {
-        return $map[$path];
-      }
-      if ($alias = db_result(db_query("SELECT dst FROM {url_alias} WHERE src = '%s'", $path))) {
-        $map[$path] = $alias;
-        return $alias;
-      }
-      else {
-        $map[$path] = $path;
+    if ($mapsrc == array()) {
+      $query = "SELECT src, dst FROM {url_alias} ORDER BY CHAR_LENGTH(src) DESC";
+      $resource = db_query($query);
+      while ($result = db_fetch_array($resource)){
+        $mapsrc[] = $result['src'];
+        $mapdst[] = $result['dst'];
       }
+      $revmapsrc = array_reverse($mapsrc, true);
+      $revmapdst = array_reverse($mapdst, true);
+    }
+    if ($action == 'alias') {
+      return str_replace($mapsrc, $mapdst, $path);
     }
     elseif ($action == 'source') {
-      if ($alias = array_search($path, $map)) {
-        return $alias;
-      }
-      if (!isset($map[$path])) {
-        if ($src = db_result(db_query("SELECT src FROM {url_alias} WHERE dst = '%s'", $path))) {
-          $map[$src] = $path;
-          return $src;
-        }
-      }
+      return str_replace($revmapdst, $revmapsrc, $path);
     }
   }
 

Quickly tested this and it seems to work perfectly!!

I'll do some more testing later tonight...

-Tero

coreyp_1

Please see my response to t3r0 below.

- Corey

sKanD

Another yes from the French side!

Do you think it could be combined with i18n?
--
See you!
sKanD
aka Stephane Carpentier

drupal777

Absolutely. This would be wonderful. Member beats User hands down.

Tommy Sundstrom

An alternative way of doing this is to make RewriteRules in the htaccess file. That way you don't have to change anything in the drupal core files.

coreyp_1

I thought about just using .htaccess, but then I realized that it would only work for links I create; all system-created links ("node/35", etc.) would be unchanged.

By hacking the path code in includes/path.inc, all links are translated before being shown to the user as well. Therefore, if you replace the word "node" with "stuff", you will never see the word "node" again in a link created with the l() function, which is just about every link you will see. "node/35" becomes "stuff/35". "node/35/edit" becomes "stuff/35/edit". You get the idea...

- Corey

nathandigriz

I would like to see this, especially if it can be used with i18n for multi-language sites. I don't think .htaccess is accommodating enough, and not everyone has .htaccess knowledge.

marlowx

So I could replace "user" with "dude"? Yeah, I like that, dude...

t3r0

Sounds really useful!!

Please post the patch already :)

- Tero

Max Bell

+1

Neat idea!

t3r0

Has anyone tested this with a lot of URL aliases (like 50,000+)?

Especially this addition:

    if ($mapsrc == array()) {
      $query = "SELECT src, dst FROM {url_alias} ORDER BY CHAR_LENGTH(src) DESC";
      $resource = db_query($query);
      while ($result = db_fetch_array($resource)){
        $mapsrc[] = $result['src'];
        $mapdst[] = $result['dst'];
      }
      $revmapsrc = array_reverse($mapsrc, true);
      $revmapdst = array_reverse($mapdst, true);
    }


It looks like it's going to slow things down when there are a lot of aliases in the DB table, since every alias is loaded into PHP arrays on each request...

- Tero

coreyp_1

Since my sites are small, this is not an issue, but I know it could be, especially with the next site I am building, so I have another solution.

Instead of hacking core, I am using killes' suggestion, in his comment below, of using the custom_url_rewrite() function.

Put this function into a new file, for example, named includes/custom_url_rewrite.inc. Here is the text of that file:

function custom_url_rewrite($action, $path, $original) {
  static $map = array();
  static $count = NULL;

  if ($count === NULL) {
    $count = db_result(db_query('SELECT COUNT(pid) FROM {url_alias}'));
  }

  if ($count > 0 && $path != '') {
    if (isset($map[$path])){
      $path = $map[$path];
    }
    else {
      $old = '';
      $new = $path;
      while ($old != $new){
        $old = $new;
        if ($action == 'alias') {
          $query = 'SELECT src AS pfrom, dst AS pto FROM {url_alias} WHERE LOCATE(src, "'.$new.'") ORDER BY CHAR_LENGTH(src) DESC';
        }
        else {
          $query = 'SELECT dst AS pfrom, src AS pto FROM {url_alias} WHERE LOCATE(dst, "'.$new.'") ORDER BY CHAR_LENGTH(src) ASC';
        }
        $resource = db_query($query);
        while ($result = db_fetch_array($resource)){
          $new = str_replace($result['pfrom'], $result['pto'], $new);
        }
      }
      if ($action == 'alias'){
        $map[$path] = $new;
      }
      else {
        $map[$new] = $path;
      }
      $path = $new;
    }
  }

  return $path;
}

The file must begin with <?php. You can leave off the ?> at the end of the file; I only included it here so that everything would display correctly.

Now, at the end of your settings.php file, add this:

require_once './includes/custom_url_rewrite.inc';

Your new URL renaming is now ready to roll!
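The core loop of custom_url_rewrite() can be tried without a database. This sketch (hypothetical function name) replaces the LOCATE() query with strpos() over a hard-coded array, keeps the same sort orders, and repeats passes until the path stops changing. The $maxPasses cap is not in the posted function; it is added here so the loop always terminates.

```php
<?php
// Hypothetical in-memory alias rows [src, dst]; the real function queries
// {url_alias} with LOCATE() on every pass instead.
$rows = [['node/46', 'bad_day'], ['node', 'content'], ['user', 'member']];

function rewrite_until_stable(string $action, string $path, array $rows, int $maxPasses = 10): string {
    $old = null;
    $new = $path;
    for ($pass = 0; $old !== $new && $pass < $maxPasses; $pass++) {
        $old = $new;
        // [text to search for, replacement, length of src used for sorting]
        $pairs = [];
        foreach ($rows as [$src, $dst]) {
            $pairs[] = $action === 'alias' ? [$src, $dst, strlen($src)] : [$dst, $src, strlen($src)];
        }
        // Keep only rows whose search text occurs in the path (the LOCATE() filter).
        $pairs = array_filter($pairs, fn(array $p): bool => strpos($new, $p[0]) !== false);
        // alias: longest src first (DESC); source: shortest src first (ASC), as posted.
        usort($pairs, fn(array $a, array $b): int => $action === 'alias' ? $b[2] <=> $a[2] : $a[2] <=> $b[2]);
        foreach ($pairs as [$from, $to]) {
            $new = str_replace($from, $to, $new);
        }
    }
    return $new;
}
```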

Now for the gory details:

You CANNOT put this into a module. I tried for 3 hours... The problem is that the URL is decoded *before* any modules are included, and since Drupal probably can't decode your new fancy URLs on its own, it gives a lot of "Page Not Found" errors.

I got tired of adding aliases to my tables when I reached the 30,000 mark. Here's the long and the short of it:

  1. There will be a longer load time, because you are doing more processing of the URL, etc.
  2. I tested with around 30,000 aliases on the "admin/menu" page, because it contained 261 calls to the custom_url_rewrite() function (on my installation, that is).
    • Without the modification, the page loaded in .5~.6 seconds.
    • With the modification, the page loaded in .7~.8 seconds.
    • In short, it took about .2 seconds longer to make those 261 function calls with 30,000 records
  3. On other pages, with only 40 or 50 links (function calls), the time difference was negligible (again, with 30,000 URL aliases).
  4. It seems that the number of links on the page is of more importance than the number of URL aliases. When testing with only a few thousand aliases, I was getting similar time differences on the "admin/menu" page.
  5. This function was written to use caching, to streamline identical requests, which happen A LOT more often than I expected.
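The caching in item 5 is plain memoization. In this sketch (hypothetical class name; $compute stands in for the query loop) the cache is keyed by action as well as by path. That is a deliberate change: the posted function keeps one $map, keyed by system path, for both directions, so a later 'source' lookup on a system path that was already aliased could return the alias instead.

```php
<?php
// Hypothetical memoizing wrapper around an expensive rewrite function.
final class CachedRewriter
{
    /** @var callable(string, string): string */
    private $compute;
    private array $cache = ['alias' => [], 'source' => []];
    public int $misses = 0;  // how many times $compute actually ran

    public function __construct(callable $compute)
    {
        $this->compute = $compute;
    }

    public function rewrite(string $action, string $path): string
    {
        // Separate buckets per direction, so 'alias' and 'source' never mix.
        if (!isset($this->cache[$action][$path])) {
            $this->misses++;
            $this->cache[$action][$path] = ($this->compute)($action, $path);
        }
        return $this->cache[$action][$path];
    }
}

// Stand-in for the query loop: one alias, node/46 <-> bad_day.
$rewriter = new CachedRewriter(function (string $action, string $path): string {
    return $action === 'alias'
        ? str_replace('node/46', 'bad_day', $path)
        : str_replace('bad_day', 'node/46', $path);
});
```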

Any comments/suggestions? I am looking for ways to make this better. I will be implementing this on my own site when I get it finished... (if ever!) :o/

- Corey

t3r0

Really nice to see this happen without core modifications :p. I never noticed that kind of function call in path.inc... :)

But I must say the handling of custom_url_rewrite() in path.inc is a bit strange, or at least it causes a lot of unnecessary calls to drupal_lookup_path(): drupal_lookup_path() is called every time, before checking whether custom_url_rewrite() exists... :O

For example, in path.inc:

function drupal_get_path_alias($path) {
  $result = $path;
  if ($alias = drupal_lookup_path('alias', $path)) {
    $result = $alias;
  }
  if (function_exists('custom_url_rewrite')) {
    $result = custom_url_rewrite('alias', $result, $path);
  }
  return $result;
}

AND

function drupal_get_normal_path($path) {
  $result = $path;
  if ($src = drupal_lookup_path('source', $path)) {
    $result = $src;
  }
  if (function_exists('custom_url_rewrite')) {
    $result = custom_url_rewrite('source', $result, $path);
  }
  return $result;
}

Both functions call drupal_lookup_path() every time and then, if custom_url_rewrite() is available, override $result with the value returned by custom_url_rewrite()... So, needless to say, this always adds a few unneeded milliseconds to page generation time when custom_url_rewrite() is defined...
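To make the cost concrete, here is a sketch with hypothetical stand-ins: $lookup plays the role of drupal_lookup_path() and counts how often it runs, and $rewrite plays custom_url_rewrite(). The first function follows the current core order; the second follows the reordering in the patch that follows.

```php
<?php
// Hypothetical stand-ins; no database involved.

// Current core order: the lookup always runs, then the rewrite overrides it.
function alias_lookup_first(string $path, callable $lookup, ?callable $rewrite): string {
    $result = $path;
    if ($alias = $lookup($path)) {
        $result = $alias;
    }
    if ($rewrite !== null) {
        $result = $rewrite($result, $path);
    }
    return $result;
}

// Reordered: the lookup only runs when there is no rewrite function.
function alias_rewrite_first(string $path, callable $lookup, ?callable $rewrite): string {
    if ($rewrite !== null) {
        return $rewrite($path, $path);
    }
    $alias = $lookup($path);
    return $alias ? $alias : $path;
}

$calls = 0;
$lookup = function (string $path) use (&$calls) {
    $calls++;      // one simulated database query per call
    return false;  // pretend there is no exact-match alias
};
$rewrite = fn(string $result, string $original): string => str_replace('node/', 'content/', $result);
```

Rendering three links costs three lookups with the first function and none with the second.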

With this patch to path.inc, I was able to cut 200-500 ms from page generation times when custom_url_rewrite() is used:

Index: includes/path.inc
===================================================================
RCS file: /cvs/drupal/drupal/includes/path.inc,v
retrieving revision 1.4
diff -u -r1.4 path.inc
--- includes/path.inc 24 Apr 2006 19:25:37 -0000  1.4
+++ includes/path.inc 4 Jun 2006 11:28:16 -0000
@@ -90,12 +90,14 @@
  */
 function drupal_get_path_alias($path) {
   $result = $path;
-  if ($alias = drupal_lookup_path('alias', $path)) {
-    $result = $alias;
-  }
+
   if (function_exists('custom_url_rewrite')) {
     $result = custom_url_rewrite('alias', $result, $path);
+
+  } elseif ($alias = drupal_lookup_path('alias', $path)) {
+      $result = $alias;
   }
+
   return $result;
 }

@@ -111,12 +113,14 @@
  */
 function drupal_get_normal_path($path) {
   $result = $path;
-  if ($src = drupal_lookup_path('source', $path)) {
-    $result = $src;
-  }
+
   if (function_exists('custom_url_rewrite')) {
     $result = custom_url_rewrite('source', $result, $path);
+
+  }  elseif ($src = drupal_lookup_path('source', $path)) {
+    $result = $src;
   }
+
   return $result;
 }

Opinions or comments about this?

- Tero

coreyp_1

I wish the functions were structured so that the call to drupal_lookup_path() did not waste precious processing resources. If custom_url_rewrite() exists, then it should be the programmer's responsibility to call drupal_lookup_path() if needed. In the case of this function, it's not needed. Oh, well.

- Corey

coreyp_1

I decided to add addslashes() to the database calls, so that the function will be more robust.

Here is the updated function:

function custom_url_rewrite($action, $path, $original) {
  static $map = array();
  static $count = NULL;

  if ($count === NULL) {
    $count = db_result(db_query('SELECT COUNT(pid) FROM {url_alias}'));
  }

  if ($count > 0 && $path != '') {
    if (isset($map[$path])){
      $path = $map[$path];
    }
    else {
      $old = '';
      $new = $path;
      while ($old != $new){
        $old = $new;
        if ($action == 'alias') {
          $query = 'SELECT src AS pfrom, dst AS pto FROM {url_alias} WHERE LOCATE(src, "'.addslashes($new).'") ORDER BY CHAR_LENGTH(src) DESC';
        }
        else {
          $query = 'SELECT dst AS pfrom, src AS pto FROM {url_alias} WHERE LOCATE(dst, "'.addslashes($new).'") ORDER BY CHAR_LENGTH(src) ASC';
        }
        $resource = db_query($query);
        while ($result = db_fetch_array($resource)){
          $new = str_replace($result['pfrom'], $result['pto'], $new);
        }
      }
      if ($action == 'alias'){
        $map[$path] = $new;
      }
      else {
        $map[$new] = $path;
      }
      $path = $new;
    }
  }

  return $path;
}

- Corey

killes@www.drop.org

You will want to look at the custom_url_rewrite() function.
--
Drupal services
My Drupal services

coreyp_1

Didn't know about that function, but it suddenly made a lot of things easier.

I'm always learning... ;o)

- Corey

court-jus

I installed your custom_url_rewrite include, modified settings.php and installed pathauto.

It works well; it's a really big deal for end users. Thank you very much!

James Andres

Hi coreyp_1,

This is an interesting thread going here, and a very critical issue to us at Project Opus.

Your solution is a nearly perfect fit for our "custom address" module which allows users to have their own customized url (eg: www.projectopus.com/james instead of www.projectopus.com/user/64).

My main concern is performance. I've wondered many times whether the Drupal url_alias system scales to sizes useful for really big sites. For instance, if we hit 1,000,000 users (pie in the sky ;-) that would be 18,000,000 alias rows with my code (ouch!!) and 1,000,000 with your code. Remember that user/2 has to be different from user/3, etc.

I've been toying with an idea for a fix for a while now (I'm not a 'big participator' in the Drupal community so feel free to point out if this has already been suggested before).

My thinking is that the url_alias table has to take the form of a tree at some point. This would kill two birds with one stone, so to speak, because it would allow partial URLs to be renamed and would 'hopefully' boost performance dramatically (in some cases).

Diagram here

The idea here would be that each "node" in the graph would receive an ID and be indexed into a path_index table. A path alias would then be nothing more than a mapping from a sequence of path_ids to a substitute string.

For instance:

If "user" had the path_id 1 and "64" had the path_id 2, and we wanted user/64 aliased to "james", we would have the following alias: 1,2 --> 'james'. This allows for all kinds of fun, like having the Drupal lookup function explode the path and trace the tree recursively as follows:

q=user/64/foo/bar/otherstuff

(user -> 64) James -> foo -> bar -> otherstuff
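A rough sketch of that lookup, under assumptions that are not in the post: segment IDs live in a hypothetical $pathIndex array (standing in for path_index), aliases are keyed by comma-separated ID sequences, and only a leading run of segments is aliased.

```php
<?php
// Hypothetical path_index: one ID per path segment.
$pathIndex = ['user' => 1, '64' => 2, 'node' => 3];
// Hypothetical alias table keyed by a sequence of segment IDs.
$treeAliases = ['1,2' => 'james'];

function tree_alias(string $path, array $index, array $aliases): string {
    $segments = explode('/', $path);
    $ids = [];
    foreach ($segments as $segment) {
        if (!isset($index[$segment])) {
            break;  // unindexed segment: no alias can extend past this point
        }
        $ids[] = $index[$segment];
    }
    // Try the longest indexed prefix first, then shorter ones.
    for ($n = count($ids); $n > 0; $n--) {
        $key = implode(',', array_slice($ids, 0, $n));
        if (isset($aliases[$key])) {
            return implode('/', array_merge([$aliases[$key]], array_slice($segments, $n)));
        }
    }
    return $path;
}
```

Because the alias is tied to the sequence user -> 64, "node/64" is left alone, which a plain substring alias on "64" could not guarantee.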

Gotta run! I'll post back with a more finished thought tomorrow.

James

Lead Developer on Project Opus
www.projectopus.com

coreyp_1

Why would your code have 18 million aliases? Wow, that's a lot!!!

I think I see what you're saying, but you're still stuck with the problem of having to have an additional 1 million aliases (one for each user).

My solution uses pattern substitution, and does not stop at the slash character (a feature). Therefore, if you alias "de/" to "I_AM_AN_ALIAS", then "node/14" becomes "noI_AM_AN_ALIAS14", and Drupal understands it perfectly (although it might look a little strange to the user).

The reason that I give this example is because, if you alias "64" as "James", then when a person is trying to view "node/64", then they will see "node/James", regardless of whether or not James is involved. :o) That is why, in the current system (helped out by the pathauto module), "user/64" is aliased as "James", so that only the user page requests are aliased. (I know that you are probably fully aware of this, but I include it just in case there are some that are unclear as to the nature of this modification.) I know that your method would deal with this in a different manner, but, with the current method, it still only requires a few SQL queries (usually only 2) to substitute everything in the URL.
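Both examples follow directly from str_replace() matching raw substrings. A few standalone checks (hypothetical helper name):

```php
<?php
// Replace every occurrence of $src with $dst, ignoring path-segment boundaries.
function substring_alias(string $path, string $src, string $dst): string {
    return str_replace($src, $dst, $path);
}
```

One more thing the checks reveal: an alias on "user/64" also matches inside "user/640", producing "James0". Longest-first ordering only saves you there if user/640 has its own alias.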

OK, I guess I need to stop rambling and get to the point:

I am afraid that the tree based aliasing would actually increase server load, because one would need to load the hierarchy, and then do the substitution for each "leaf" (recursively) in the url, resulting in significantly higher computation times and sql calls. Obviously I haven't built the code and tested it, but it would be my first concern.

What it all boils down to is that, in both designs, there must still be an alias for each option. The difference is that the tree design has a higher computational overhead in terms of organization. MySQL is fast at searching through rows of data, but if PHP has the additional overhead of pulling from the database, interpreting the data, then going back to the database, etc., then I think it will impact performance negatively.

Please forgive me if I misunderstood your approach.

One of the features of my approach is that it is beneficial for translating a site. Consider the URLs "node/add", "admin/block/add", and "admin/access/rules/add". In each of these, the current method could replace "add" with whatever alias the author would like. In the hierarchical design, it would be necessary to create many aliases (one under "node" in the first level, one under "block" in the second level, and one under "rules" in the third level).
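With substring aliasing, one row covers every "add" page. A sketch, assuming the alias is stored with a leading slash (the same trick as "node/" earlier in the thread) so that a word like "ladder" is untouched; a segment that merely starts with "add", such as "address", would still match.

```php
<?php
// One hypothetical alias row: "/add" shown as "/aggiungere" (Italian).
$rows = [['/add', '/aggiungere']];

// Outgoing direction: system path -> public URL.
function to_alias(string $path, array $rows): string {
    foreach ($rows as [$src, $dst]) {
        $path = str_replace($src, $dst, $path);
    }
    return $path;
}
```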

Here's an additional thought for you: I'm working on .htaccess to allow the url to be re-mapped so that not only clean urls work, but so that james.example.com becomes www.example.com/user/james, and james.example.com/edit becomes www.example.com/user/james/edit, etc. I think this would look very professional! The hard part is that custom_url_rewrite() does not include example.com or anything before it...
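Here is a sketch of just the mapping step, with hypothetical function and parameter names. How the host actually reaches Drupal (.htaccess rules, $_SERVER['HTTP_HOST']) is left out, and "www" is assumed to mean the main site.

```php
<?php
// Map "james.example.com" + "edit" to the internal path "user/james/edit".
// $baseDomain (e.g. "example.com") is an assumption, given in lowercase.
function subdomain_to_path(string $host, string $path, string $baseDomain): string {
    $suffix = '.' . $baseDomain;
    $host = strtolower($host);
    if ($host === $baseDomain || $host === 'www' . $suffix) {
        return $path;  // main site: leave the path alone
    }
    if (substr($host, -strlen($suffix)) !== $suffix) {
        return $path;  // some other domain entirely
    }
    $user = substr($host, 0, -strlen($suffix));
    return $path === '' ? 'user/' . $user : 'user/' . $user . '/' . $path;
}
```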

- Corey

James Andres

Hi Corey,

Thanks for the reply. I see your point very much. That issue is something I was mulling over for a while.

The main problem, as I see it, is that tree structures are efficient when there is a large amount of data with 2 or 3 leaves per node (more is okay too, but I'll stick with that as an example), but they get less efficient when there is a very large number of leaves per node. For instance, node/1, node/2, node/3, etc. require all those leaves to be scanned in order to find the correct match.

Oh, and the reason I have around 18 aliases for each user is the problem you just solved (ie: user/1, user/1/edit, user/1/profile, user/1/bio, ..... and on and on). Your patch would definitely help in this area!

Anyway, what still worries me is the general overhead of the aliasing--0.5 second -> 0.7 second page load times is a big deal :-S. I'm not tied to a tree design of course but I'm still thinking of ways to break up the data load.

For instance, although it's nasty, it would likely speed up search times a great deal by having a url_alias_nodes table (only aliases for node*) and a url_alias_user. One could even go as far as url_alias_node_1 (nodes/1*, nodes/2*, etc..).

A bit more rambling ;-): I haven't actually done the benchmarking yet (but I plan to) to see which queries and lines of code are the bottlenecks. I'll get that done and see if I can port your patch to 4.6 (what we use on Project Opus).

Though, one use for a tree structure, with each term acting as a "key" (ie: like a search key), would be that node/64 and user/64 would both reference the same key. This would cut down on the amount of data somewhat, but probably not a lot, because only users, nodes, taxonomy terms, and a few other things use numbers in that way. Of course, this principle would apply to every other term as well, so node/node would be self-referential. Weird.

James

James Andres

Hi Corey,

I'm back to bug you! ;-)

So I just took my first crack at getting your patch going with our 4.6 system. Everything appears to integrate well (the custom_url_rewrite check is in the drupal_get_path, etc..) and there doesn't seem to be too many differences between 4.6 and 4.7 in the url_alias / path area.

I left your function as you wrote it, since it appears to be quite similar to the existing drupal_lookup_path function.

A strange issue that quickly appeared was some sort of infinite loop caused by the patch :-S. Basically, my test site sits there loading and never finishes (MySQL usage goes through the roof as well). I wrote a little test script to check whether the custom_url_rewrite function was even working (I called it test.php):

  require_once 'includes/custom_url_rewrite.inc';
  include_once 'includes/bootstrap.inc';

  $sql = "select * from url_alias limit 0, 1000";
  $result = db_query($sql);
  
  $start = microtime(true);
  while ($row = db_fetch_object($result)) {
    $crap = custom_url_rewrite('alias', $row->src, $row->src);
    unset($crap);
  }
  
  echo number_format((microtime(true) - $start) / 1000, 5);

This runs correctly, so it's not custom_url_rewrite that is hanging, at least not on the alias action. This tells me that each custom_url_rewrite call takes between 0.002 and 0.06 seconds. This isn't terrible, so I'll assume everything is cool here. Keep in mind I'm only testing the 'alias' action; I'm not really sure what other actions are available...

So, the question is: why can I run THIS script but my site doesn't load (well it sits there loading and doesn't stop).

Last hint I have is that mysql just sits there on the sql query "SELECT src AS pfrom, dst AS pto FROM {url_alias} WHERE LOCATE(src, "'.addslashes($new).'") ORDER BY CHAR_LENGTH(src) DESC". Mysql claims it is "sorting rows" or something similar.

Have you experienced anything like this before?

I'll keep hacking at it and let ya know when I have some updates.

James

coreyp_1

a couple of thoughts:

  1. you mentioned drupal_get_path, but that is not used in the url aliasing. It is drupal_get_path_alias() in bootstrap.inc and drupal_get_normal_path() in common.inc. This may have been a typo, so I'll let it slide for now... ;o)
  2. 4.6 actually looks for a function called conf_url_rewrite(), not custom_url_rewrite(). If you want to keep it in the form 4.7 looks for (to make it easier to upgrade later), then you will have to hack core. Then again, I was going to suggest that next.
  3. I suggest replacing the two functions below:
    function drupal_get_normal_path($path) {
      // Arguments are ($action, $path, $original), matching custom_url_rewrite().
      return custom_url_rewrite('source', $path, $path);
    }
    
    function drupal_get_path_alias($path) {
      return custom_url_rewrite('alias', $path, $path);
    }
    

    You could put these two functions in your custom_url_rewrite.inc file and then, to prevent duplicate function names, add a _ (underscore) before the function names in the core files. This makes it easy to revert (and to see your changes when upgrading).

Didn't test it, but hopefully it should get you started.

- Corey

James Andres

Duh, forgot to tweak the common.inc also. And yes it was a slip, I was looking at drupal_get_path_alias and saying drupal_get_path ;-).

Our code does have custom_url_rewrite in it, though (not conf_url_rewrite). This may be due to the url_alias "patch" that was going around a while back to speed up 4.6.x systems.

I'll see if changing anything in the drupal_get_normal_path or drupal_get_path_alias functions will help....

--- edit ---

Ahh, I've pinpointed the problem. Running the following code (notice I'm only doing 3 url aliases) runs the queries listed (truncated .. there is way more than what is listed).

  $sql = "select * from url_alias limit 0, 3";
  $result = db_query($sql);
  
  $start = microtime(true);
  while ($row = db_fetch_object($result)) {
    $crap = custom_url_rewrite('source', $row->dst, $row->dst);
    unset($crap);
  }
  
  echo number_format((microtime(true) - $start) / 3, 5);

Some recursion going out of hand ..

..... snip .....

query: SELECT dst AS pfrom, src AS pto FROM url_alias WHERE LOCATE(dst, "trip_search") ORDER BY CHAR_LENGTH(src) ASC
error:

query: SELECT dst AS pfrom, src AS pto FROM url_alias WHERE LOCATE(dst, "trip_trip_search") ORDER BY CHAR_LENGTH(src) ASC
error:

query: SELECT dst AS pfrom, src AS pto FROM url_alias WHERE LOCATE(dst, "trip_trip_trip_search") ORDER BY CHAR_LENGTH(src) ASC
error:

query: SELECT dst AS pfrom, src AS pto FROM url_alias WHERE LOCATE(dst, "trip_trip_trip_trip_search") ORDER BY CHAR_LENGTH(src) ASC
error:

query: SELECT dst AS pfrom, src AS pto FROM url_alias WHERE LOCATE(dst, "trip_trip_trip_trip_trip_search") ORDER BY CHAR_LENGTH(src) ASC
error:

query: SELECT dst AS pfrom, src AS pto FROM url_alias WHERE LOCATE(dst, "trip_trip_trip_trip_trip_trip_search") ORDER BY CHAR_LENGTH(src) ASC
error:

query: SELECT dst AS pfrom, src AS pto FROM url_alias WHERE LOCATE(dst, "trip_trip_trip_trip_trip_trip_trip_search") ORDER BY CHAR_LENGTH(src) ASC
error:

query: SELECT dst AS pfrom, src AS pto FROM url_alias WHERE LOCATE(dst, "trip_trip_trip_trip_trip_trip_trip_trip_search") ORDER BY CHAR_LENGTH(src) ASC
error:

query: SELECT dst AS pfrom, src AS pto FROM url_alias WHERE LOCATE(dst, "trip_trip_trip_trip_trip_trip_trip_trip_trip_search") ORDER BY CHAR_LENGTH(src) ASC
error:

query: SELECT dst AS pfrom, src AS pto FROM url_alias WHERE LOCATE(dst, "trip_trip_trip_trip_trip_trip_trip_trip_trip_trip_search") ORDER BY CHAR_LENGTH(src) ASC
error:

query: SELECT dst AS pfrom, src AS pto FROM url_alias WHERE LOCATE(dst, "trip_trip_trip_trip_trip_trip_trip_trip_trip_trip_trip_search") ORDER BY CHAR_LENGTH(src) ASC
error:

query: SELECT dst AS pfrom, src AS pto FROM url_alias WHERE LOCATE(dst, "trip_trip_trip_trip_trip_trip_trip_trip_trip_trip_trip_trip_search") ORDER BY CHAR_LENGTH(src) ASC
error:

query: SELECT dst AS pfrom, src AS pto FROM url_alias WHERE LOCATE(dst, "trip_trip_trip_trip_trip_trip_trip_trip_trip_trip_trip_trip_trip_search") ORDER BY CHAR_LENGTH(src) ASC
error:

query: SELECT dst AS pfrom, src AS pto FROM url_alias WHERE LOCATE(dst, "trip_trip_trip_trip_trip_trip_trip_trip_trip_trip_trip_trip_trip_trip_search") ORDER BY CHAR_LENGTH(src) ASC
error:
... snip ...

I've gotta run, but if/when I repair the bug I'll post back. If anybody has any insights to speed the process I'm all ears!

James

coreyp_1

You must have "search" aliasing to "trip_search" that is causing the recursion. It is a feature that can lead to infinite loops, I'm afraid.

For example: You can substitute "node/corey" for "node/44", and substitute "content/" for "node/", which will end up changing "node/44" to "content/corey". (this is OK)

If, however, you change "node/" to "mynode/", then the code will give you "mymymymymy......you get the idea......node".
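A sketch of the runaway loop, using in-memory [from, to] pairs for the direction being applied (hypothetical helper; the $maxPasses cap exists only so the demonstration stops, it is not in the posted code):

```php
<?php
// Repeat passes until the path stops changing or $maxPasses is reached.
// Returns [final path, whether it actually converged].
function rewrite_passes(string $path, array $pairs, int $maxPasses): array {
    $old = null;
    $new = $path;
    for ($pass = 0; $old !== $new && $pass < $maxPasses; $pass++) {
        $old = $new;
        foreach ($pairs as [$from, $to]) {
            $new = str_replace($from, $to, $new);
        }
    }
    return [$new, $old === $new];
}
```

The "node/44" example settles after one pass; "search" -> "trip_search" and "node/" -> "mynode/" grow by one prefix every pass and never settle, because the text being searched for reappears inside its own replacement.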

Once again, you probably already know this, but I post the explanation for others' sake, too.

- Corey

James Andres

Hi Corey,

I've got it I think. The problem here is that your str_replace function is failing when it reaches an alias that has the source as part of the destination. This is the case in our mapping of 'trip_search' --> 'search'.

This wouldn't show up with the alias action because the src is not a part of the dst. Only the dst is inside the src.

If we were trying to do the reverse mapping the bug would likely be reversed.

My brain is a bit scattered right now ... I'll add the fix in a minute.

---- edit ----

Ok, the fix is simply to stop the function after one substitution. This fixes the recursion problem, but possibly breaks functionality in the custom_url_rewrite function (which I don't understand 100% right now). Everything appears fine for me, at least.

Change:

  $new = str_replace($result['pfrom'], $result['pto'], $new);

To:

  // $safety = 0; must be set once, before the outer while ($old != $new) loop.
  if ($safety++ < 1) {
    $new = str_replace($result['pfrom'], $result['pto'], $new);
  }

Sorry for the stream of consciousness there. I kept thinking I was done for the day but had a realization just as I was walking out the door, lol.

James

James Andres

One last bug still remaining (my url aliases all work now) is that my links and menu items aren't showing up aliased correctly. For instance I have the alias 'user/64' set to 'james' which maps all links to 'user/64' to 'james' but does not map 'user/64/edit' to 'james/edit'. However, if I type 'james/edit' into the address bar it works out correctly.

That's for tomorrow, though. Ciao.

James

coreyp_1’s picture

Complete with bells and whistles!

I have added two safeguards to the function:

  1. The function applies a given substitution only once per call (to prevent the recursion nightmare mentioned before).
  2. The function only pulls one alias at a time from the database. Consider this problem:

    I had three aliases:

    • "node/23" to "corey", courtesy of pathauto
    • "user/3" to "user/corey", courtesy of pathauto
    • "user" to "member" (a desired customization)

    Oddly enough, when the page "member/corey" was requested, the function interpreted it as "user/node/23". Limiting the SQL query to one alias per pass was enough to squash this error (it also allowed the while() statement to be changed to an if() statement).
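The ordering problem described above can be reproduced with a small standalone PHP sketch (PHP 7.4+ for the arrow functions). The three rows are the aliases listed above. The two helpers are hypothetical illustrations, not code from the thread. to_source_all_at_once() collects every matching row first and then applies them all. to_source_one_at_a_time() re-checks the current path after each substitution, roughly mirroring the "LOCATE(dst, ...) ... ORDER BY CHAR_LENGTH(src) ASC LIMIT 1" query plus the $used list.

```php
<?php
// The three aliases from the example above, as (src, dst) rows.
$rows = [
    ['node/23', 'corey'],
    ['user/3', 'user/corey'],
    ['user', 'member'],
];

// Re-examine the current path after every substitution; each dst is used once.
function to_source_one_at_a_time(string $path, array $rows): string
{
    $used = [];
    do {
        $before = $path;
        $best = null;
        foreach ($rows as [$src, $dst]) {
            if (in_array($dst, $used, true) || strpos($path, $dst) === false) {
                continue; // already applied, or not present in the current path
            }
            if ($best === null || strlen($src) < strlen($best[0])) {
                $best = [$src, $dst]; // keep the match with the shortest src
            }
        }
        if ($best !== null) {
            $path = str_replace($best[1], $best[0], $path);
            $used[] = $best[1];
        }
    } while ($path !== $before);
    return $path;
}

// Contrast: pick every matching row up front, then apply them all in order.
function to_source_all_at_once(string $path, array $rows): string
{
    $matches = array_filter($rows, fn($r) => strpos($path, $r[1]) !== false);
    usort($matches, fn($a, $b) => strlen($a[0]) <=> strlen($b[0]));
    foreach ($matches as [$src, $dst]) {
        $path = str_replace($dst, $src, $path);
    }
    return $path;
}

echo to_source_all_at_once('member/corey', $rows), "\n";   // user/node/23
echo to_source_one_at_a_time('member/corey', $rows), "\n"; // user/3
```

In the all-at-once version, "user/corey" is not in the original request, so the "user/3" row is never considered and "corey" is replaced instead.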

The new code is:

function custom_url_rewrite($action, $path, $original) {
  static $map = array();
  static $count = NULL;

  if ($count === NULL) {
    $count = db_result(db_query('SELECT COUNT(pid) FROM {url_alias}'));
  }

  if ($count > 0 && $path != '') {
    if (isset($map[$path])){
      $path = $map[$path];
    }
    else {
      $old = '';
      $new = $path;
      $used = array();
      while ($old != $new){
        $old = $new;
        if ($action == 'alias') {
          $query = 'SELECT src AS pfrom, dst AS pto FROM {url_alias} WHERE LOCATE(src, "'.addslashes($new).'") AND src NOT IN ("'.implode('", "', $used).'") ORDER BY CHAR_LENGTH(src) DESC LIMIT 1';
        }
        else {
          $query = 'SELECT dst AS pfrom, src AS pto FROM {url_alias} WHERE LOCATE(dst, "'.addslashes($new).'") AND dst NOT IN ("'.implode('", "', $used).'") ORDER BY CHAR_LENGTH(src) ASC LIMIT 1';
        }
        $resource = db_query($query);
        if ($result = db_fetch_array($resource)){
          $new = str_replace($result['pfrom'], $result['pto'], $new);
          $used[] = $result['pfrom'];
        }
      }
      if ($action == 'alias'){
        $map[$path] = $new;
      }
      else {
        $map[$new] = $path;
      }
      $path = $new;
    }
  }

  return $path;
}

Let's see if this clears up any of your issues!

- Corey

James Andres’s picture

Hi,

Thanks for the new function Corey. It fixes quite a few of the glitches.

One last bug still remaining (my url aliases all work now) is that my links and menu items aren't showing up aliased correctly. For instance I have the alias 'user/64' set to 'james' which maps all links to 'user/64' to 'james' but does not map 'user/64/edit' to 'james/edit'. However, if I type 'james/edit' into the address bar it works out correctly.

I've homed in on the issue described above. The problem is due to the way 4.6 handles the static $map variable. I haven't quite figured out the cause yet, but basically, subsequent calls to custom_url_rewrite() overwrite parts of the map incorrectly. An example explains the issue best:

From the code:

  echo "l('should be \'james\'', 'user/64'): " . l('should be \'james\'', 'user/64') . "<br />";
  echo "l('should be \'james/1\'', 'user/64/1'): " . l('should be \'james/1\'', 'user/64/1') . "<br />";
  echo "url('user/64', null, null, true): " . url('user/64', null, null, true) . "<br />";
  echo "url('user/64/1', null, null, true): " . url('user/64/1', null, null, true) . "<br /><br />";

  echo "<strong>Forth</strong><br />";
  echo drupal_get_path_alias('user/64') . '<br />';
  echo drupal_get_path_alias('user/64/1') . '<br />';
  echo drupal_get_path_alias('user/64/2') . '<br />';
  echo drupal_get_path_alias('user/64/3') . '<br /><br />';

  echo "<strong>Back</strong><br />";
  echo drupal_get_normal_path('james') . '<br />';
  echo drupal_get_normal_path('james/1') . '<br />';
  echo drupal_get_normal_path('james/2') . '<br />';
  echo drupal_get_normal_path('james/3') . '<br /><br />';

I get the output:

l('should be \'james\'', 'user/64'): <a href="user/64">should be &#039;james&#039;</a><br />
l('should be \'james/1\'', 'user/64/1'): <a href="user/64/1">should be &#039;james/1&#039;</a><br />

url('user/64', null, null, true): http://james.projectopus.com:8000/user/64<br />
url('user/64/1', null, null, true): http://james.projectopus.com:8000/user/64/1<br /><br />


<strong>Forth</strong><br />
user/64<br />
user/64/1<br />
james/2<br />
james/3<br /><br />

<strong>Back</strong><br />
user/64<br />
user/64/1<br />
user/64/2<br />
user/64/3<br /><br />

Notice how the ../2 and ../3 translate correctly while the user/64 and user/64/1 translate incorrectly. This is due to the fact that they were called previously in the 'l' and 'url' functions.

-- edit --
Also notice that everything works correctly on the drupal_get_normal_path() side. Only the aliasing appears to be affected. But that may just be because my example only calls that function once for each alias.
-- /edit --

Note: If I make the following change to custom_url_rewrite the problem goes away:
Change

  static $map = array();

To:

  $map = array(); // Hard reset map on each call.

As always, if I find a solution I'll post it back. It's also quite possible that this bug is caused by something in our Project Opus code that is non-standard to Drupal. I haven't actually tested anything on a clean Drupal setup. I definitely should, though.

James

coreyp_1’s picture

I haven't been able to replicate the results here.

A couple of suggestions:

  1. Make sure you don't have a path conflict somewhere. Try a query like this in phpMyAdmin:
    SELECT src AS pfrom, dst AS pto FROM url_alias WHERE LOCATE(src, "user/64") ORDER BY CHAR_LENGTH(src) DESC
  2. Add a few echo statements to custom_url_rewrite() to see exactly what is getting passed to it. The output will print above everything else, but that's OK, since this is only temporary anyway.
  3. I just saw a stupid mistake in my first comment about 4.6 implementation. In drupal_get_path_alias(), I wrote it to call conf_url_rewrite(), even though earlier in that same comment I said it should be custom_url_rewrite(). If this is what is causing the error, then I'm going to hide my head in the sand!
  4. Does it do this with anyone else's name, or just yours ('james')?

- Corey

James Andres’s picture

Hi Corey,

Thanks for the reply. Actually, it's far more than just user/64 --> james. With $map as a static variable every alias on the site doesn't work. With $map reset on each call then every alias works fine.

By "work" I mean "shows up as an alias" (ie: links on the page are like <a href="james" ...). Even when it's "not working" I can hit james, or james/edit, or faq, or any other alias and get the correct page returned and rendered.

With regards to your suggestions:

  1. There are no obvious conflicts (user/64 ---> james is the only alias for user/64, and there are no aliases for james.. etc).
  2. I've traced the path that custom_url_rewrite takes several times. The traces appear to indicate that paths are being aliased sporadically (with or without $map as static). See below for the output.
  3. No, everything is using the correct functions. In fact, I have modified drupal_get_path_alias() and drupal_get_normal_path() to call only custom_url_rewrite() (so any bugs must be in that function, not elsewhere).
  4. As mentioned above it breaks every alias on the site, not just 'james'.

Check the output here, because it's big ;-)

Some notes:

  • You'll notice that the path map is correctly working at the very bottom (the final map just before drupal renders the page, when I have the $map variable as non-static)
  • You'll notice that the path fluctuates between correct and incorrect aliases in both sets of data (i.e., user/64 is sometimes aliased to user/64 and sometimes to james. There are other examples too, but I'm used to using that one ;-).

I'll do the following before I get back to you: try this out on a stock Drupal installation, and find out whether we have installed any patches that might be messing with my results.

James

James Andres’s picture

Hi,

I've rewritten the function for stock 4.6.x (yay). The results are the same.

Output can be found here.

-- edit --
I didn't mention it there, but notice that the path map is corrupt in both cases of the output. It just happens to work out nicely in the end. For some reason basic aliases aren't getting translated, or somehow code like "$map[$path] = $path" is getting run.
-- /edit --
James

coreyp_1’s picture

Not all of that makes sense. The $map[] variable shouldn't reset in the middle of a script execution. That's what static is for.

Suggestion 1:
change

    if (isset($map[$path])){
      $path = $map[$path];
    }

to

    if ($action == 'outgoing' && isset($map[$path])){
      $path = $map[$path];
    }

use 'alias' instead of 'outgoing' for 4.7

Suggestion 2:

      if ($action == 'outgoing'){
        $map[$path] = $new;
      }
      else {
        $map[$new] = $path;
      }

to

      if ($action == 'outgoing'){
        $map[$path] = $new;
      }

That fixes a potential mis-assignment, but I'm not sure whether it will help your situation.
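One way James's "$map[$path] = $path" entries could appear: when a source lookup finds no alias, $new equals $path, and the else branch stores an identity entry that later alias lookups return straight from the cache. The standalone PHP sketch below models that with a hypothetical one-row alias table and simplified exact-match lookups (not the thread's substring version). The $cacheReverse flag switches the else branch on or off, which is what suggestion 2 removes.

```php
<?php
// Hypothetical one-row alias table (src => dst) standing in for {url_alias}.
function alias_table(): array
{
    return ['user/64' => 'james'];
}

// Simplified exact lookups; the thread's function uses substring matching.
function lookup_alias(string $path): string  // system path -> alias
{
    $table = alias_table();
    return $table[$path] ?? $path;
}

function lookup_source(string $path): string // alias -> system path
{
    $src = array_search($path, alias_table(), true);
    return $src === false ? $path : $src;
}

// Cached rewrite modelled on the function in this thread.
function rewrite(string $action, string $path, array &$map, bool $cacheReverse): string
{
    if ($action === 'alias' && isset($map[$path])) {
        return $map[$path]; // cache hit
    }
    $new = ($action === 'alias') ? lookup_alias($path) : lookup_source($path);
    if ($action === 'alias') {
        $map[$path] = $new;
    } elseif ($cacheReverse) {
        // If no alias matched, $new === $path and this stores an identity entry.
        $map[$new] = $path;
    }
    return $new;
}

$map = [];
rewrite('source', 'user/64', $map, true);           // e.g. a request made by system path
echo rewrite('alias', 'user/64', $map, true), "\n"; // user/64 (stale identity entry)

$map = [];
rewrite('source', 'user/64', $map, false);
echo rewrite('alias', 'user/64', $map, false), "\n"; // james
```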

What code did you use to generate this output? When you reference the non-static code, there shouldn't be any left-overs from the static bit (unless, of course, it is because you are displaying the info with drupal_set_message(), and there was a call to the function on the previous page *after* that page's content was generated, but before the script ended).

- Corey

James Andres’s picture

Thanks for the reply Corey.

I got the debug output via drupal_set_message('<code>' . var_export($map, 1) . '</code>');

I have changed that to echo '<code>' . var_export($map, 1) . '</code>'; in the hope of getting direct feedback without the drupal_set_message() lag you referred to.

I didn't write the path module, of course, but as far as I can tell $map is made static as an optimization. I'm pretty sure of this because calls to custom_url_rewrite() are purely functional (i.e., put in a path, out comes an alias), and $map is not global. Making $map a regular variable should only slow down repetitive calls (e.g., if the call to alias 'node/123' --> 'news-portal' gets run 15 times). My thinking was to corner the output so that I could watch exactly what was being written to the map on each call. When the map persists throughout the script execution, the output gets jumbled and it is hard to track whose data is being overwritten by whom.

Anyway, on with da results ;-) :

Your code:

      if ($action == 'alias') {
        $map[$path] = $new;
      }
      //else {
      //  $map[$new] = $path;
      //}

worked perfectly! Thanks!

Strangely enough I assumed that further tweaking the code to :

      if ($action == 'alias') {
        $map[$path] = $new;
      }
      else if ($action == 'source') {
        $map[$new] = $path;
      }

would produce better results. Alas, that broke the system again. I'm not quite sure why...

Good enough for this week though.

If I do any other cool stuff with this code I'll let you know :-).

James

coreyp_1’s picture

James,

Yes, the static variable is used to cache the results so that repeated $path aliasing calls do not over-use valuable processing resources.

I believe that the reason the reverse mapping does not work is due to *how* the $path is encoded and decoded. Without getting into a long, drawn-out explanation, let me just leave it at this: reverse mapping does not behave as expected in this situation, and is therefore not cached. This should not be a concern, however, since the decoding routine is only called once per page (unless specifically requested, as in your test script).

So, is the function now working perfectly for you? I am interested to find out, since your data is far more extensive than mine (for practical testing purposes).

- Corey

James Andres’s picture

Hi Corey,

Thanks for the follow up.

It seems to be working correctly for the most part. I have your code on my testing box so hopefully we'll be able to squeeze any bugs out.

The only issues I have noticed so far are:

  1. I have aliased trip_search --> search, as you know, and now the 'search' path does indeed work. Strangely though the original 'trip_search' path does not work. I had a hunch that trip_search was getting rewritten to trip_trip_search (however that path doesn't work either!!). Anyway, the 'search' path does work, so good enough for me for now, lol. Still, this is likely breaking my search functionality in some way.
  2. A definite slowdown in the time it takes to render each page. This one will have to get solved if I'm ever going to move this code live. I'll let you know if we make any progress in that area.

Positive notes:

  • After running a script to remove unnecessary aliases (user/1/edit, user/1/bio ... etc.) and replace them with a single alias (user/1 --> foobar), we went from 13,000 aliases to 4,300!! Excellent!

James Andres

coreyp_1’s picture

It seems like "trip_search" is getting re-written to "trip_trip_search". Typing in "trip_trip_search" won't work, either, because that will be rewritten to "trip_trip_trip_search", etc. I can only think of one work-around (aside from more heavy coding).
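The re-writing happens because the reverse lookup matches dst anywhere in the path. With the alias src "trip_search" and dst "search", the string "search" occurs inside "trip_search", so str_replace() turns it into "trip_trip_search". The standalone PHP sketch below shows that behaviour and then one alternative that is not from this thread: treat dst as a match only when it is the whole path or its leading segment. The helper names are hypothetical.

```php
<?php
// Hypothetical single alias row: src "trip_search", dst "search".
$src = 'trip_search';
$dst = 'search';

// Substring replacement, as str_replace() does in the thread's function.
function src_by_substring(string $path, string $src, string $dst): string
{
    return strpos($path, $dst) !== false ? str_replace($dst, $src, $path) : $path;
}

// Segment-aware replacement: dst must be the whole path or a leading
// segment, so "trip_search" no longer matches "search".
function src_by_segment(string $path, string $src, string $dst): string
{
    if ($path === $dst) {
        return $src;
    }
    if (strpos($path, $dst . '/') === 0) {
        return $src . substr($path, strlen($dst));
    }
    return $path;
}

echo src_by_substring('trip_search', $src, $dst), "\n";      // trip_trip_search
echo src_by_substring('trip_trip_search', $src, $dst), "\n"; // trip_trip_trip_search
echo src_by_segment('trip_search', $src, $dst), "\n";        // trip_search
echo src_by_segment('search/foo', $src, $dst), "\n";         // trip_search/foo
```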

You could make a page with the alias "trip_trip_search" and put PHP code in it that calls drupal_goto() to redirect to "search".

As for the optimization:

Is everything going through custom_url_rewrite()? I just wanted to make sure that you modified the core files so that nothing is still going through the original rewrite functions.

- Corey

James Andres’s picture

Ack, I just wrote a really long post and lost it. I'll summarize what I was originally going to say:

* Not a big deal on the trip_trip_search thing. It doesn't really harm functionality in this case.

A thought, though: how about rewriting the custom_url_rewrite function to use some form of weak regular expression (i.e., similar to how the block administration page works with 'node/*', etc.)? To make this efficient (likely a lot more efficient), we would probably need some sort of static variable to keep track of which aliases should be run through custom_url_rewrite (something like $greedy_aliases, maybe).

This would provide at least the following benefits:

  1. Ability to match URLs both the old way and the new substitution way (i.e., I could map 'node' --> 'content/list' and also map 'node/*' --> 'content')
  2. Better performance, because not all paths would have to run through the custom_url_rewrite function
  3. Administrative safety: large, established sites with thousands of preexisting aliases don't want to worry about conflicting aliases when installing a new patch.
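As a rough illustration of points 1 and 2, here is a standalone PHP sketch. It assumes, hypothetically, that exact aliases and wildcard aliases are kept in separate arrays, with a wildcard such as 'node/*' stored as the prefix 'node/' and mapped to 'content/' so that sub-paths keep their tail. Exact aliases keep their old behaviour. Only paths that start with a wildcard prefix go through prefix matching, and the longest matching prefix wins.

```php
<?php
// Hypothetical split of the alias table into exact rows and wildcard rows.
// A wildcard row "node/*" is stored here as the prefix "node/".
$exact    = ['node' => 'content/list'];
$wildcard = ['node/' => 'content/'];

function alias_path(string $path, array $exact, array $wildcard): string
{
    // Plain aliases keep their old, exact-match behaviour.
    if (isset($exact[$path])) {
        return $exact[$path];
    }
    // Only paths that start with a wildcard prefix pay for prefix matching;
    // the longest matching prefix wins.
    $bestPrefix = null;
    foreach ($wildcard as $prefix => $replacement) {
        if (strpos($path, $prefix) === 0
            && ($bestPrefix === null || strlen($prefix) > strlen($bestPrefix))) {
            $bestPrefix = $prefix;
        }
    }
    if ($bestPrefix === null) {
        return $path;
    }
    return $wildcard[$bestPrefix] . substr($path, strlen($bestPrefix));
}

echo alias_path('node', $exact, $wildcard), "\n";        // content/list
echo alias_path('node/5/edit', $exact, $wildcard), "\n"; // content/5/edit
echo alias_path('admin/node', $exact, $wildcard), "\n";  // admin/node
```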

Anyway, I'll write up an example and post back how (err .. if ;-) it works.

James Andres

coreyp_1’s picture

Sorry to post this separately, but my previous comment was getting a little long, and this is really a separate issue.

Why are you creating a separate "custom address" module, when all it is is a simple userid lookup? You could do that all from the comfort of custom_url_rewrite(), and never have to add another path alias. The only reason it currently relies on the url_alias table is because that is what would be most beneficial for the majority of users.

- Corey

James Andres’s picture

Hi Corey,

Good question. If it were only doing custom addresses that would make more sense, but these functions are actually part of a larger module called 'customization' that enables all sorts of user customization.

Also the custom address part has quite a bit of logic to deal with who 'owns' what custom address, what custom addresses are off limits (ie: you can't have the custom address 'login' because that would throw the system off kilter). Etc.

James

James Andres’s picture

Just before it gets asked ... no, this isn't 2 modules. It's one module called customization with hooks to do many things.

Also, the system isn't automatic, the user has to opt-in to get a custom address.

James

James Andres’s picture

As mentioned before (in that thread that was getting terribly cramped for space) I have rewritten the custom_url_rewrite function to use regular expressions. It isn't "pretty" but I'll clean up the code later ;-).

Usage:

  • Make an alias of the form: node --> content. That will alias ONLY the node page to content.
  • Make an alias of the form: node/* --> content/. That will alias everything starting with 'node/' to 'content/'

There are probably lots of bugs; feel free to pick it apart.

Note 1: You'll notice that my 'usage' statement is actually not quite correct. If you alias user* --> person, you may get unexpected behaviour when visiting admin/user (it will convert to admin/person). In my opinion that's fine for now.

function custom_url_rewrite($action, $path, $original) {
  static $map = array();
  static $greedy_aliases = NULL;
  static $count = NULL;

  if ($count === NULL) {
    $count = db_result(db_query('SELECT COUNT(pid) FROM {url_alias}'));
  }

  if (($count > 0) && ($greedy_aliases === NULL)) {
    // get all greedy aliases (ONCE!)
    $sql = "SELECT src, dst FROM {url_alias} WHERE src LIKE '%%*%%'";
    $result = db_query($sql); // get all aliases with the asterisk char in them.

    $greedy_aliases = array(); // Note: this keeps us from entering this block on every call (empty array !== null)
    while ($row = db_fetch_array($result)) {
      // Yea I know, lots of stuff is faster than str_replace.  Good enough for now though ;-)
      $sans_asterisk = str_replace('*', '', $row['src']);
      $greedy_aliases[$sans_asterisk] = $row['dst'];
    }
  }

  //NOTE: This is a fairly brute force algorithm.  Make me a nice fast search ;-)
  $be_greedy = false;
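  // When $count is 0, $greedy_aliases is still NULL; the (array) cast avoids a foreach warning.
  foreach ((array) $greedy_aliases as $src => $dst) {
    if ($action == 'alias') {
      if (strpos($path, $src) !== false) {// If $src is inside $path
        $be_greedy = true;
      }
    } else { // $action == 'source'
      if (strpos($path, $dst) !== false) {// If $dst is inside $path
        $be_greedy = true;
      }
    }
  }
  if (!$be_greedy) { // Do a standard path lookup
    // drupal_lookup_path() returns FALSE when no alias exists, so fall back to $path.
    $lookup = drupal_lookup_path($action, $path);
    return $lookup ? $lookup : $path;
  }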

  if ($count > 0 && $path != '') {
    if (($action == 'alias') && isset($map[$path])){
      $path = $map[$path];
    }
    else {
      $old = '';
      $new = $path;
      $used = array();
      while ($old != $new){
        $old = $new;
        
        //NOTE: Had to modify these two queries to make them asterisk agnostic.
        //      this likely makes the queries a fair bit slower.  There is a better way to do this I'm sure.
        if ($action == 'alias') {
          $query = 'SELECT REPLACE(src, "*", "") AS pfrom, REPLACE(dst, "*", "") AS pto FROM {url_alias} WHERE LOCATE(REPLACE(src, "*", ""), "'.addslashes($new).'") AND src NOT IN ("'.implode('", "', $used).'") ORDER BY CHAR_LENGTH(src) DESC LIMIT 1';
        }
        else {
          $query = 'SELECT REPLACE(dst, "*", "") AS pfrom, REPLACE(src, "*", "") AS pto FROM {url_alias} WHERE LOCATE(REPLACE(dst, "*", ""), "'.addslashes($new).'") AND dst NOT IN ("'.implode('", "', $used).'") ORDER BY CHAR_LENGTH(src) ASC LIMIT 1';
        }
        $resource = db_query($query);
        if ($result = db_fetch_array($resource)) {
          $new = str_replace($result['pfrom'], $result['pto'], $new);
          $used[] = $result['pfrom'];
        }
      }
      if ($action == 'alias') {
        $map[$path] = $new;
      }
      $path = $new;
    }
  }

  return $path;
}
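Note 1's admin/user surprise comes from the greedy check using strpos($path, $src) !== false, which matches src anywhere in the path. The standalone PHP sketch below (hypothetical helpers, not part of James's code) contrasts that with anchoring the match at the start of the path:

```php
<?php
// Hypothetical wildcard row "user*" -> "person", with the asterisk stripped
// the same way James's function strips it.
$src = str_replace('*', '', 'user*'); // "user"
$dst = 'person';

// Unanchored test, like strpos($path, $src) !== false in the greedy check.
function rewrite_anywhere(string $path, string $src, string $dst): string
{
    return strpos($path, $src) !== false ? str_replace($src, $dst, $path) : $path;
}

// Anchored test: src must start the path, and only that leading
// occurrence is replaced.
function rewrite_anchored(string $path, string $src, string $dst): string
{
    return strpos($path, $src) === 0 ? $dst . substr($path, strlen($src)) : $path;
}

echo rewrite_anywhere('admin/user', $src, $dst), "\n"; // admin/person
echo rewrite_anchored('admin/user', $src, $dst), "\n"; // admin/user
echo rewrite_anchored('user/5', $src, $dst), "\n";     // person/5
```

Anchoring alone still lets "user*" match a path such as "username/...". A trailing slash in the pattern (as in "node/*") or an explicit segment check would be needed to rule that out.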

James Andres

madhatter’s picture

Hi James,

Would it be possible for you to share the code that would allow the address to be customized from www.site.com/user/1 to www.site.com/james?

thanks in advance,
Vincent

James Andres’s picture

Hi Vincent,

For simple rewrites like user/1 --> /james (when the user's name is james), the easiest method is the pathauto module. If you are also using the patches discussed here, all paths containing user/1 can become james/whatever (i.e., james/edit).

Cheers,

James Andres

madhatter’s picture

Thanks. I installed the pathauto module and it converted the url addresses. However I found that it cannot add url aliases with users that have Chinese characters in their names. Since my site is targeted to Chinese users, I'm not sure what I can do about this.

Vincent.....

www.buzzcn.com

James Andres’s picture

Hey Vincent,

That's a bit tricky... Too bad http isn't multilingual, lol.

I think in this case the only solution would be a module that lets users pick their own addresses, similar to what we are doing. Our current module won't work with 4.7 and isn't ready for release yet, alas.

You could give the developers of pathauto an email and they might have more advice / tricks to try.

Good luck!

James Andres

Hayakawa’s picture

I have read all the posts, but I'm confused about what to do. I have listed my cases below. If you answer, it's greatly appreciated, and I'm sure there are a lot more people who need to use the code you (Corey and James) have produced.

1. I'm using pathauto and want to minimize the performance penalty. Which code should I use?
2. I want to fix the "Pathauto is nice, but while you may have gotten to the page by going to www.example.com/my_vacation_photos, when you want to edit the page, it goes to www.example.com/node/476/edit." issue. Which code should I use?
3. If this is a replacement for all URL aliasing done by the path and pathauto modules, which code should I use?

thanks in advance

coreyp_1’s picture

Hey James,

Sorry it's been so long. Work is more hectic than ever, and I haven't been able to do much coding lately.

I looked through your function, and have a few concerns.

First, loading an entire array of $greedy_aliases even once will kill a site once it gets too big. This is how the url aliases used to work, until sites with many aliases started choking on that single bit of code. In your situation, if you're expecting to grow to 1 million users, then that means that, for each page call, 1 million user aliases will be loaded into that function.

The second concern is that, once $greedy_aliases is loaded, it is only used in the first part of the function. To me, it hardly seems worth the potential resource consumption (think 1 million user aliases) when it is only used for a true/false test. I could be wrong, though. Have you been able to benchmark the two methods?

Now, for my idea of how your implementation could work:

First of all, checking for an asterisk is too time-consuming for PHP or MySQL. If there were a way to store that option (whether or not to use pattern matching) in the database as a simple true/false value for each alias, so that it would not have to be computed each time, that would be a better solution. If this were a feature modification for core, it could be added as a checkbox beside each URL alias and stored in a new database column (call it "pattern_match", for example). SQL queries could then test whether each alias is intended to be used in pattern matching, and use the "LIKE" syntax only if needed.

The problem with this is that it is getting into modifying core files, a procedure which obviously should be avoided.

I want this to work, because I believe it would have a positive effect on site presentation. I'm just not sure that I can get the performance cost down enough to make it practical.

The other thought is that most websites using this would want just about every alias to be pattern matched, since most aliases are for content and user info.

Still thinking...

- Corey

sun’s picture

For anyone interested in enabling this feature for path + pathauto modules, go ahead with these steps:

Warning: Do not use this code if you don't know what it does. There's no support. Use it at your own risk. (DEVs only)

  1. Activate path module.
  2. Install and activate pathauto module.
  3. Create a new file, insert this code (without the closing ?>), and save it as /sites/default/custom_url_rewrite.inc.
  4. Open /sites/default/settings.php and insert the following line at the end:
    require_once './sites/default/custom_url_rewrite.inc';
    

At the moment I'm a bit worried that there will be false positives with a default pathauto configuration. It seems I'll have to dive into those modules to find out whether it is an issue with this "patch" or with pathauto in general.

Daniel F. Kudwien
unleashed mind

Daniel 'sun' Kudwien
makers99

mshaver’s picture

I may have found a bug with the above code in a strange situation that I've detailed here:

http://drupal.org/node/111775

Otherwise works great. Just need to figure out why it's not allowing me to access these specific edit pages.

Mike

coreyp_1’s picture

Are you using the updated version of the function? The updated version is found here, but with this modification.

Here is how the assembled, updated version should look (for 4.7 or 5.x):


function custom_url_rewrite($action, $path, $original) {
  static $map = array();
  static $count = NULL;

  if ($count === NULL) {
    $count = db_result(db_query('SELECT COUNT(pid) FROM {url_alias}'));
  }

  if ($count > 0 && $path != '') {
    if ($action == 'alias' && isset($map[$path])){
      $path = $map[$path];
    }
    else {
      $old = '';
      $new = $path;
      $used = array();
      while ($old != $new){
        $old = $new;
        if ($action == 'alias') {
          $query = 'SELECT src AS pfrom, dst AS pto FROM {url_alias} WHERE LOCATE(src, "'.addslashes($new).'") AND src NOT IN ("'.implode('", "', $used).'") ORDER BY CHAR_LENGTH(src) DESC LIMIT 1';
        }
        else {
          $query = 'SELECT dst AS pfrom, src AS pto FROM {url_alias} WHERE LOCATE(dst, "'.addslashes($new).'") AND dst NOT IN ("'.implode('", "', $used).'") ORDER BY CHAR_LENGTH(src) ASC LIMIT 1';
        }
        $resource = db_query($query);
        if ($result = db_fetch_array($resource)){
          $new = str_replace($result['pfrom'], $result['pto'], $new);
          $used[] = $result['pfrom'];
        }
      }
      if ($action == 'alias'){
        $map[$path] = $new;
      }
      $path = $new;
    }
  }

  return $path;
}

- Corey

denney’s picture

Thank you for this script.

It works perfectly with Drupal 5.1 with my preliminary tests. I plan on testing it some more but first impressions, it works great.

mshaver’s picture

Thanks for the reply. Yes, I am using this updated version and the bug still persists. I know there was some discussion about where to put the file: I have mine in my "includes" directory, loaded from the bottom of settings.php like this: require_once './includes/custom_url_rewrite.inc';

It's such a strange bug. Here is another situation that it presents itself:

Pathauto settings for users set to: compass/staff/[user]

User registers with username: ginny
Another user registers with username: ginnyr

When viewing the "ginnyr" profile and selecting the edit tab, the page defaults to the currently logged-in user. That's not a problem if "ginnyr" is editing her own page, but it is if a user with "administer users" permission is trying to edit her page.

I mention this in the issue post as well, but it only happens when one username is contained in part of another user's URL, so the string isn't unique. So "ginnym" and "ginnyr" are fine, as are "ginny_smith" and "ginnyr_smith". But you can't have "ginny" alongside "ginnym", "ginny_smith", or "ginnysmith".

Does this make any sense?

Thanks for your help!

coreyp_1’s picture


function custom_url_rewrite($action, $path, $original) {
  // Debug counters ($link_counter, $map_counter) left over from testing; they can be removed.
  global $link_counter;
  $link_counter++;
  static $map = array();
  static $count = NULL;
  if ($count === NULL) {
    $count = db_result(db_query('SELECT COUNT(pid) FROM {url_alias}'));
  }

  if ($count > 0 && $path != '') {
    if ($action == 'alias' && isset($map[$path])){
      global $map_counter;
      $map_counter++;
      $path = $map[$path];
    }
    else {
      $old = '';
      $new = $path;
      $used = array();
      while ($old != $new){
        $old = $new;
        if ($action == 'alias') {
          $query = 'SELECT src AS pfrom, dst AS pto FROM {url_alias} WHERE LOCATE(src, "'.addslashes($new).'") AND src NOT IN ("'.implode('", "', $used).'") ORDER BY CHAR_LENGTH(src) DESC LIMIT 1';
        }
        else {
          $query = 'SELECT dst AS pfrom, src AS pto FROM {url_alias} WHERE LOCATE(dst, "'.addslashes($new).'") AND dst NOT IN ("'.implode('", "', $used).'") ORDER BY CHAR_LENGTH(src) ASC, CHAR_LENGTH(dst) DESC LIMIT 1';
        }
        $resource = db_query($query);
        if ($result = db_fetch_array($resource)){
          $new = str_replace($result['pfrom'], $result['pto'], $new);
          $used[] = $result['pfrom'];
        }
      }
      if ($action == 'alias'){
        $map[$path] = $new;
      }
      $path = $new;
    }
  }

  return $path;
}
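mshaver's ginny/ginnyr report and the extra ORDER BY term in the function above can be illustrated with a standalone PHP sketch. It is a simplified model, not Drupal code. Two hypothetical rows (the uids 5 and 6 are made up) stand in for the aliases pathauto would create with the compass/staff/[user] pattern. The hypothetical to_source_once() helper performs a single dst-to-src substitution. Both srcs are the same length, so sorting on CHAR_LENGTH(src) alone is a tie, and SQL does not guarantee which tied row comes first. The sketch simply keeps the first row to make the failure visible.

```php
<?php
// Hypothetical rows (src, dst) like those pathauto could create for the
// pattern compass/staff/[user]; uids 5 and 6 are made up.
$rows = [
    ['user/5', 'compass/staff/ginny'],
    ['user/6', 'compass/staff/ginnyr'],
];

// One reverse substitution: choose the matching row with the shortest src.
// With $preferLongestDst, ties on src length go to the longest dst, which
// models adding "CHAR_LENGTH(dst) DESC" to the ORDER BY clause.
function to_source_once(string $path, array $rows, bool $preferLongestDst): string
{
    $best = null;
    foreach ($rows as [$src, $dst]) {
        if (strpos($path, $dst) === false) {
            continue; // this alias does not occur in the path
        }
        $shorterSrc = $best === null || strlen($src) < strlen($best[0]);
        $tieLongerDst = $best !== null && $preferLongestDst
            && strlen($src) === strlen($best[0]) && strlen($dst) > strlen($best[1]);
        if ($shorterSrc || $tieLongerDst) {
            $best = [$src, $dst];
        }
    }
    return $best === null ? $path : str_replace($best[1], $best[0], $path);
}

echo to_source_once('compass/staff/ginnyr/edit', $rows, false), "\n"; // user/5r/edit
echo to_source_once('compass/staff/ginnyr/edit', $rows, true), "\n";  // user/6/edit
```

"user/5r/edit" is not a valid user path, which would plausibly explain why the edit tab fell back to the logged-in user. That last step is an inference and is not tested here.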

- Corey

mshaver’s picture

That works like a charm! Thanks so much!

Mike

sun’s picture

Corey, did you already post this snippet to Drupal's path component or the pathauto module issue queue?
It seems that it has evolved to a fairly stable state and IMHO it should be integrated in one of those modules.

coreyp_1’s picture

URLs are aliased and un-aliased by core code, not modules. Oddly enough, Path has nothing to do with the process; it just adds an interface to create/edit aliases. Therefore, if this were to be added, it would have to go into Drupal core, in the drupal_get_path_alias() and drupal_get_normal_path() functions in path.inc.

I tried putting the function into a module, since I figured most people would be more comfortable uploading a module than editing settings.php, but the url un-aliasing code is called before modules are included, so that didn't work.

The main drawback for this function is that, instead of just one query per l() request, there is a minimum of two. As mentioned earlier in the comments, that is not much for most pages, but it could, in theory, cause problems. Then again, if pages are that involved anyway, other things are probably bogging them down, too!

I may add it as a handbook page, I just want to make sure that there are no horrible side effects.

- Corey

mshaver’s picture

This is really invaluable code from my perspective, especially when you are using the path for many of your theming and blocks displays. A normal user doesn't understand when they hit the edit tab why things would look different. This keeps everything consistent which is a valuable usability tool.

Mike

denney’s picture

There seems to be a conflict between your code and the Pathauto module's "index aliases" function.

Basically with your code disabled, a page with the URL "/2007/03/02/post-title" will display fine and entering "/2007/03" as the URL will display a listing of the posts that were posted in March 2007.

With your code enabled though, the "/2007/03" page just shows the word "Node" without a listing of any kind.

denney’s picture

Anyone have any response to this problem? It's a fairly large problem if you make use of "archive" pages and things like that with Pathauto "index aliases" function.

coreyp_1’s picture

Well, I haven't used this feature of Pathauto, and I don't have an appropriate sample data set to play with. :o(

Could you run this code for me, and tell me what it returns (preferably with my code enabled, as well as without)?

<?php
echo '1. ' . drupal_get_normal_path('2007/03/02/post-title') . '<br />';
echo '2. ' . drupal_get_normal_path('2007/03/02') . '<br />';
echo '3. ' . drupal_get_normal_path('2007/03') . '<br />';
echo '4. ' . drupal_get_normal_path('2007') . '<br />';
?>

You can put it in a page, enable the PHP filter, and hit "Preview". This shows how your installation interprets each of these strings.

It's odd that the page in question ("/2007/03") shows up with only the word "node". If my function were returning an unknown path, you would get a "page not found" response instead.

Also, what settings are you using in Pathauto?

- Corey

denney’s picture

Hmm... well, there is definitely a problem here...

<?php
echo '1. ' . drupal_get_normal_path('2007/03/15/what-do-you-think') . '<br />';
echo '2. ' . drupal_get_normal_path('2007/03/15') . '<br />';
echo '3. ' . drupal_get_normal_path('2007/03') . '<br />';
echo '4. ' . drupal_get_normal_path('2007') . '<br />';
?>

Test 1: Pathauto index aliases DISABLED, custom_url_rewrite() ENABLED
1. node/1
2. 2007/03/15
3. 2007/03
4. 2007

Test 2: Pathauto index aliases DISABLED, custom_url_rewrite() DISABLED
1. node/1
2. 2007/03/15
3. 2007/03
4. 2007

Test 3: Pathauto index aliases ENABLED, custom_url_rewrite() DISABLED
1. node/1
2. pathauto/node/2007/03/15
3. pathauto/node/2007/03
4. pathauto/node/2007

Test 4: Pathauto index aliases ENABLED, custom_url_rewrite() ENABLED
1. node/1
2. pathauto/node/pathauto/node/pathauto/node/pathauto/node/2007/03/15
3. pathauto/node/pathauto/node/pathauto/node/2007/03
4. pathauto/node/pathauto/node/2007

My Pathauto settings are:
-> General Settings
----> Create index aliases
-> Node Path Settings
----> Default Path Pattern = [title]
----> Posts Path Pattern = [yyyy]/[mm]/[dd]/[title]

coreyp_1’s picture

Thanks for the info.

Here's a new version, with tweaks to the SQL and the logic:


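function custom_url_rewrite($action, $path, $original) {
  // Per-request caches: results already computed for outgoing links, and
  // the number of rows in the url_alias table.
  static $map = array();
  static $count = NULL;

  if ($count === NULL) {
    $count = db_result(db_query('SELECT COUNT(pid) FROM {url_alias}'));
  }

  // Nothing to do if there are no aliases or the path is empty.
  if ($count > 0 && $path != '') {
    if ($action == 'alias' && isset($map[$path])) {
      // Outgoing link already rewritten during this request: reuse it.
      $path = $map[$path];
    }
    else {
      $old = '';
      $new = $path;
      $used = array();  // Patterns already applied, so none is applied twice.

      // Keep substituting until a pass leaves the string unchanged.
      while ($old != $new) {
        $old = $new;
        if ($action == 'alias') {
          // Outgoing: the longest system path (src) found anywhere in the
          // string is replaced by its alias (dst).
          $query = 'SELECT src AS pfrom, dst AS pto FROM {url_alias} WHERE LOCATE(src, "'.addslashes($new).'") AND src NOT IN ("'.implode('", "', $used).'") ORDER BY CHAR_LENGTH(src) DESC LIMIT 1';
        }
        else {
          // Incoming: an alias (dst) found anywhere in the string is replaced
          // by its system path (src), preferring the longest src, then the
          // longest dst.
          $query = 'SELECT dst AS pfrom, src AS pto FROM {url_alias} WHERE LOCATE(dst, "'.addslashes($new).'") AND dst NOT IN ("'.implode('", "', $used).'") ORDER BY CHAR_LENGTH(src) DESC, CHAR_LENGTH(dst) DESC LIMIT 1';
        }
        $resource = db_query($query);
        if ($result = db_fetch_array($resource)) {
          $new = ($new == $result['pto'] ? $new : str_replace($result['pfrom'], $result['pto'], $new));
          $used[] = addslashes($result['pfrom']);
        }
      }

      // Remember outgoing results for later l() calls on this page.
      if ($action == 'alias') {
        $map[$path] = $new;
      }

      $path = $new;
    }
  }

  return $path;
}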

- Corey

denney’s picture

Awesome... works perfectly. Thank you for the continued support.

This should DEFINITELY be considered for inclusion in core.

denney’s picture

Contrary to my previous post, not everything works properly.

Editing posts no longer works. Using the code from your previous post, I tested another URL:

journal/2007/03/20/alone-time/edit

returns:

pathauto/node/pathauto/node/pathauto/node/journal/pathauto/node/pathauto/node/2007/03/20/alone-time/edit

Also, the following URL:

admin/settings/quicktags

returns:

admin/settings/quickpathauto/taxonomy/tags

As you can see, this is a problem. :)
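Both symptoms can be reproduced without a database. LOCATE() matches an alias anywhere in the string, so "tags" matches inside "quicktags"; and because the loop re-scans its own output, a shorter alias such as "2007" is found again inside the "pathauto/node/2007/03" it just produced. The sketch below re-creates the un-aliasing ('source') branch of the custom_url_rewrite() posted above in plain PHP, with strpos() in place of LOCATE() and usort() in place of the ORDER BY. The three-row alias table is a hypothetical reduction inferred from the outputs reported here, so the number of repeated prefixes differs from denney's results.

```php
<?php
// In-memory re-creation of the 'source' branch of the loop, for
// illustration only. The alias rows are hypothetical (src => dst).
$aliases = array(
  'pathauto/node/2007'     => '2007',
  'pathauto/node/2007/03'  => '2007/03',
  'pathauto/taxonomy/tags' => 'tags',
);

function sketch_unalias($path, $aliases) {
  $old = '';
  $new = $path;
  $used = array();
  while ($old != $new) {
    $old = $new;
    // Like LOCATE(dst, $new): the alias may appear anywhere in the path.
    $hits = array();
    foreach ($aliases as $src => $dst) {
      if (strpos($new, $dst) !== FALSE && !in_array($dst, $used, TRUE)) {
        $hits[] = array('pfrom' => $dst, 'pto' => $src);
      }
    }
    // Like ORDER BY CHAR_LENGTH(src) DESC, CHAR_LENGTH(dst) DESC LIMIT 1.
    usort($hits, function ($a, $b) {
      $d = strlen($b['pto']) - strlen($a['pto']);
      return $d != 0 ? $d : strlen($b['pfrom']) - strlen($a['pfrom']);
    });
    if ($hits) {
      $hit = $hits[0];
      if ($new != $hit['pto']) {
        $new = str_replace($hit['pfrom'], $hit['pto'], $new);
      }
      $used[] = $hit['pfrom'];
    }
  }
  return $new;
}

echo sketch_unalias('2007/03', $aliases), "\n";
// pathauto/node/pathauto/node/2007/03
echo sketch_unalias('admin/settings/quicktags', $aliases), "\n";
// admin/settings/quickpathauto/taxonomy/tags
```

The $used list stops the same alias from being applied twice, but it doesn't stop a different, shorter alias from matching inside text that an earlier pass inserted.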

denney’s picture

Well, I've tried various things but nothing seems to fix these annoying errors.

I'm going back to the old script for now and just disabling index aliases, because that caused fewer headaches.

Looking forward to a response.
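One possible direction, offered as an untested sketch rather than a fix to the posted function: rewrite only a leading run of whole path segments, try the longest match first, and make a single pass so the output is never re-scanned. Under those rules "tags" cannot match inside "quicktags", and "2007/03" cannot be rewritten twice. The sketch_prefix_rewrite() name and the pair list are hypothetical; the pairs are inferred from the outputs reported in this thread.

```php
<?php
// Hypothetical alternative: replace only a whole-segment prefix, once.
// $pairs is a list of array(from, to); for incoming requests "from" is
// the alias and "to" the system path (swap them for outgoing links).
function sketch_prefix_rewrite($path, $pairs) {
  // Longest "from" first, so "2007/03" wins over "2007".
  usort($pairs, function ($a, $b) {
    return strlen($b[0]) - strlen($a[0]);
  });
  foreach ($pairs as $pair) {
    list($from, $to) = $pair;
    if ($path === $from) {
      return $to;                                   // whole path matched
    }
    if (strpos($path, $from . '/') === 0) {
      return $to . substr($path, strlen($from));    // keep trailing segments
    }
  }
  return $path;                                     // no prefix matched
}

$incoming = array(
  array('2007', 'pathauto/node/2007'),
  array('2007/03', 'pathauto/node/2007/03'),
  array('tags', 'pathauto/taxonomy/tags'),
  array('2007/03/15/what-do-you-think', 'node/1'),
);

echo sketch_prefix_rewrite('2007/03', $incoming), "\n";
// pathauto/node/2007/03
echo sketch_prefix_rewrite('2007/03/15/what-do-you-think/edit', $incoming), "\n";
// node/1/edit
echo sketch_prefix_rewrite('admin/settings/quicktags', $incoming), "\n";
// admin/settings/quicktags
```

The trade-off is flexibility: a word in the middle of a path (say, the "add" in "node/add/story") is only replaced if there is an entry for the whole leading prefix, such as "node/add".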

denney’s picture

It appears as though the author of the Pathauto module has removed the "index aliases" option in the latest development version, because the Views module can be used for the same purpose.

I have yet to try the new code with the "non-index aliases" version. When I do, I will report back.

OK, I've done a simple test and it appears as though the code above works with the latest development version of the Pathauto module. Therefore, I see no reason to fix the broken code for the old version at the moment.

I'm not sure what the problem was, but now that the Pathauto developer has explained how to use Views to create Pathauto-like "index aliases", I've found that approach much more customizable.

mshaver’s picture

I thought the SQL change was working for both users and nodes, but it doesn't seem to be! It's working great for nodes, but for users the same issues occur when aliases share a common string. I've tried debugging why this is happening, but have hit a dead end. When I do a print_r on the $result variable, it seems to return the correct value, but the page still defaults to the currently logged-in user.

Any additional ideas? Or places I should look when debugging? Thanks!

coreyp_1’s picture

This may be a dumb question, but does your browser have auto-complete enabled? I ask this because of personal experience.

If the browser has auto-complete enabled, then Drupal may actually be returning the correct form, but your browser replaces the specified user's data with your data. I had this happen to me with a browser once, and it took me forever to figure out that it was not a Drupal problem. Does it happen in different browsers/computers?

The fact that $result is returning correctly makes me think that the problem is elsewhere. It doesn't make sense that node edit pages work fine, but user edit pages don't. The same logic is used to interpret both kinds of strings.

- Corey

mshaver’s picture

It doesn't appear related to auto-complete. In fact, the Edit tab for the affected users doesn't even reach the edit page. When I say it defaults to the current user, I mean to their profile page, not to their edit profile page. That's the crux of the problem: users with this common username string will not be able to edit their profiles.

After a little more investigation, I believe this has something to do with the clean URLs setup. On a sandbox install, without clean URLs enabled, the script works fine for these users. Do you know which mod_rewrite rules might be interfering?

Thanks for your help!

coreyp_1’s picture

I could not duplicate the error, even with clean URLs enabled.

I already had a user named "George", so I added another user with the name "Georges". The user page displayed correctly for both "user/george" and "user/georges", as did their edit pages.

In theory, clean URLs shouldn't matter, because, from Drupal's perspective, the requested path always ends up in the $_GET['q'] variable. The rewrite rules are simple enough that they could not be causing this problem.
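As a rough illustration of that point, assuming a site installed at the web root: Drupal's .htaccess rewrites a clean URL such as /user/george/edit to roughly index.php?q=user/george/edit, while a non-clean link already carries ?q= itself. The hypothetical sketch_q_from_request() below mimics both cases and shows that they produce the same internal path. It ignores RewriteBase, appended query strings, and the conditions that let real files bypass the rule.

```php
<?php
// Hypothetical illustration: where the internal path comes from, with and
// without clean URLs. Not Drupal code; a simplification of the rewrite.
function sketch_q_from_request($uri, $clean_urls) {
  $parts = parse_url($uri);
  if ($clean_urls) {
    // Roughly what "RewriteRule ^(.*)$ index.php?q=$1" does: path becomes q.
    return ltrim($parts['path'], '/');
  }
  // Without clean URLs, q is read straight from the query string.
  $query = array();
  parse_str(isset($parts['query']) ? $parts['query'] : '', $query);
  return isset($query['q']) ? $query['q'] : '';
}

echo sketch_q_from_request('/user/george/edit', TRUE), "\n";              // user/george/edit
echo sketch_q_from_request('/index.php?q=user/george/edit', FALSE), "\n"; // user/george/edit
```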

It sounds more like a browser issue. Have you cleared the cache and refreshed the pages?

- Corey

denney’s picture

This looks like an awesome feature that could really be added to the core path module to make things better.

Anyway, my question is, has anyone tested this on Drupal 5.1? If so, are there any quirks or special steps needed to get it running properly?

I'm planning on testing this on Drupal 5.1 in a day or so but would like any input from anyone else who has already attempted this.