A heads-up for people interested in pathauto (especially those hacking at the current code:-) - I'm in the midst of doing a complete rewrite, and any patches based on what's currently committed will not be useful in the very near future. I'm not quite ready to commit the changes to HEAD yet, and this is going to be a very busy week for me, so it probably won't be until next weekend that there's something to play with. But, here's an outline of how I expect it to work...

There will be a pathauto hook, which modules will use to insert themselves into the pathauto settings page. The existing content types (nodes, taxonomy terms, users, and blogs) are being factored out into separate files, and other modules will be able to hook in to have aliases automatically created according to whatever criteria they want to make available. Without trying to document the API, here's a simple, complete example of how user aliases (skipping blog support, and the bulk update function) would be implemented:


/**
 * Implementation of hook_pathauto() for user aliases
 */
function user_pathauto($op) {
  switch ($op) {
    case 'settings':
      $settings = array();
      $settings['module'] = 'user';
      $settings['groupheader'] = t('User path settings');
      $settings['patterndescr'] = t('Pattern for user account page paths');
      $settings['patterndefault'] = t('user/[user]');
      $settings['placeholders'] = array(
        '[user]' => t('The name of the user.'),
        '[uid]' => t('The id number of the user.')
      );
      $settings['bulkname'] = t('Bulk update user paths');
      $settings['bulkdescr'] = t('Generate aliases for all existing user account pages which do not already have aliases.');
      return array2object($settings);
    default:
      break;
  }
}

/**
 * Implementation of hook_user() for users
 */
function pathauto_user($op, &$edit, &$user, $category=FALSE) {
  switch ($op) {
    case 'insert':
      /*
      ** Use the username to automatically create an alias
      */
      $user = array2object($user);
      if ($user->name) {
        $placeholders = array();
        $placeholders['[user]'] = $user->name;
        $placeholders['[uid]'] = $user->uid;
        $src = 'user/'.$user->uid;
        $alias = pathauto_create_alias('user', $placeholders, $src, FALSE);
      }
      break;
    case 'delete':
      /*
      ** If the user is deleted, remove the path aliases
      **
      */
      $user = array2object($user);
      path_set_alias('user/'.$user->uid);
      break;
    case 'update':
      /*
      ** Do not automatically update the alias on a user change
      */
      break;
    default:
      break;
  }
}
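To show how the pattern ('user/[user]') and the $placeholders array fit together, here is a minimal sketch of the substitution step. It is not pathauto's actual implementation: sketch_expand_pattern() and its cleanup rules are hypothetical, and only the standard PHP functions strtr(), strtolower(), preg_replace() and trim() are assumed.

```php
<?php
/**
 * Hypothetical helper: expand a pattern using a placeholder => value map.
 * Illustration only; the real work happens inside pathauto_create_alias().
 */
function sketch_expand_pattern($pattern, array $placeholders) {
  // strtr() with an array replaces every key found in the pattern
  // with its value in a single pass.
  $alias = strtr($pattern, $placeholders);

  // Assumed cleanup (not taken from pathauto): lowercase the result and
  // collapse anything other than letters, digits, and '/' into '-'.
  $alias = strtolower($alias);
  $alias = preg_replace('/[^a-z0-9\/]+/', '-', $alias);

  // Drop separators left dangling at either end.
  return trim($alias, '-');
}
```

With the pattern 'user/[user]' and a placeholder map of '[user]' => 'Mike Ryan', '[uid]' => '42', this returns 'user/mike-ryan'. The unused '[uid]' entry is simply ignored.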

Comments welcome...

Comments

mikeryan

I've committed this to CVS - pathauto.module is now much more modular, containing no specific knowledge of nodes, terms, or any other type of content. Automatic aliasing for a given content type now may be implemented in the module that manages that content type, or by creating a .inc file in the pathauto directory. Implementations for nodes, taxonomy terms, users, and blogs are included.

This is a major rewrite, and the possibilities are near-endless, so the more folks taking a stab at playing with the CVS version the better. Please let me know how it works for you from an admin standpoint, and from you developers I'd appreciate feedback on the provided interface and documentation (see README.txt). I'm going to give it a little time for testing before adding it to the 4.5 or 4.6 streams.

Note that variable names for type-specific and vocabulary-specific patterns have changed, so those patterns will have to be re-entered in the settings page after installing this version of pathauto.

Next stage:

1. I meant to add a "feed" parameter to the returned settings for modules to indicate whether generating feed aliases is relevant, but forgot before committing. Right now it will generate useless feed aliases for users.

2. Next I'll be looking at addressing the bulk update issue.

Thanks,

mattengland

If this new flavor of pathauto helps to automatically create aliases beyond just node-creation time (particularly for blogs, which I'm interested in) and thus is always aliasing "on the fly" (e.g., when a blog entry gets reassigned to a new category), then I'd love to check this out. Does it do this?

-Matt

venkat-rk

Well, it looks like Jeremy may have had a point after all with his preference for dynamic aliasing as opposed to what he felt was pathauto's static aliasing:
http://www.greenash.net.au/posts/thoughts/hierarchical_url_aliasing#comm...

Matt's scenario here (when a blog entry is reassigned to a different category) is just the sort of situation where dynamic aliasing would be handy.

Don't get me wrong, pathauto is a great module, possibly one of Drupal's very best.

mattengland

Pathauto is wonderful. (It's one of the big reasons why I'm leaving MT, WPMU, b2evo, and Nucleus behind...I think.) And it's clear to me that it should belong in the core of Drupal, and all the rest of Drupal should be built around the concept of constantly making readable URLs in the way that each individual site admin wishes them (the readable URLs at her/his site) to be.

And it's clear to me this requires constant generation (and not just creation at node-creation time) of readable URLs in the site admin's designated format.

In short: My very-inexperienced view is that pathauto shouldn't be an "add on" to Drupal; Drupal should instead build around its concepts, and everything should be automatically generated for any changes incurred by any content.

I'd like to see this happen in the next rev of Drupal. Any takers?

-Matt

ps: I don't know how mikeryan or others see updates to this "thing" that I'm updating (is it a node? a project? an issue? See http://drupal.org/node/21757 for more notes on my confusion in more detail). Am I just wasting my keystrokes here?

degerrit

Just curious: will the new module improve performance? This is a wicked module, but I had to disable it because my url_alias table of over 6000 nodes was slowing down the site noticeably, and the bulk updates never even finished their work (~10000 nodes) because of a php-execution-timeout every load of '/admin/settings/pathauto'. Here's a bit of my mysql slow-query-log (a PIV 2GHz/512MB machine):

SELECT nid,type,title,uid,created,src,dst FROM node LEFT JOIN url_alias ON CONCAT('node/', nid) = src WHERE dst is null;
# Time: 050503 0:57:47
# User@Host: dradm1[dradm1] @ localhost []
# Query_time: 279 Lock_time: 0 Rows_sent: 7037 Rows_examined: 44226809

That's a hell of a lot of rows!
Of course the browsing issue is not really pathauto's fault, probably some expensive SQL concatenation in other parts of Drupal?

mikeryan

re: "dynamic" aliasing... Right now when you change a node, a new alias is generated in addition to the original one. This is actually an accident: in the refactoring I lost the code that checked whether a node already had an alias before creating one on update. Frankly, I consider automatically changing the alias on changes to the data a bad idea (unless the old alias can be made a permanent redirect to the new one) and will not support that as default behavior. Remember that the purpose of pathauto is to help with search engine indexing, and casually changing the address of a page is not going to help people find it. However, since there is demand for it, I'll implement an option for the behavior when editing a node with an existing alias: leave the alias alone (default), replace it, or add an additional alias.

re: pathauto into core - I'm biased, of course, but I think pathauto would be a worthwhile addition to core (probably by integrating its functionality into path.module). One motivation in this refactoring, actually, was to make it small, clean, and modular enough to fit well into core (it was becoming something of an unwieldy Swiss army knife with direct support for various modules all being thrown into pathauto.module). In particular, I really want the ability to have aliases that do permanent redirects to other aliases, which would require some core support for actually performing the redirects and an additional column in the url_alias table to indicate which aliases should redirect. And, as someone pointed out elsewhere, it'd be good to have a flag in url_alias indicating whether an alias was manually or automatically created, so bulk updates could replace automatic aliases while leaving manual ones alone.

re: performance - 6000 rows in a simple table like url_alias shouldn't really be a bottleneck in MySQL for the types of things the core and path.module do (usually looking up one specific alias or src at a time) - have you optimized your tables lately? Yes, bulk update is very slow - a LEFT JOIN of the node and url_alias tables with a CONCAT in the join condition is pretty hairy. I don't really see a good way to avoid that, except to try to break it down into chunks that could be processed in the background via a cron hook...
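A rough sketch of the chunking idea mentioned above, assuming progress is tracked as "highest id handled so far" and stored between cron runs. The function name, the in-memory arrays standing in for the node and url_alias tables, and the 'content/<id>' alias format are all hypothetical. A real version would query the database in id order with a limit rather than loop over an array.

```php
<?php
/**
 * Process one chunk of a bulk update and return the new progress marker.
 *
 * @param array $ids      Sorted list of all object ids (stand-in for the node table).
 * @param array $aliased  Map of id => alias (stand-in for url_alias), updated in place.
 * @param int   $last_id  Highest id handled by earlier calls.
 * @param int   $limit    Maximum number of ids to examine in this call.
 * @return int  Progress marker to persist for the next cron run.
 */
function sketch_bulk_update_chunk(array $ids, array &$aliased, $last_id, $limit) {
  $examined = 0;
  foreach ($ids as $id) {
    if ($id <= $last_id) {
      // Already handled by a previous cron run.
      continue;
    }
    if ($examined >= $limit) {
      // Chunk is full; the rest waits for the next run.
      break;
    }
    if (!isset($aliased[$id])) {
      // Only objects without an alias get one; existing aliases are kept.
      $aliased[$id] = 'content/' . $id; // hypothetical alias format
    }
    $last_id = $id;
    $examined++;
  }
  return $last_id;
}
```

Each cron call would load the stored marker, run one chunk, and save the returned marker. Once the marker stops advancing, the bulk update is complete.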

One thought, looking at the core schema - maybe an index on the src column in url_alias would help...

mattengland

"I'll implement an option for the behavior when editing a node with an existing alias: leave the alias alone (default), replace it, or add an additional alias."

Is it possible to create a "global" admin option to set a default behavior for all nodes, and then hide the "change" option from the node-creation, or in my case, blog-entry user?

(I would most likely choose the "additional alias" option all the time...and I suspect most others would, too.)

I'm managing a group/company of people, and I want all their "knowledge logs" and status reports to present consistent aliases/URLs such that all readers/categorizing agents know how to get to the content. If an editor changes the category of a blog entry, that needs to be reflected automatically in the alias. In such a case, I would want the category-blog sorting to remove the entry from its list, which I think is outside the scope of the pathauto stuff?

(Is this all making sense? Ask me more if it doesn't.)

I realize I'm just one requester among hundreds/thousands. Having said that: I suspect my category of admin/user will grow as more people start using blogs for knowledge-logging in companies (for I get the impression it's still a very new concept not used by many, but I think destined to grow).

-Matt

mikeryan

Yes, my plan is to make it a global node option; my assumption is that anyone who wants any dynamic aliasing will want everything dynamically aliased.

Right now I'm preparing to finally upgrade my live site to 4.6.0 (the new event module is a lot of work), it'll probably be a couple of weeks before I do anything new with pathauto...

mikeryan

I've committed a batch of changes to HEAD today...

Aliasing of RSS feeds is back. The API includes an option for pathauto implementors to indicate their content type supports feeds.

Using "taxonomy/term/<id>" as a src didn't work with feeds, which require a depth in the URL. Term aliases are now generated to point to taxonomy/term/<id>/0.

Administrators now have three options when updating content which already has an alias - do nothing (the default), add a new alias, or replace the previous alias. Note: there's no way to distinguish between an automatically generated alias and an explicitly entered alias, so the last option could blow away manual aliases.
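The three behaviors described above could be modeled roughly as follows. This is a sketch under stated assumptions, not the committed code: the constant names, the function name, and the in-memory src => aliases map (a stand-in for the url_alias table) are all hypothetical.

```php
<?php
// Hypothetical names for the three update behaviors.
const SKETCH_UPDATE_NOTHING = 0; // leave the existing alias alone (the default)
const SKETCH_UPDATE_ADD     = 1; // keep the old alias and add the new one
const SKETCH_UPDATE_REPLACE = 2; // discard existing aliases, keep only the new one

/**
 * Apply an update policy to an in-memory alias map.
 *
 * @param array  $aliases Map of src path => list of aliases.
 * @param string $src     Internal path, e.g. 'node/12'.
 * @param string $new     Newly generated alias.
 * @param int    $action  One of the SKETCH_UPDATE_* constants.
 * @return array The updated map.
 */
function sketch_apply_update_action(array $aliases, $src, $new, $action = SKETCH_UPDATE_NOTHING) {
  $existing = isset($aliases[$src]) ? $aliases[$src] : array();
  if (!$existing) {
    // No alias yet: always create one, whatever the policy says.
    $aliases[$src] = array($new);
    return $aliases;
  }
  switch ($action) {
    case SKETCH_UPDATE_ADD:
      if (!in_array($new, $existing)) {
        $existing[] = $new;
      }
      $aliases[$src] = $existing;
      break;
    case SKETCH_UPDATE_REPLACE:
      // Cannot tell manual aliases from generated ones, so all are dropped.
      $aliases[$src] = array($new);
      break;
    case SKETCH_UPDATE_NOTHING:
    default:
      break;
  }
  return $aliases;
}
```

The replace branch makes the caveat concrete. With no flag marking an alias as manual or automatic, replacing means discarding every alias already stored for that path.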

Versioning has been added, and your first visit to the settings page will automatically rename variables whose names have changed since the 4.5/4.6 release (thus preserving your former settings).

mikeryan

I have committed the refactored version of pathauto to Drupal 4.5 and 4.6. This will be the last version of pathauto which works with 4.5 - my next update will add the "overview pages" functionality, which would require different code for 4.5 and 4.6 (and I've committed myself to 4.6 now, so that's the only version I'll develop).

mikeryan

Closing this out - this redesign work is done and committed, and it seems stable.

RobRoy

I have the CVS version of pathauto and about 4000 terms in a vocabulary, and the query on line 101 of pathauto_taxonomy.inc

 $query = 'SELECT tid,vid,name,src,dst FROM {term_data} '.
    "LEFT JOIN {url_alias} ON CONCAT('taxonomy/term/', tid) = src"; 

is taking forever and exceeding PHP's 30-second execution limit. I even added an index to src, but that didn't help. What could be done to avoid this? I can't even access admin/settings/pathauto without it dying.

mikeryan

Yes, the queries necessary to do bulk updates are inherently very slow, there's just not much a database engine can do to optimize a CONCAT() in the join (can you say "table scan"? I knew you could...) The long-term solution is to let cron do it (breaking down the problem into reasonable chunks, progress passed from one cron call to the next). For now, I've gotten by with this line in my .htaccess file:

php_value max_execution_time 1500

RobRoy

But why is taxonomy_pathauto_bulkupdate() called when just accessing admin/settings/pathauto? Shouldn't it only be called when I actually do a bulkupdate?

mikeryan

Bulk updating is triggered by setting a variable (in this case, pathauto_taxonomy_bulkupdate) TRUE - when the settings page sees that, it performs the bulk update then sets it FALSE again. I can see now that if the operation is interrupted by a timeout, the variable never gets set FALSE so it will attempt to run on every visit.

My next update of pathauto will set the variable FALSE before performing the bulk update, so the only consequence of timing out will be that not all of the updating will be done. In the meantime, DELETE FROM variable WHERE name='pathauto_taxonomy_bulkupdate' ought to clean you up.
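The reordering described above (clear the flag first, then do the work) might look something like this. It is a sketch only: sketch_variable_get() and sketch_variable_set() are array-backed stand-ins for Drupal's variable_get()/variable_set(), and sketch_settings_page() is a hypothetical name, not the actual pathauto function.

```php
<?php
// Array-backed stand-ins for Drupal's persistent variable store.
$GLOBALS['sketch_vars'] = array();

function sketch_variable_get($name, $default) {
  return isset($GLOBALS['sketch_vars'][$name]) ? $GLOBALS['sketch_vars'][$name] : $default;
}

function sketch_variable_set($name, $value) {
  $GLOBALS['sketch_vars'][$name] = $value;
}

/**
 * Hypothetical settings-page handler.
 *
 * @param callable $bulk_update The (possibly very slow) bulk update routine.
 */
function sketch_settings_page($bulk_update) {
  if (sketch_variable_get('pathauto_taxonomy_bulkupdate', FALSE)) {
    // Clear the flag BEFORE starting. If the update is cut short,
    // the next page visit will not try to run it again.
    sketch_variable_set('pathauto_taxonomy_bulkupdate', FALSE);
    call_user_func($bulk_update);
  }
}
```

The trade-off is the one already noted: an interrupted run leaves some objects without aliases, but the settings page stays reachable and the update can be requested again.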

Thanks for the report.