If the user leaves the path alias field empty, this module will auto-generate the path. Optionally, information about the creation date can be included in the path.
There are many opinions on the best way to do this generation. Everyone seems to have their own idea of what should be included in the path and how it should be ordered. So I wrote the code so that it's fairly easy to add new methods for path generation.
One problem with this module is that it may raise expectations that it does not fulfill. With a URL like www.domain.com/2004/10/23/a-title, users may expect that shortening the URL to www.domain.com/2004/10 will show the nodes from October, but it will not.
In order to use the module, copy the code into a file named "path_automatic.module" and drop it into your modules folder.
<?php
// $Id: path_automatic_automatic.module,v 0.1 2004/12/21 21:22:26 Tommy Sundström Exp $
/**
 * This module auto-generates a path alias.
 *
 * It needs path.module in order to work.
 */
/**
 * Implementation of hook_help().
 *
 */
function path_automatic_help($section) {
  switch ($section) {
    case 'admin/modules#description':
      // This description is shown in the listing at admin/modules.
      return t('If the path module is active, renames the URLs automatically.');
  }
}
/**
 * Settings for how the path will get generated.
 */
function path_automatic_settings() {
  // The module name must be a quoted string, not a bare constant.
  if (module_exist('path')) {
    if (user_access('create url aliases') || user_access('administer url aliases')) {
      // Each method has a stored value, a label for the select box and a description.
      $methods = array(
        array('value' => 'default',
              'label' => 'Just title',
              'descr' => '<p>'.t('If Path Alias field is left blank, generates a path from the title.').'</p>'
        ),
        array('value' => 'yearmonth',
              'label' => 'Include year and month',
              'descr' => '<p>'.t('Includes the creation month in the path, in addition to the title.').'</p>'
        ),
        array('value' => 'yearmonthday',
              'label' => 'Include year, month and day',
              'descr' => '<p>'.t('Includes the creation date in the path, in addition to the title.').'</p>'
        ),
      );
      $options = array();
      $description = '<br /><dl>';
      foreach ($methods as $method) {
        $options[$method['value']] = $method['label'];
        $description .= '<dt>'.$method['label'].'</dt>';
        $description .= '<dd>'.$method['descr'].'</dd>';
      }
      $description .= '</dl>';
      $output = form_select(t('Choose path generation method'),
        'path_automatic_method',
        variable_get('path_automatic_method','default'),
        $options,
        $description
      );
    }
    else {
      $output = t('You need permission from the path module to <i>create url aliases</i> or <i>administer url aliases</i> for this module to work.');
    }
  }
  else {
    $output = t('This module requires that the path module <a href="/admin/modules">is activated</a>.');
  }
  return $output;
}
/**
 * Implementation of hook_nodeapi().
 *
 * Auto-generates a URL alias for a node at node edit time when the
 * path alias field has been left empty.
 */
function path_automatic_nodeapi(&$node, $op, $arg) {
  if (module_exist('path') && (user_access('create url aliases') || user_access('administer url aliases'))) {
    switch ($op) {
      case 'validate':
        // (This is not really validation, but this is the best place to patch in this functionality.)
        $node->path = trim($node->path);
        // If no path alias is given, auto-generate one from the title, using the method chosen in settings.
        if (!$node->path) {
          // No alias, generating it
          if ($node->title) {
            $method = variable_get('path_automatic_method','default');
            $pathstart = '';
            // generate pathstart
            switch ($method) {
              case 'default':
                // No need to do anything
                break;
              case 'yearmonth':
                $pathstart = date('Y/m',$node->created) . '/';
                break;
              case 'yearmonthday':
                $pathstart = date('Y/m/d',$node->created) . '/';
                break;
              default:
                // die() never returns, so there is nothing to assign.
                die('Error: the method given to path_automatic does not exist');
            }
            if (strstr($pathstart,'***NOT_VALID***')) {
              // One (or more) of the stringToPath has failed.
              break;
            }
            // generate the path equivalent of the title
            $pathtitle = stringToPath($node->title);
            if ($pathtitle == '***NOT_VALID***') {
              // The stringToPath has failed.
              break;
            }
            // validate the path
            if (path_automatic_check_if_already_in_use($pathstart.$pathtitle, "node/$node->nid")) {
              // If in use, leave path blank
              break;
            }
            // create path
            $node->path = $pathstart.$pathtitle;
          }
        }
    }
  }
}
/**
 * Clean a string so that it can be used as a path.
 */
function stringToPath($str) {
  $str = strip_tags($str);
  // Accented characters
  // Note: substr() works on bytes, so this loop assumes that $from holds one byte per character.
  $from = 'ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝàáâãäåçèéêëìíîïñòóôõöøùúûüýÿ';
  $to = 'SZszYAAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy';
  for ($n=0; $n < strlen($to); $n++) {
    $str=str_replace(utf8_encode(substr($from, $n, 1)), substr($to, $n, 1), $str);
  }
  // Punctuation etc.
  $str = preg_replace("/[ \/_]/", "-", $str); // replace with dash
  // The dollar sign is escaped so that PHP does not try to interpolate a variable named "€".
  $str = preg_replace("/[.,:;!?¤%&(){}@£\$€<>|=*[\]\"\'\\\\]/", "", $str); // remove
  $str = preg_replace("/-+/", "-", $str); // collapse multiple dashes into one
  $str = trim($str);
  $str = strtolower($str);
  $str = urlencode($str);
  if (!valid_url($str)) {
    // valid_url will not accept all urls that urlencode can generate (it does not seem to accept the %-encoding)
    // This should be a rare problem, but to be sure, skip path generation when this is the case.
    // (It will only be the case when the title contains other characters than a-z and the ones transformed above.)
    return '***NOT_VALID***';
  }
  return $str;
}
/**
 * Check that a path alias is not already in use.
 */
function path_automatic_check_if_already_in_use($aliaspath, $realpath) {
  if (db_result(db_query("SELECT COUNT(dst) FROM {url_alias} WHERE dst = '%s' AND src != '%s'", $aliaspath, $realpath))) {
    return TRUE; // In use
  }
  else {
    return FALSE; // Free to use
  }
}
Comments
How to affect the path module help text?
One problem that I have not been able to solve is that the help text of the Path alias field should be changed to tell what happens if the field is left empty.
killes outlines a solution in http://drupal.org/node/14597 , but I don't understand it. Is anyone able to explain it in more detail?
Excellent, can't wait!
Will try this out ASAP.
How hard would it be to tweak this to get taxonomy URLs as well? This is a GREAT step! I can't wait!
The problem with taxonomy is
The problem with taxonomy is that a node can have several different taxonomy terms from one or many vocabularies. How to choose?
Tricky
That's the problem, isn't it? IMO, in a perfect world, you could use multiple path aliases (based on taxonomy and date) to get to a given entry, or list of entries that would match a partial URL. I have no idea how that would be done.
Also: I notice that you are stripping out accents in paths. It should be possible to keep accented text in the paths--wikipedia.org manages to, anyhow. The Japanese, Chinese, etc versions of wikipedia have characters from their respective languages in the paths as well, and the accent-stripping technique obviously would not scale to other scripts.
Adam Rice
would this be possible?
How hard would multiple paths be? Granted, I'm not a coder, but could you, say, take an incoming URL and match it one level at a time?
/sitename.com/Films/2004/Documentary/name-of-film
The logic would be (this is drupal thinking:)
Ok, is "Films" a taxonomy vocabulary, if not check to see if it's a node.
Films is a vocab; is 2004 a term below Films? If not, is it a node alias?
2004 is a term below Films. Is Documentary a term below 2004? If not, is it a node alias?
Documentary is a term below 2004, is name-of-film a term below Documentary? If not, is it a node alias?
name-of-film is not a term under Documentary, is it a node alias?
It is a node alias. Let's display it. (using URL)
Displayed links could perhaps check the current location and use a URL that would be closest to the current location depth wise. Not sure what that would entail though.
Vincent
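The level-by-level matching described in the comment above could be sketched roughly as follows. This is only an illustration: the arrays $taxonomy_tree and $node_aliases and the function resolve_by_levels() are hypothetical stand-ins, not part of Drupal or of this module, and a real implementation would have to query the taxonomy and url_alias tables instead.

```php
<?php
// Hypothetical in-memory stand-ins for the taxonomy hierarchy and alias table.
$taxonomy_tree = array(
  'films' => array(
    '2004' => array(
      'documentary' => array(),
    ),
  ),
);
$node_aliases = array('name-of-film' => 'node/42');

/**
 * Walks a request path one segment at a time. While a segment names a
 * vocabulary or term below the current level, descend into it. A segment
 * that is not a term is accepted only if it is the last one and matches
 * a node alias. Returns the internal node path, or NULL if nothing matches.
 */
function resolve_by_levels($path, $tree, $aliases) {
  $segments = explode('/', strtolower(trim($path, '/')));
  $last = count($segments) - 1;
  $level = $tree;
  foreach ($segments as $i => $segment) {
    if (isset($level[$segment])) {
      // The segment is a vocabulary or term at this level: go one level down.
      $level = $level[$segment];
      continue;
    }
    if ($i == $last && isset($aliases[$segment])) {
      // Not a term, but a known node alias at the end of the path.
      return $aliases[$segment];
    }
    return NULL;
  }
  // The path consisted of terms only, so there is no single node to display.
  return NULL;
}
```

Calling resolve_by_levels('/Films/2004/Documentary/name-of-film', $taxonomy_tree, $node_aliases) walks through the three term levels and then returns the node path for the alias. A path that stops at a term, such as /Films/2004, would need a separate listing page, which this sketch does not attempt.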
Multiple path aliases
I believe Drupal (via the path module) already supports multiple path aliases. There is no limit to the number of aliases that can be associated with a node, and it would be easy to modify the path_automatic module to set several instead of just one.
But we still have to come up with a method to choose the one that should be the primary one (the one used by Drupal for permalinks etc).
Partial URLs are trickier, since they seem to require cooperation from, or modification of, a number of modules.
Couldn't both work? Or, for
Couldn't both work? Or, for those people who want only one, they can set up a secondary "path vocabulary" that can be used to generate the path - but not be displayed in the node views (would that be possible?)
Both work, but one has to be primary
As I understand it, a node can have an unlimited number of path aliases. But one will be the primary one, used by Drupal for permalinks etc. So the question is: how to choose that one (preferably without complicating the user interface with yet another control)?
Taxonomy weight?
You're right, that would be tricky. One possibility:
Add a new control (oops, sorry) that would be a pair of radio buttons:
If the taxonomy allows multiple terms, choose the term with the lowest weight. If there are competing terms with equal weight, choose based on alphabetical order.
. . .
If one wanted to get fancy, one could include an expert mode that allows admins to construct the URL syntax themselves using placeholders: some people might want their URLs to read
example.com/2004/12/31/sometitle
others
example.com/20041231/sometitle.html
People using taxonomy-based URLs might want to include the name of the node-type in the URL, others might not.
This seems like a lot of added trouble, and I'm not sure how much benefit. If the URL types cannot be customized by the admin, I would vote for taxonomy-based urls to read
example.com/nodetype/term/sub-term/title
and calendar-based urls to read
example.com/yyyy/mm/dd/title
Adam Rice
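The tie-breaking rule proposed in this comment (lowest weight first, then alphabetical order) might look something like the sketch below. The function pick_primary_term() and the array layout with 'name' and 'weight' keys are assumptions made for the example. They are not Drupal's actual term objects.

```php
<?php
/**
 * Picks the term to use in the path: the lowest weight wins, and terms
 * with equal weight are ordered alphabetically (case-insensitive).
 * $terms is a hypothetical list of arrays with 'name' and 'weight' keys.
 * Returns the chosen term name, or NULL if the list is empty.
 */
function pick_primary_term($terms) {
  if (!$terms) {
    return NULL;
  }
  usort($terms, function ($a, $b) {
    if ($a['weight'] != $b['weight']) {
      return $a['weight'] < $b['weight'] ? -1 : 1;
    }
    return strcasecmp($a['name'], $b['name']);
  });
  return $terms[0]['name'];
}
```

With the terms Music (weight 0), Film (weight 0) and Art (weight 5), this picks Film: the two weight-0 terms tie, and Film sorts before Music.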
Once breadcrumbs is solved, path will follow
This control I think is ok. It can be put in the settings, so that would not add complexity to day-to-day use. Actually, it's already there, but no option for taxonomy-based links exists (yet).
While not as easy as placeholders, the module is structured to make it easy to add a new format by modifying the PHP code.
As for taxonomy, I think the best strategy might be to solve this problem for breadcrumbs (a far more important navigation tool than URLs), and then write a general breadcrumb-to-path routine in this module.
Should Drupal support messy URL structure?
Why on earth would anyone want to use example.com/20041231/sometitle.html instead of example.com/2004/12/31/sometitle ? Should there be extra controls just to support cruft?
If you're stuck with having used that URL structure before you switched to Drupal, set up an .htaccess file and a good 404 page, import your content into Drupal, and make sure you have a search box in your template.
Alternative solution
See http://drupal.org/node/15814 for an alternative solution to this problem.
For us non-coders
What, basically, is the difference between your method and Autopath's? In terms of the resulting url.
Differences
My method
- does a better job with accented characters
- uses dashes instead of underscores as replacements (this is a matter of taste; I find dashes easier to read and to talk about, and many users have trouble finding the underscore on their keyboard)
- has an option to include some information about the date in the path
- gives up more easily than Autopath: if the path already exists, my module leaves the alias blank, whereas Autopath appends a number to make it unique.
avoid underscores
Delighted to see underscores being avoided. If the URL is displayed in an underlined format, they become indistinguishable from a space.
New version, new differences
I've made a new version (see code below). Compared to Autopath it
- has options to include category, date etc in path
- gives you full control over how the path looks and is structured
- has a more complex interface (In Autopath I don't even think there is a settings panel. However, if you don't want to configure the path, you don't ever have to open settings.)
- does a better job with accented characters (as Steve points out below, this function may have bugs. It works for me however.)
- uses dashes instead of underscores to replace space and special characters (This is a matter of taste, I find dashes easier to read when the text is underlined, and to talk about. Many users have trouble finding the underscore on their keyboard)
The URL content
As for the URL structure, I have been discussing with my pals what to include and what not. We find that including the date is not useful for anybody:
- search engines will not use it for anything
- neither will humans; including the title is useful because you can get an idea of what you are going to find (and it helps search engines index your content)
So including the date just makes the URL longer for no particular reason.
As for which vocabulary to use when more than one is assigned to a node, one option could be to use the weight option. Or to include a checkbox in categories, "use this vocabulary to build clean URL"... but this may be too much... and/or it should be combined with node types: you might want to use one vocabulary for one node type and a different one for another...
We also find it necessary to include a node ID in the URL, because otherwise you might end up with two nodes with the same URL. The probability is small, but as long as it is not impossible we must keep this in mind.
So a proposal would be:
www.domain.com/category/a-title/ID
In case of subcategories:
www.domain.com/category/sub-category/a-title/ID
In case of nodes belonging to a special node, ie forum:
www.domain.com/forum/category/sub-category/a-title/ID
www.domain.com/project/category/a-title/ID
www.domain.com/blog/category/a-title/ID
álvaro
--
the cocktail
furilo.com
Dates can be useful
See http://drupal.org/node/15804 for an example of useful dates in urls.
Also, an url with date will tell you if the node is new or old. Useful in some contexts.
I believe that Autopath's way of automatically adding a serial number to a duplicate path is a better approach than always adding an ID to solve this rare problem.
Taxonomy and Date structure
www.domain.com/category/a-title/
is fine but shouldn't require the ID at the end. It seems pretty rare that you'd be writing multiple blog posts or book pages with exactly the same title in the same category. Could it be made to say only use the ID in such a rare event that might occur?
I may be in the minority on this, but I do find dates in URLs useful as a breadcrumb on the time plane. Considering that you should be including dates clearly on anything you post anyway, it seems less helpful though. I'm still waiting to see a useful example of weekly dates.
New version - create your own path pattern
In order to accommodate the different wishes on how the path should look, I have partially rewritten the module and changed the interface so that you now provide a pattern for your path, using placeholders for the dynamic parts.
The available placeholders are listed at the settings page, and it's fairly easy to add more.
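As a rough illustration of the placeholder idea (not the module's actual code), a pattern can be expanded with a simple lookup table. The function name expand_path_pattern() and the placeholder names [yyyy], [mm], [dd] and [title] are assumptions made for this sketch. The real list of placeholders is the one shown on the settings page.

```php
<?php
/**
 * Expands a path pattern by replacing each placeholder with its value.
 * The placeholder names used here are hypothetical examples.
 * $created is a Unix timestamp; gmdate() is used so the result does not
 * depend on the server time zone.
 */
function expand_path_pattern($pattern, $title, $created) {
  $replacements = array(
    '[yyyy]'  => gmdate('Y', $created),
    '[mm]'    => gmdate('m', $created),
    '[dd]'    => gmdate('d', $created),
    '[title]' => $title,
  );
  // strtr() replaces all placeholders in one pass and never rescans its
  // own output, so a title containing "[mm]" is not expanded a second time.
  return strtr($pattern, $replacements);
}
```

Adding a new placeholder then amounts to adding one more entry to the $replacements array.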
Unfortunately I have not found any way to add a placeholder for the node id, since the id does not seem to have been created at preview time.
A [module] placeholder would be nice to have, but I have not been able to figure out a good way to do this. The same goes for a [breadcrumbs] placeholder (which would transform the page's breadcrumbs into a path).
The placeholder [categorybreadcrumbs] gives a path with the category and all ancestors to it. The name "categorybreadcrumbs" is clumsy, so suggestions for a better name would be appreciated.
I have also changed the way the module behaves if the path already exists. Now it adds a serial number at the end of the path.
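The serial-number behaviour could be sketched along these lines. This is a simplified illustration only: make_unique_alias() is a hypothetical helper, the list of existing aliases is passed in as a plain array instead of being looked up in the url_alias table, and the "-1", "-2" suffix format is an assumption.

```php
<?php
/**
 * Returns $alias unchanged if it is free; otherwise appends the first
 * serial number (starting at 1) that makes it unique.
 * $existing is a plain array of aliases that are already taken.
 */
function make_unique_alias($alias, $existing) {
  if (!in_array($alias, $existing)) {
    return $alias;
  }
  $n = 1;
  while (in_array($alias . '-' . $n, $existing)) {
    $n++;
  }
  return $alias . '-' . $n;
}
```

If a-title and a-title-1 are both taken, the next node with the same title would get a-title-2.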
I have not had the opportunity to test the code on sites that have no vocabulary or just one. If you do, please write a comment and tell whether it works.
Accent stripper
Your accent stripper requires that the file be saved in windows-1252 encoding. If saved as UTF-8, it will be broken.
--
If you have a problem, please search before posting a question.
How do I solve this?
This probably explains why I had to use utf8_encode() in order to make it work.
Unfortunately I have no idea on how to make this utf8-friendly (I've searched the documentation.) Any pointers on how to solve this would be appreciated.
UTF-8 friendly
To be honest, I don't really see why you need to strip accents at all. You can use UTF-8 characters in IRIs (internationalized resource identifiers), which end up as %-escaped UTF-8 bytes in URIs. Stripping accents is only possible for western languages, and even then it is only barely acceptable for those with only sporadic use of accents. But, most of the accent-using languages do not think of them as accented letters, but as characters in their own right.
Most browsers can deal with IRIs. Check your status bar when hovering over this link (uses %-escapes). Now try it with this one (uses literal characters). Both work fine here (Firefox).
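To make the %-escaping concrete, here is a small sketch using PHP's standard rawurlencode() and rawurldecode(). The sample title is just an example; its "é" is written as the explicit UTF-8 bytes \xC3\xA9 so that the snippet does not depend on the encoding of the source file.

```php
<?php
// "café" with the é given as its two UTF-8 bytes.
$title = "caf\xC3\xA9";

// Each non-ASCII byte becomes one %XX escape: 'caf%C3%A9'.
$escaped = rawurlencode($title);

// Decoding gives back the original UTF-8 bytes unchanged.
$roundtrip = rawurldecode($escaped);
```

So a path segment can keep its accented characters: nothing is lost in the escaped form, and browsers that understand IRIs can show the literal characters again.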
However, back to the code. There are several problems with yours:
- It assumes the source file is encoded in Windows-1252, and that each accented letter occupies only 1 byte. As Drupal uses UTF-8 for its source files, this is a bad assumption. Especially because you posted your code on Drupal.org, which itself uses UTF-8. One way around this would be to use hex escaped bytes in the string (e.g. \xFF), but this is still bad because of the next point.
- utf8_encode() works on ISO-8859-1 data, not Windows-1252. Windows-1252 is the Microsoft variant of ISO-8859-1, and contains extra characters in the bytes 0x80 - 0x9F, which the ISO standards consider control characters. For example, the 'Š' character, which is byte 0x8A in Windows-1252, will be utf8_encoded to codepoint U+008A, which is different from U+0160 (the codepoint for 'Š').
If you limit yourself to ISO-8859-1 only, then your current method works, provided that you use hex escapes, so your source file is still valid UTF-8. If you want to include all accented latin characters, then you would have to build a map of UTF-8 byte sequences to ASCII characters.
I think all non-ASCII Basic and Extended Latin characters have 2-byte UTF-8 sequences, so you could use this to your advantage (to still store the map as 2 long strings, with the source string twice the length of the target string).
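A minimal sketch of such a map is shown below. Note that it uses an associative array with strtr() and not the two parallel strings suggested above, because that avoids any byte-offset arithmetic. The function name strip_latin_accents() is hypothetical, and the map is deliberately incomplete: it holds only a handful of characters to show the idea. All keys are written as hex escapes, so the source file stays plain ASCII.

```php
<?php
/**
 * Replaces a few accented Latin characters, given as UTF-8 byte
 * sequences, with their closest ASCII letters. The map is a small
 * illustrative subset, not a complete table.
 */
function strip_latin_accents($str) {
  static $map = array(
    "\xC5\xA0" => 'S',  // U+0160 S with caron
    "\xC5\xA1" => 's',  // U+0161 s with caron
    "\xC5\xBD" => 'Z',  // U+017D Z with caron
    "\xC5\xBE" => 'z',  // U+017E z with caron
    "\xC3\x85" => 'A',  // U+00C5 A with ring
    "\xC3\xA5" => 'a',  // U+00E5 a with ring
    "\xC3\xA4" => 'a',  // U+00E4 a with diaeresis
    "\xC3\xB6" => 'o',  // U+00F6 o with diaeresis
    "\xC3\xA9" => 'e',  // U+00E9 e with acute
    "\xC3\xBC" => 'u',  // U+00FC u with diaeresis
  );
  // strtr() with an array matches whole byte sequences, so multi-byte
  // characters are never split in the middle.
  return strtr($str, $map);
}
```

Characters that are not in the map pass through untouched, so they can still be %-escaped afterwards instead of being dropped.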
Oh and never, ever rely on PHP.net for Unicode-related information. As far as Unicode goes, PHP is hell. If you don't have a good Unicode font installed to browse all the characters in, google for Everson Mono Unicode.
--
If you have a problem, please search before posting a question.
error
For some reason, I'm getting the following error when I try to load admin/modules after installing this:
Parse error: parse error, unexpected '{' in /home/adamrice/public_html/h2/modules/path_automatic_automatic.module on line 13
That line is
switch ($section) {
which seems correct and innocuous. Not sure what the problem is. But I look forward to getting it working.
Adam Rice
Try changing the module name
Strange. I don't get this error (and it's in a part of the code that has not changed since the previous version).
I see that you have named the module path_automatic_automatic.module. Does it make any difference if you remove the extra _automatic? Seems a bit far-fetched, but it's the best suggestion I can come up with...
Actually...
If you look at the top-most section of your code, that's how you named it--I was following your lead. I tried taking out the extra _automatic -- no change.
As an experiment, I commented-out that section of the code, and I got the same error on line 26, which shows a similar syntactic pattern--a nested opening { bracket. I don't know why this would be a problem.
Adam Rice
Name correction
My mistake. It should be named path_automatic.module .
Has anyone else experienced the same problem as Adam?
This module is merging with Autopath
This module is merging with Autopath. See http://drupal.org/node/16035
Merge complete
The merged module, pathauto, is now available. Thanks to Tommy for the fine ideas he provided, between us I think we have something which should suit just about anybody's needs. And if not, just open up an issue...
Mike
Fenway Views