If the user leaves the path alias field empty, this module will auto-generate the path. Optionally, information about the creation date can be included in the path.

There are a lot of opinions on the best way to do this generation. Everyone seems to have their own idea of what should be included in the path and how it should be ordered, so I wrote the code so that it's fairly easy to add new methods for path generation.

One problem with this module is that it may raise expectations that it does not fulfil. With a URL like www.domain.com/2004/10/23/a-title, users may expect that shortening it to www.domain.com/2004/10 will show the nodes from October.

To use the module, copy the code into a file named "path_automatic.module" and drop it into your modules folder.

<?php
// $Id: path_automatic_automatic.module,v 0.1 2004/12/21 21:22:26 Tommy Sundström Exp $


/** 
 * This module auto-generates a path alias.
 *
 * It needs path.module in order to work.
 */
 
/**
 * Implementation of hook_help().
 *
 */
function path_automatic_help($section) {
  switch ($section) {
    case 'admin/modules#description':
      // This description is shown in the listing at admin/modules.
      return t('If the path module is active, renames the URLs automatically.');
  }
}  



/**
 * Settings for how the path will get generated.
 */
function path_automatic_settings() {
  // module_exist() takes the module name as a string.
  if (module_exist('path')) {
    if (user_access('create url aliases') || user_access('administer url aliases')) {
      $methods = array(
        array('value' => 'default',
              'label' => t('Just title'),
              'descr' => '<p>'. t('If the Path alias field is left blank, generates a path from the title.') .'</p>',
        ),
        array('value' => 'yearmonth',
              'label' => t('Include year and month'),
              'descr' => '<p>'. t('Includes the creation month in the path, in addition to the title.') .'</p>',
        ),
        array('value' => 'yearmonthday',
              'label' => t('Include year, month and day'),
              'descr' => '<p>'. t('Includes the creation date in the path, in addition to the title.') .'</p>',
        ),
      );

      // Build the select options and a definition list describing each method.
      $options = array();
      $description = '<br /><dl>';
      foreach ($methods as $method) {
        $options[$method['value']] = $method['label'];
        $description .= '<dt>'. $method['label'] .'</dt>';
        $description .= '<dd>'. $method['descr'] .'</dd>';
      }
      $description .= '</dl>';

      $output = form_select(t('Choose path generation method'),
                            'path_automatic_method',
                            variable_get('path_automatic_method', 'default'),
                            $options,
                            $description);
    }
    else {
      $output = t('You need the path module permission <i>create url aliases</i> or <i>administer url aliases</i> for this module to work.');
    }
  }
  else {
    $output = t('This module requires that the path module <a href="/admin/modules">is activated</a>.');
  }
  return $output;
}

/**
 * Implementation of hook_nodeapi().
 *
 * If the Path alias field is left empty, generates an alias from the node
 * title, optionally prefixed with the creation date.
 */
function path_automatic_nodeapi(&$node, $op, $arg) {
  if (module_exist('path') && (user_access('create url aliases') || user_access('administer url aliases'))) {
    switch ($op) {
      case 'validate':
        // (This is not really validation, but it is the best place to patch in this functionality.)
        $node->path = trim($node->path);

        // If no path alias is given, generate one from the title.
        if (!$node->path && $node->title) {
          $method = variable_get('path_automatic_method', 'default');
          $pathstart = '';

          // Generate the start of the path according to the chosen method.
          switch ($method) {
            case 'default':
              // Title only: no prefix needed.
              break;
            case 'yearmonth':
              $pathstart = date('Y/m', $node->created) .'/';
              break;
            case 'yearmonthday':
              $pathstart = date('Y/m/d', $node->created) .'/';
              break;
            default:
              die('Error: the path_automatic method "'. $method .'" does not exist.');
          }

          // Generate the path equivalent of the title.
          $pathtitle = stringToPath($node->title);
          if ($pathtitle == '***NOT_VALID***') {
            // stringToPath() failed: leave the path blank.
            break;
          }

          // If another node already uses this alias, leave the path blank.
          if (path_automatic_check_if_already_in_use($pathstart . $pathtitle, "node/$node->nid")) {
            break;
          }

          // Create the path.
          $node->path = $pathstart . $pathtitle;
        }
        break;
    }
  }
}

/**
 * Clean a string so that it can be used as a path.
 */
function stringToPath($str) {
  $str = strip_tags($str);

  // Accented characters
  $from = 'ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝàáâãäåçèéêëìíîïñòóôõöøùúûüýÿ';
  $to   = 'SZszYAAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy';
  for ($n = 0; $n < strlen($to); $n++) {
    $str = str_replace(utf8_encode(substr($from, $n, 1)), substr($to, $n, 1), $str);
  }
  // Punctuation etc.
  $str = preg_replace("/[ \/_]/", "-", $str);  // replace with a dash
  // Remove these characters. The $ is escaped so PHP does not try to interpolate a variable.
  $str = preg_replace("/[.,:;!?¤%&(){}@£\$€<>|=*[\]\"\'\\\\]/", "", $str);
  $str = preg_replace("/-+/", "-", $str); // collapse multiple dashes into one

  $str = trim($str);
  $str = strtolower($str);
  $str = urlencode($str);
  if (!valid_url($str)) {
    // valid_url() does not accept all URLs that urlencode() can generate (it does not seem to accept the %-encoding).
    // This should be rare, but to be safe, skip path generation when it happens.
    // (It only happens when the title contains characters other than a-z and the ones transformed above.)
    return '***NOT_VALID***';
  }
  return $str;
}

 

/**
 * Check that a path alias is not already in use.
 */
function path_automatic_check_if_already_in_use($aliaspath, $realpath) {
  if (db_result(db_query("SELECT COUNT(dst) FROM {url_alias} WHERE dst = '%s' AND src != '%s'", $aliaspath, $realpath))) {
    return TRUE; // In use
  }
  else {
    return FALSE; // Free to use
  }
}

Comments

Tommy Sundstrom

One problem that I have not been able to solve is that the help text of the Path alias field should be changed to explain what happens if the field is left empty.

killes outlines a solution in http://drupal.org/node/14597, but I don't understand it. Is anyone able to explain it in more detail?

nolageek

Will try this out ASAP.

How hard would it be to tweak this to get taxonomy URLs as well? This is a GREAT step! I can't wait!

Tommy Sundstrom

The problem with taxonomy is that a node can have several different taxonomy terms from one or many vocabularies. How do we choose?

adamrice

That's the problem, isn't it? IMO, in a perfect world, you could use multiple path aliases (based on taxonomy and date) to get to a given entry, or list of entries that would match a partial URL. I have no idea how that would be done.

Also: I notice that you are stripping out accents in paths. It should be possible to keep accented text in the paths--wikipedia.org manages to, anyhow. The Japanese, Chinese, etc versions of wikipedia have characters from their respective languages in the paths as well, and the accent-stripping technique obviously would not scale to other scripts.

Adam Rice

nolageek

How hard would multiple paths be? Granted, I'm not a coder, but could you, say, take an incoming URL and match it one level at a time?

/sitename.com/Films/2004/Documentary/name-of-film

The logic would be (this is Drupal thinking):

OK, is "Films" a taxonomy vocabulary? If not, check to see if it's a node.
Films is a vocabulary; is 2004 a term below Films? If not, is it a node alias?
2004 is a term below Films. Is Documentary a term below 2004? If not, is it a node alias?
Documentary is a term below 2004. Is name-of-film a term below Documentary? If not, is it a node alias?
name-of-film is not a term under Documentary. Is it a node alias?
It is a node alias. Let's display it (using the URL).
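Vincent's level-by-level lookup could be sketched roughly as below. This is only an illustration under stated assumptions: the vocabularies, term hierarchy and aliases are hypothetical in-memory arrays standing in for Drupal's taxonomy and url_alias tables, term names are assumed to be unique, and only the last segment is allowed to be a node alias. `resolve_levels()` is a made-up helper, not part of Drupal or of this module.

```php
<?php
// Hypothetical stand-ins for the database:
//   $vocabularies: vocabulary name => true
//   $children:     vocabulary or term name => list of child term names
//   $aliases:      node alias => real path (e.g. 'node/7')
function resolve_levels(string $url, array $vocabularies, array $children, array $aliases): ?string {
    $segments = array_values(array_filter(explode('/', $url), 'strlen'));
    $parent = null; // vocabulary or term matched at the previous level
    foreach ($segments as $i => $segment) {
        // Level 0 must be a vocabulary; deeper levels must be a child term of the previous match.
        $isTerm = ($parent === null)
            ? isset($vocabularies[$segment])
            : in_array($segment, $children[$parent] ?? [], true);
        if ($isTerm) {
            $parent = $segment;
            continue;
        }
        // Not a vocabulary/term here: accept it as a node alias only if it is the last segment.
        if ($i === count($segments) - 1 && isset($aliases[$segment])) {
            return $aliases[$segment];
        }
        return null;
    }
    return null; // every segment was a vocabulary or term: a listing page, not a node
}

$children = ['Films' => ['2004'], '2004' => ['Documentary']];
echo resolve_levels('/Films/2004/Documentary/name-of-film', ['Films' => true], $children, ['name-of-film' => 'node/7']), "\n"; // node/7
```

A real implementation would also have to decide what to show when the walk ends on a term (for example a term listing), which the sketch just reports as null.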

Displayed links could perhaps check the current location and use a URL that would be closest to the current location depth wise. Not sure what that would entail though.

Vincent

Tommy Sundstrom

I believe Drupal (via the path module) already supports multiple path aliases. There is no limit to the number of aliases that can be associated with a node, and it would be easy to modify the path_automatic module to set several instead of just one.

But we still have to come up with a way to choose the primary one (the one used by Drupal for permalinks etc.).

Partial URLs are trickier, since they seem to require cooperation from, or modification of, a number of modules.

nolageek

Couldn't both work? Or, for those people who want only one, they can set up a secondary "path vocabulary" that can be used to generate the path - but not be displayed in the node views (would that be possible?)

Tommy Sundstrom

As I understand it, a node can have an unlimited number of path aliases. But one will be the primary one, used by Drupal for permalinks etc. So the question is: how do we choose that one (preferably without complicating the user interface with yet another control)?

adamrice

You're right, that would be tricky. One possibility:

Add a new control (oops, sorry) that would be a pair of radio buttons:

  • Use date-based permalinks
  • Use taxonomy-based permalinks

If the taxonomy allows multiple terms, choose the term with the lowest weight. If there are competing terms with equal weight, choose based on alphabetical order.
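Adam's tie-breaking rule (lowest weight first, then alphabetical) might look like this. The terms are hypothetical arrays with `name` and `weight` keys standing in for Drupal term objects, `choose_primary_term()` is an illustrative name, and treating "alphabetical" as case-insensitive is an assumption.

```php
<?php
// Pick the term with the lowest weight; break ties alphabetically (case-insensitive).
// Each term is a hypothetical ['name' => string, 'weight' => int] array.
function choose_primary_term(array $terms): ?array {
    if (!$terms) {
        return null;
    }
    usort($terms, function (array $a, array $b): int {
        // Compare weights first; if they are equal (0), fall back to the names.
        return ($a['weight'] <=> $b['weight']) ?: strcasecmp($a['name'], $b['name']);
    });
    return $terms[0];
}

$terms = [
    ['name' => 'Sports', 'weight' => 0],
    ['name' => 'arts', 'weight' => 0],
];
echo choose_primary_term($terms)['name'], "\n"; // arts
```

Sorting the whole list is more than strictly needed for picking one term, but it keeps the rule in a single comparison function.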

. . .

If one wanted to get fancy, one could include an expert mode that allows admins to construct the URL syntax themselves using placeholders: some people might want their URLs to read example.com/2004/12/31/sometitle, others example.com/20041231/sometitle.html. People using taxonomy-based URLs might want to include the name of the node-type in the URL, others might not.

This seems like a lot of added trouble, and I'm not sure how much benefit. If the URL types cannot be customized by the admin, I would vote for taxonomy-based urls to read
example.com/nodetype/term/sub-term/title
and calendar-based URLs to read
example.com/yyyy/mm/dd/title

Adam Rice

Tommy Sundstrom

Add a new control (oops, sorry) that would be a pair of radio buttons:

* Use date-based permalinks
* Use taxonomy-based permalinks

I think this control is OK. It can be put in the settings, so it would not add complexity to day-to-day use. Actually, it's already there, but no option for taxonomy-based links exists (yet).

While not as easy as placeholders, the module is structured to make it easy to add a new format by modifying the PHP code.

As for taxonomy, I think the best strategy might be to solve this problem for breadcrumbs (a far more important navigation tool than URLs), and then add a general breadcrumb-to-path routine to this module.

Will Pate

Why on earth would anyone want to use example.com/20041231/sometitle.html instead of example.com/2004/12/31/sometitle? Should there be extra controls just to support cruft?

If you're stuck with having used that URL before you switched to Drupal, set up an .htaccess file and a good 404 page, import your content into Drupal, and make sure you have a search box in your template.

Tommy Sundstrom

See http://drupal.org/node/15814 for an alternative solution to this problem.

geekarena

What, basically, is the difference between your method and Autopath's, in terms of the resulting URL?

Tommy Sundstrom

My method:

- does a better job with accented characters
- uses dashes instead of underscores to replace spaces and special characters (this is a matter of taste; I find dashes easier to read and to talk about. Many users have trouble finding the underscore on their keyboard)
- has an option to include some information about the date in the path
- gives up more easily than Autopath: if the path already exists, it leaves the alias blank, whereas Autopath appends a number to make it unique.

clairem

- uses dashes instead of underscores to replace (this is a matter of taste, I find dashes easier to read and to talk about. Many users have trouble finding the underscore on their keyboard)

Delighted to see underscores being avoided. If the URL is displayed underlined, they become indistinguishable from spaces.

Tommy Sundstrom

I've made a new version (see code below). Compared to Autopath it

- has options to include category, date etc in path

- gives you full control over how the path looks and is structured

- has a more complex interface (In Autopath I don't even think there is a settings panel. However, if you don't want to configure the path, you don't ever have to open settings.)

- does a better job with accented characters (as Steve points out below, this function may have bugs. It works for me however.)

- uses dashes instead of underscores to replace space and special characters (This is a matter of taste, I find dashes easier to read when the text is underlined, and to talk about. Many users have trouble finding the underscore on their keyboard)

alvaroortiz

As for the URL structure, my pals and I have been discussing what to include and what not. We find that including the date is not useful for anybody:

- search engines will not use it for anything
- neither will humans (including the title is useful because you get an idea of what you are going to find, and it helps search engines index your content)

It turns out that including the date makes the URL longer for no particular reason.

As for which vocabulary to use when a node has more than one assigned, one option could be to use the weight setting. Or to include a checkbox in categories, "use this vocabulary to build clean URLs"... but this may be too much... and/or it should be combined with node types: you might want to use one vocabulary for one node type and a different one for another...

We also find it necessary to include a node ID in the URL, because otherwise you might end up with two nodes with the same URL. The probability is small, but until it is impossible we must keep this in mind.

So a proposal would be:

www.domain.com/category/a-title/ID

In case of subcategories:

www.domain.com/category/sub-category/a-title/ID

In case of nodes belonging to a special node, ie forum:

www.domain.com/forum/category/sub-category/a-title/ID

www.domain.com/project/category/a-title/ID

www.domain.com/blog/category/a-title/ID

álvaro

--
the cocktail

furilo.com

Tommy Sundstrom

See http://drupal.org/node/15804 for an example of useful dates in URLs.

Also, a URL with a date tells you whether the node is new or old. Useful in some contexts.

I believe that Autopath's approach of automatically adding a serial number to a duplicate path is better than always adding an ID to solve this rare problem.

Will Pate

www.domain.com/category/a-title/

is fine but shouldn't require the ID at the end. It seems pretty rare that you'd be writing multiple blog posts or book pages with exactly the same title in the same category. Could it be made to use the ID only in that rare case?

I may be in the minority on this, but I do find dates in URLs useful as a breadcrumb on the time plane. Considering that you should be including dates clearly on anything you post anyway, it seems less helpful though. I'm still waiting to see a useful example of weekly dates.

Tommy Sundstrom

In order to accommodate the different wishes on how the path should look, I have partially rewritten the module and changed the interface so that you now provide a pattern for your path, using placeholders for the dynamic parts.

The available placeholders are listed at the settings page, and it's fairly easy to add more.
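A rough sketch of how such a pattern can be expanded, using the same str_replace()-then-clean-up approach as the module code in this post. It assumes the title and category have already been cleaned into path segments, covers only a few placeholders, and uses gmdate() (UTC) where the module uses date() (server local time), purely to keep the example deterministic. `expand_pattern()` is a hypothetical helper.

```php
<?php
// Expand a path pattern such as '[year]/[month]/[title]'. $title and $category
// are assumed to be cleaned path segments already.
function expand_pattern(string $pattern, int $created, string $title, string $category = ''): string {
    $placeholders = [
        '[title]'    => $title,
        '[category]' => $category,
        '[year]'     => gmdate('Y', $created),
        '[month]'    => gmdate('m', $created),
        '[day]'      => gmdate('d', $created),
    ];
    $path = str_replace(array_keys($placeholders), array_values($placeholders), $pattern);
    // An empty placeholder can leave '//' or a stray leading/trailing slash behind.
    $path = preg_replace('#/+#', '/', $path);
    return trim($path, '/');
}

// 1098532800 is 2004-10-23 12:00:00 UTC.
echo expand_pattern('[year]/[month]/[day]/[title]', 1098532800, 'a-title'), "\n"; // 2004/10/23/a-title
echo expand_pattern('[category]/[title]', 1098532800, 'a-title'), "\n";           // a-title
```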

Unfortunately I have not found any way to add a placeholder for the node ID, since the ID does not seem to have been created at preview time.

A [module] placeholder would be nice to have, but I have not been able to figure out a good way to do this. The same goes for a [breadcrumbs] placeholder (which would transform the page's breadcrumbs into a path).

The placeholder [categorybreadcrumbs] gives a path with the category and all ancestors to it. The name "categorybreadcrumbs" is clumsy, so suggestions for a better name would be appreciated.
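What [categorybreadcrumbs] produces can be illustrated in isolation. The sketch assumes the ancestor list is ordered the way the module's code expects taxonomy_get_parents_all() to return it (the term itself first, then its parent, up to the root) and that the names are already cleaned; `category_breadcrumbs()` is a hypothetical helper.

```php
<?php
// $ancestors: cleaned names ordered term-first, root-last.
// Returns them root-first, joined into a path.
function category_breadcrumbs(array $ancestors): string {
    // Skip names that came out empty (e.g. ones the cleaning step rejected).
    $segments = array_filter($ancestors, function (string $s): bool {
        return $s !== '';
    });
    return implode('/', array_reverse($segments));
}

echo category_breadcrumbs(['documentary', '2004', 'films']), "\n"; // films/2004/documentary
```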

I have also changed the way the module behaves if the path already exists. Now it adds a serial number at the end of the path.
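The serial-number approach boils down to the loop below. Here `$inUse` is any callable standing in for path_automatic_check_if_already_in_use(), so the sketch runs without a database; `unique_alias()` is an illustrative name.

```php
<?php
// Return $path if it is free, otherwise the first free "$path-N" (N = 1, 2, ...).
function unique_alias(string $path, callable $inUse): string {
    if (!$inUse($path)) {
        return $path;
    }
    for ($i = 1; $inUse("$path-$i"); $i++) {
        // keep counting until a free variant is found
    }
    return "$path-$i";
}

$existing = ['a-title', 'a-title-1'];
echo unique_alias('a-title', fn (string $p): bool => in_array($p, $existing, true)), "\n"; // a-title-2
```

Each probe is one database query in the module, which is fine as long as duplicates stay rare.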

I have not had the opportunity to test the code with sites that have no vocabulary or just one. If you do, please write a comment and tell me whether it works.

<?php
// $Id: path_automatic_automatic.module,v 0.2 2005/01/24 21:22:26 Tommy Sundström Exp $


/** 
 * This module auto-generates a path alias.
 *
 * It needs path.module in order to work.
 */
 
/**
 * Implementation of hook_help().
 *
 */
function path_automatic_help($section) {
  switch ($section) {
    case 'admin/modules#description':
      // This description is shown in the listing at admin/modules.
      return t('If the path module is active, renames the URLs automatically.');
  }
}  



/**
 * Settings for how the path will get generated.
 */
function path_automatic_settings() {
  // module_exist() takes the module name as a string.
  if (module_exist('path')) {
    if (user_access('create url aliases') || user_access('administer url aliases')) {
      $vocabularies = taxonomy_get_vocabularies();

      // Generate the form
      $description = t('Write the path as you want it, using the following placeholders.');
      $description .= '<dl>';

      $description .= '<dt>[title]</dt>';
      $description .= '<dd>'. t('The title of the page, transformed so that it can be used in a URL.') .'</dd>';

      if (count($vocabularies) > 0) {
        $description .= '<dt>[category]</dt>';
        $description .= '<dd>'. t('The category that the page belongs to (if it belongs to several, the first one is used).');
        if (count($vocabularies) > 1) {
          $description .= t(' If there is more than one vocabulary, choose the one to use with the control below.');
        }
        $description .= '</dd>';
      }

      $description .= '<dt>[categorybreadcrumbs]</dt>';
      $description .= '<dd>'. t('As [category], but all parent categories are included in the path.') .'</dd>';

      $description .= '<dt>[year]</dt>';
      $description .= '<dd>'. t('A full numeric representation of the year, 4 digits. Examples: 1999 or 2003') .'</dd>';

      $description .= '<dt>[month]</dt>';
      $description .= '<dd>'. t('Numeric representation of the month, with leading zeros: 01 through 12') .'</dd>';

      $description .= '<dt>[monthShort]</dt>';
      $description .= '<dd>'. t('A short textual representation of the month, three lowercase letters: jan through dec') .'</dd>';

      $description .= '<dt>[day]</dt>';
      $description .= '<dd>'. t('Day of the month, 2 digits with leading zeros: 01 to 31') .'</dd>';

      $description .= '<dt>[dayShort]</dt>';
      $description .= '<dd>'. t('A textual representation of the day, three lowercase letters: mon through sun') .'</dd>';

      $description .= '<dt>[hour]</dt>';
      $description .= '<dd>'. t('24-hour format of the hour, with leading zeros: 00 through 23') .'</dd>';

      $description .= '<dt>[minute]</dt>';
      $description .= '<dd>'. t('Minutes, with leading zeros: 00 to 59') .'</dd>';

      $description .= '<dt>[second]</dt>';
      $description .= '<dd>'. t('Seconds, with leading zeros: 00 through 59') .'</dd>';

      $description .= '<dt>[week]</dt>';
      $description .= '<dd>'. t('ISO-8601 week number of the year, weeks starting on Monday. Example: 42 (the 42nd week of the year)') .'</dd>';

      /*
      Template for a new placeholder:
      $description .= '<dt>[]</dt>';
      $description .= '<dd>'. t('') .'</dd>';
      */

      $description .= '</dl>';

      // Since the node id is not known at preview time, I have not found any way to make a placeholder for it.
      // To make more date formats, see http://se2.php.net/manual/en/function.date.php

      // The default pattern must use the same [title] syntax as the placeholders above.
      $output = form_textfield(t('Pattern for the path'),
                               'path_automatic_pathpattern',
                               variable_get('path_automatic_pathpattern', '[title]'),
                               100,
                               6000,
                               $description,
                               NULL,
                               FALSE);

      // Ask which vocabulary is to be used
      switch (count($vocabularies)) {
        case 0: // No vocabulary
          variable_set('path_automatic_vocabulary', NULL);
          break;
        case 1: // Just one vocabulary, nothing to choose between
          // reset() returns the first element whatever the array keys are
          // (taxonomy_get_vocabularies() may key the array by vid rather than from 0).
          $vocabulary = reset($vocabularies);
          variable_set('path_automatic_vocabulary', $vocabulary->vid);
          break;
        default:
          $options = array();
          $description = '<br /><dl>';
          foreach ($vocabularies as $vocabulary) {
            $options[$vocabulary->vid] = $vocabulary->name;
            if ($vocabulary->description) {
              $description .= '<dt>'. $vocabulary->name .'</dt>';
              $description .= '<dd>'. $vocabulary->description .'</dd>';
            }
          }
          $description .= '</dl>';

          $output .= form_select(t('Choose the vocabulary to be used for [category]'),
                                 'path_automatic_vocabulary',
                                 variable_get('path_automatic_vocabulary', '0'),
                                 $options,
                                 $description);
          break;
      }
    }
    else {
      $output = t('You need the path module permission <i>create url aliases</i> or <i>administer url aliases</i> for this module to work.');
    }
  }
  else {
    $output = t('This module requires that the path module <a href="/admin/modules">is activated</a>.');
  }
  return $output;
}

/**
 * Implementation of hook_nodeapi().
 *
 * If the Path alias field is left empty, generates an alias from the
 * pattern set on the settings page.
 */
function path_automatic_nodeapi(&$node, $op, $arg) {
  if (module_exist('path') && (user_access('create url aliases') || user_access('administer url aliases'))) {
    switch ($op) {
      case 'validate':
        // (This is not really validation, but it is the best place to patch in this functionality.)
        $node->path = trim($node->path);

        // If no path alias is given, generate one.
        if (!$node->path) {
          // Generate the placeholders
          $placeholders = array();

          // [title]
          $txt = stringToPath($node->title);
          if ($txt == '***NOT_VALID***') { $txt = ''; } // stringToPath() failed.
          $placeholders['[title]'] = $txt;

          // [category] & [categorybreadcrumbs] (empty unless a matching term is found)
          $placeholders['[category]'] = '';
          $placeholders['[categorybreadcrumbs]'] = '';
          $vid = variable_get('path_automatic_vocabulary', NULL);
          if ($vid && $node->taxonomy) {
            // Find the first term that belongs to the chosen vocabulary.
            $term = NULL;
            foreach ($node->taxonomy as $tid) {
              $candidate = taxonomy_get_term($tid);
              if ($candidate->vid == $vid) {
                $term = $candidate;
                break; // only the first one is needed
              }
            }
            if ($term) {
              $txt = stringToPath($term->name);
              if ($txt == '***NOT_VALID***') { $txt = ''; } // stringToPath() failed.
              $placeholders['[category]'] = $txt;

              // taxonomy_get_parents_all() starts with the term itself and walks up to
              // the root, so each ancestor is prepended to get a root-first path.
              $crumbstxt = '';
              foreach (taxonomy_get_parents_all($term->tid) as $ancestor) {
                $txt = stringToPath($ancestor->name);
                if ($txt == '***NOT_VALID***') { $txt = ''; } // stringToPath() failed.
                $crumbstxt = $txt .'/'. $crumbstxt;
              }
              $placeholders['[categorybreadcrumbs]'] = $crumbstxt;
            }
          }

          // Date & other time info
          $placeholders['[year]']       = date('Y', $node->created);
          $placeholders['[month]']      = date('m', $node->created);
          $placeholders['[monthShort]'] = strtolower(date('M', $node->created));
          $placeholders['[day]']        = date('d', $node->created);
          $placeholders['[dayShort]']   = strtolower(date('D', $node->created));
          $placeholders['[hour]']       = date('H', $node->created);
          $placeholders['[minute]']     = date('i', $node->created);
          $placeholders['[second]']     = date('s', $node->created);
          $placeholders['[week]']       = date('W', $node->created);

          // Construct the path (the default pattern uses the [title] placeholder syntax)
          $pathpattern = variable_get('path_automatic_pathpattern', '[title]');
          $path = str_replace(array_keys($placeholders), $placeholders, $pathpattern);

          // Clean up the path and check that it's unique.
          // Collapse two or more slashes into one
          $path = preg_replace("/\/+/", "/", $path);
          // Trim any leading or trailing slashes
          $path = preg_replace("/^\/|\/+$/", "", $path);

          // If the path already exists, append a serial number
          if (path_automatic_check_if_already_in_use($path, "node/$node->nid")) {
            for ($i = 1; path_automatic_check_if_already_in_use($path .'-'. $i, "node/$node->nid"); $i++) {
            }
            $path = $path .'-'. $i;
          }
          $node->path = $path;
        }
        break;
    }
  }
}

/**
 * Clean a string so that it can be used as a path.
 */
function stringToPath($str) {
  $str = strip_tags($str);

  // Accented characters
  $from = 'ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝàáâãäåçèéêëìíîïñòóôõöøùúûüýÿ';
  $to   = 'SZszYAAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy';
  for ($n = 0; $n < strlen($to); $n++) {
    $str = str_replace(utf8_encode(substr($from, $n, 1)), substr($to, $n, 1), $str);
  }
  // Punctuation etc.
  $str = preg_replace("/[ \/_]/", "-", $str);  // replace with a dash
  // Remove these characters. The $ is escaped so PHP does not try to interpolate a variable.
  $str = preg_replace("/[.,:;!?¤%&(){}@£\$€<>|=*[\]\"\'\\\\]/", "", $str);
  $str = preg_replace("/-+/", "-", $str); // collapse multiple dashes into one
  $str = trim($str, " \t\n\r\0\x0B-"); // trim whitespace and any leading/trailing dashes
  $str = strtolower($str);
  $str = urlencode($str);
  if (!valid_url($str)) {
    // valid_url() does not accept all URLs that urlencode() can generate (it does not seem to accept the %-encoding).
    // This should be rare, but to be safe, skip path generation when it happens.
    // (It only happens when the title contains characters other than a-z and the ones transformed above.)
    return '***NOT_VALID***';
  }
  return $str;
}

 

/**
 * Check that a path alias is not already in use.
 */
function path_automatic_check_if_already_in_use($aliaspath, $realpath) {
  if (db_result(db_query("SELECT COUNT(dst) FROM {url_alias} WHERE dst = '%s' AND src != '%s'", $aliaspath, $realpath))) {
    return TRUE; // In use
  }
  else {
    return FALSE; // Free to use
  }
}

Steven

Your accent stripper requires that the file be saved in windows-1252 encoding. If saved as UTF-8, it will be broken.

--
If you have a problem, please search before posting a question.

Tommy Sundstrom

This probably explains why I had to use utf8_encode() in order to make it work.

Unfortunately I have no idea how to make this UTF-8-friendly (I've searched the documentation). Any pointers on how to solve this would be appreciated.

Steven

To be honest, I don't really see why you need to strip accents at all. You can use UTF-8 characters in IRIs (internationalized resource identifiers), which end up as %-escaped UTF-8 bytes in URIs. Stripping accents is only possible for western languages, and even then it is only barely acceptable for those with only sporadic use of accents. Most accent-using languages do not think of them as accented letters, but as characters in their own right.

Most browsers can deal with IRIs. Check your status bar when hovering over this link (uses %-escapes). Now try it with this one (uses literal characters). Both work fine here (Firefox).
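As an illustration of the IRI-to-URI mapping described above, a UTF-8 alias can be kept as-is and percent-encoded segment by segment with PHP's rawurlencode(), leaving the slashes alone. This is a sketch that assumes the alias is already valid UTF-8; `alias_to_uri_path()` is a hypothetical name, and whether Drupal's own alias handling accepts the result is not checked here.

```php
<?php
// Percent-encode the UTF-8 bytes of each path segment, keeping '/' as the separator.
function alias_to_uri_path(string $alias): string {
    return implode('/', array_map('rawurlencode', explode('/', $alias)));
}

echo alias_to_uri_path("2004/caf\u{00E9}"), "\n"; // 2004/caf%C3%A9
```

rawurlencode() encodes a space as %20 rather than the '+' that urlencode() produces, which is what a path (as opposed to a query string) expects.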

However, back to the code. There are several problems with yours:

- It assumes the source file is encoded in Windows-1252, and that each accented letter occupies only 1 byte. As Drupal uses UTF-8 for its source files, this is a bad assumption. Especially because you posted your code on Drupal.org, which itself uses UTF-8. One way around this would be to use hex escaped bytes in the string (e.g. \xFF), but this is still bad because of the next point.

- utf8_encode() works on ISO-8859-1 data, not Windows-1252. Windows-1252 is the Microsoft variant of ISO-8859-1, and contains extra characters in the bytes 0x80 - 0x9F, which ISO-8859-1 treats as control characters. For example, the 'Š' character, which is byte 0x8A in Windows-1252, will be utf8_encoded to codepoint U+008A, which is different from U+0160 (the codepoint for 'Š').
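Both points can be checked with a few lines of PHP. To avoid depending on the mbstring or intl extensions (and on utf8_encode(), which is deprecated as of PHP 8.2), the sketch encodes code points to UTF-8 by hand with a hypothetical `cp_to_utf8()` helper that only covers the Basic Multilingual Plane.

```php
<?php
// Encode one Unicode code point (U+0000..U+FFFF) as UTF-8 bytes.
function cp_to_utf8(int $cp): string {
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

// Point 1: in a UTF-8 file 'Š' is two bytes, so substr($from, $n, 1) grabs half of it.
$s = "\u{0160}";
echo strlen($s), "\n";                 // 2
echo bin2hex($s), "\n";                // c5a0

// Point 2: ISO-8859-1 maps byte 0x8A to U+008A (a control character), which is
// what utf8_encode() produces; Windows-1252 means 'Š' (U+0160) by that byte.
echo bin2hex(cp_to_utf8(0x8A)), "\n";  // c28a
echo bin2hex(cp_to_utf8(0x160)), "\n"; // c5a0
```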

If you limit yourself to ISO-8859-1 only, then your current method works, provided that you use hex escapes, so your source file is still valid UTF-8. If you want to include all accented latin characters, then you would have to build a map of UTF-8 byte sequences to ASCII characters.

I think all non-ASCII Basic and Extended Latin characters have 2-byte UTF-8 sequences, so you could use this to your advantage (to still store the map as 2 long strings, with the source string twice the length of the target string).
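A sketch of that idea: the source string holds concatenated 2-byte UTF-8 sequences written as hex escapes (so the file itself stays plain ASCII), and the target string holds one ASCII letter per sequence, so the source is exactly twice as long. Only a handful of letters are mapped here for illustration, and `strip_accents_utf8()` is a hypothetical name. strtr() with an array replaces each sequence in a single pass.

```php
<?php
// Source: eight 2-byte UTF-8 sequences as hex escapes. Target: one ASCII letter
// per sequence, so strlen(ACCENT_FROM) is exactly 2 * strlen(ACCENT_TO).
// Order: a-grave, a-acute, a-umlaut, a-ring, e-acute, o-umlaut, S-caron, s-caron.
const ACCENT_FROM = "\xC3\xA0\xC3\xA1\xC3\xA4\xC3\xA5\xC3\xA9\xC3\xB6\xC5\xA0\xC5\xA1";
const ACCENT_TO = 'aaaaeoSs';

// Replace every mapped UTF-8 sequence in a UTF-8 string with its ASCII letter.
function strip_accents_utf8(string $str): string {
    $map = [];
    for ($n = 0; $n < strlen(ACCENT_TO); $n++) {
        // Byte pair n of the source string maps to character n of the target.
        $map[substr(ACCENT_FROM, 2 * $n, 2)] = substr(ACCENT_TO, $n, 1);
    }
    return strtr($str, $map);
}

echo strip_accents_utf8("K\u{00E5}ra \u{0160}koda caf\u{00E9}"), "\n"; // Kara Skoda cafe
```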

Oh and never, ever rely on PHP.net for Unicode-related information. As far as Unicode goes, PHP is hell. If you don't have a good Unicode font installed to browse all the characters in, google for Everson Mono Unicode.


adamrice

For some reason, I'm getting the following error when I try to load admin/modules after installing this:

Parse error: parse error, unexpected '{' in /home/adamrice/public_html/h2/modules/path_automatic_automatic.module on line 13

That line is
  switch ($section) {
Which seems correct and innocuous. Not sure what the problem is. But I look forward to getting it working.

Adam Rice

Tommy Sundstrom

Strange. I don't get this error (and it's a part of the code that has not changed since the previous version).

I see that you have named the module path_automatic_automatic.module. Does it make any difference if you remove the extra _automatic? Seems a bit far-fetched, but it's the best suggestion I can come up with...

adamrice

If you look at the top-most section of your code, that's how you named it; I was following your lead. I tried taking out the extra _automatic, with no change.

As an experiment, I commented out that section of the code, and I got the same error on line 26, which shows a similar syntactic pattern: a nested opening { bracket. I don't know why this would be a problem.

Adam Rice

Tommy Sundstrom

My mistake. It should be named path_automatic.module .

Has anyone else experienced the same problem as Adam?

Tommy Sundstrom

This module is merging with Autopath. See http://drupal.org/node/16035

mikeryan

The merged module, pathauto, is now available. Thanks to Tommy for the fine ideas he provided, between us I think we have something which should suit just about anybody's needs. And if not, just open up an issue...

Mike
Fenway Views