Pathauto supports bulk generation of aliases for nodes that are not aliased.

The number of unaliased nodes that will be updated is controlled by the Pathauto setting:
"Maximum number of objects to alias in a bulk update"

Updating via the admin interface

The bulk update is currently a manual action done by:

  • visiting the Pathauto settings (/admin/settings/pathauto)
  • ticking the checkbox under "Node Path Settings" for
    "Bulk generate aliases for nodes that are not aliased"
  • clicking the "Save Configuration" button
  • repeating as many times as required to generate all aliases

If you have a lot of nodes to update, this can be a very tedious button-clicking process. You can experiment with the "Maximum number of objects to alias in a bulk update" setting to find a value that is relatively high but does not cause the page to time out.

Using the command line to bulk update unaliased nodes

A faster way to update would be from the command line. To do so, I set up a cron-update-pathauto.php script which contains:

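<?php
// Drupal 6 resolves its own includes relative to the current directory,
// so switch into the Drupal root first.
chdir('./webroot');

// This gets Drupal started.
include_once './includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

// This gets Pathauto started updating aliases.
_pathauto_include();
node_pathauto_bulkupdate();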

The include paths here assume you'll put this cron-update-pathauto.php file above your top-level Drupal directory with appropriate permissions (e.g. `chmod 500 cron-update-pathauto.php`, assuming the file is owned by the user who will be executing it).
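Each invocation of the script aliases at most "Maximum number of objects to alias in a bulk update" nodes, so a large site needs several runs. The count is the number of unaliased nodes divided by the batch size, rounded up. A minimal sketch of that arithmetic and of a loop around the script follows; the function names runs_needed and repeat_command are hypothetical, and the figures are only an example.

```shell
# Number of runs required: ceiling of total / batch using integer arithmetic.
runs_needed() {
  local total="$1" batch="$2"
  echo $(( ($total + $batch - 1) / $batch ))
}

# Run a command a fixed number of times, stopping early if it fails.
repeat_command() {
  local count="$1" i=1
  shift
  while [ "$i" -le "$count" ]; do
    echo "run $i of $count"
    "$@" || return 1
    i=$((i + 1))
  done
}
```

For example, `repeat_command "$(runs_needed 40000 400)" php cron-update-pathauto.php` would run the script once for each batch of 400 nodes on a site with 40,000 unaliased nodes.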

Adding the unaliased node update to cron

To enable this to be called from cron, I also set up a modified version of the cron script from http://drupal.org/node/65307 as cron-pathauto.sh:

#!/bin/bash
#############################
# CONFIGURATION OPTIONS
#############################

# Set the complete local path to the directory that contains
# the cron-update-pathauto.php file (here, the directory above
# the Drupal root /var/www/webroot/).
root_path=/var/www/

# Set the complete path to the php parser if
# different from standard
parse=/usr/bin/php

cronjob=cron-update-pathauto.php

##############################
# END OF CONFIGURATION OPTIONS
##############################

cd "$root_path" || exit 1

if [ -e "$cronjob" ]
then
  if "$parse" "$cronjob"
  then
    echo "$cronjob has successfully been parsed."
  else
    echo "$cronjob not parsed."
    exit 1
  fi
else
  echo "$cronjob not found."
  exit 1
fi

exit 0

Finally, I added the actual cron job to run this periodically. Depending on how long a run takes on your site and how quickly you want to build the aliases, you could run it every 15 minutes, or more or less often.

crontab -e
# minute  hour  mday  month  wday  command
*/15      *     *     *      *     /home/htdocs/cron-pathauto.sh >/dev/null 2>&1

Note that this cron job could be left on a site permanently if you are importing nodes and not creating aliases as you import.
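If a single run can take longer than the interval between cron runs, two runs might overlap. The following is a minimal sketch of one way to guard against that. It is not part of the original setup: the helper name run_locked and the lock path used below are hypothetical, and it relies on mkdir failing when the directory already exists.

```shell
# Hypothetical helper (not from the original script): run a command only
# when no other run currently holds the lock directory.
run_locked() {
  local lock_dir="$1"
  local status
  shift
  # mkdir fails if the directory already exists, so it doubles as a lock.
  if ! mkdir "$lock_dir" 2>/dev/null; then
    echo "Another run is still in progress; skipping."
    return 0
  fi
  "$@"
  status=$?
  # Release the lock whether or not the command succeeded.
  rmdir "$lock_dir"
  return "$status"
}
```

In cron-pathauto.sh, the line that calls the PHP parser could then become `run_locked /tmp/pathauto-bulk.lock "$parse" "$cronjob"`. A lock left behind by a killed run would have to be removed by hand.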

Updating via drush

Another way to do this is to install Drush and then run this command:

drush php-eval '_pathauto_include() ; node_pathauto_bulkupdate()'

As shown above, this command could be included in a script called periodically via cron.
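A cron-friendly wrapper around that Drush call might look like the sketch below. The function name bulk_update_aliases, the DRUSH and DRUPAL_ROOT variables, and the /var/www/webroot default are assumptions for illustration; --root is the Drush option for pointing at a Drupal installation.

```shell
# Assumed locations; adjust both for your own server.
DRUSH=${DRUSH:-drush}
DRUPAL_ROOT=${DRUPAL_ROOT:-/var/www/webroot}

# Run one Pathauto bulk update for unaliased nodes through Drush.
bulk_update_aliases() {
  "$DRUSH" --root="$DRUPAL_ROOT" php-eval '_pathauto_include(); node_pathauto_bulkupdate();'
}
```

Saved in a script that ends by calling bulk_update_aliases, this could be scheduled from crontab in the same way as cron-pathauto.sh.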

Performance tips

Pathauto's speed in Drupal 6 is directly tied to the creation of tokens. In Drupal 6, when a module uses tokens, all of the related tokens are calculated, and only then are the needed ones used. In Drupal 7, only the tokens actually being used are calculated.

Disable CCK token generation for fields you don't use

If you have nodes with a large number of CCK fields it can be particularly slow to generate tokens (and therefore slow to calculate the Pathauto aliases).

One helpful tip is to figure out which CCK fields you do not need tokens for and prevent token generation for those fields. In your admin interface, browse to Administer › Content management › Content types and click "manage fields" for each content type. Click the "Display fields" tab at the top of the page and then click the Token sub-tab. The URL should be something like /admin/content/node-type/food/display/token. Review the fields on this list and click "Exclude" for any fields that are not used for token generation on your site (note that modules other than Pathauto might use these tokens).

Disable modules that create tokens you don't need

You can review which modules are creating tokens, again using Drush php-eval, to print a list of the modules that implement hook_token_values.

greggles@biff:~/d6$ drush php-eval "print_r(module_implements('token_values'));"
Array
(
    [0] => content
    [1] => text
    [2] => hrules
    [3] => og
    [4] => signup
    [5] => single_field_viewer
    [6] => token
    [7] => rules
    [8] => crazy_token_generator_of_doom
)
greggles@biff:~/d6$

If you see a module in that list which you don't really need on your site, you could disable it and see whether the bulk update runs any faster. If you need the module, but not the tokens from it, consider working with the module maintainer to make the portions of code related to tokens optional (e.g. with a variable in the admin interface or by simply moving the token code to a sub-module).
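To check whether disabling a module actually helps, you could time one bulk update before and after the change. This is a rough sketch, assuming a shell where `date +%s` is available; the helper name elapsed_seconds is made up for this example.

```shell
# Hypothetical helper: print how many whole seconds a command took.
# The command's own output is discarded so only the duration is printed.
elapsed_seconds() {
  local start end
  start=$(date +%s)
  "$@" >/dev/null 2>&1
  end=$(date +%s)
  echo $((end - start))
}
```

For example: `elapsed_seconds drush php-eval '_pathauto_include(); node_pathauto_bulkupdate();'`. Each run aliases a different batch of nodes, so the timings are only roughly comparable.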

Tweaking command line memory

If you are running this from the command line and set "Maximum number of objects to alias in a bulk update" to a large number, you are likely to run into PHP's memory limit; if the command dies partway through, the memory limit is the likely cause. The command-line PHP uses a separate php.ini configuration file from the PHP running inside the web server. On Ubuntu that file is stored in /etc/php5/cli/php.ini. You can modify this file to set the memory_limit parameter to a high value such as 512M (512 megabytes), or set it to -1 to remove the memory limit.
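For a one-off run, the limit can also be raised without editing php.ini, because the PHP command-line binary accepts -d to override a single ini setting. A small sketch; the PHP_BIN variable and the function name run_with_memory_limit are assumptions for illustration.

```shell
# Assumed name or path of the command-line PHP binary.
PHP_BIN=${PHP_BIN:-php}

# Run a PHP script with a temporary memory_limit, leaving php.ini untouched.
run_with_memory_limit() {
  local limit="$1"
  shift
  "$PHP_BIN" -d "memory_limit=$limit" "$@"
}
```

For example: `run_with_memory_limit 512M cron-update-pathauto.php`.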

Comments

munkie’s picture

I've got a better idea:

function mymodule_cron() {
  module_invoke_all('pathauto_bulkupdate');
}

crosenblum’s picture

That currently does not work, and I get missing cache.inc errors.

hanoii’s picture

I wanted to use the script on a site of my own, but I have a multi-site setup, which means the settings are in a different place other than default and you need to trick the script so it can find the settings.php properly.

I took a little bit of code from the drush module. Maybe this script should be added as a drush pathauto module?

Anyway:

<?php
include_once './includes/bootstrap.inc';
include_once './sites/all/modules/pathauto/pathauto.inc';
include_once './sites/all/modules/pathauto/pathauto_node.inc';

// The URL you would normally use to access your Drupal site with a browser.
$url = 'http://www.example.com';
$drupal_base_url = parse_url($url);
// parse_url() omits 'path' when the URL has none, so default to an empty string.
$base_path = isset($drupal_base_url['path']) ? $drupal_base_url['path'] : '';
$_SERVER['HTTP_HOST'] = $drupal_base_url['host'];
$_SERVER['PHP_SELF'] = $base_path . '/index.php';
$_SERVER['REQUEST_URI'] = $_SERVER['SCRIPT_NAME'] = $_SERVER['PHP_SELF'];
$_SERVER['REMOTE_ADDR'] = NULL;
$_SERVER['REQUEST_METHOD'] = NULL;

drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

node_pathauto_bulkupdate();

emlomar1’s picture

Great piece of code, just what I was looking for. Thank you!

winzana’s picture

Your script doesn't work for D7. This script works for me:

<?php
define('DRUPAL_ROOT', getcwd());

include_once dirname(__FILE__) . '/includes/bootstrap.inc';
include_once dirname(__FILE__) . '/sites/all/modules/pathauto/pathauto.inc';
include_once dirname(__FILE__) . '/sites/all/modules/pathauto/pathauto.pathauto.inc';

// The URL you would normally use to access your Drupal site with a browser.
$url = 'http://SITE_URL';
$drupal_base_url = parse_url($url);
// parse_url() omits 'path' when the URL has none, so default to an empty string.
$base_path = isset($drupal_base_url['path']) ? $drupal_base_url['path'] : '';
$_SERVER['HTTP_HOST'] = $drupal_base_url['host'];
$_SERVER['PHP_SELF'] = $base_path . '/index.php';
$_SERVER['REQUEST_URI'] = $_SERVER['SCRIPT_NAME'] = $_SERVER['PHP_SELF'];
$_SERVER['REMOTE_ADDR'] = NULL;
$_SERVER['REQUEST_METHOD'] = NULL;

drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

// Keep processing batches until every node has been handled.
$context = array();
node_pathauto_bulk_update_batch_process($context);
while ($context['sandbox']['count'] < $context['sandbox']['total']) {
  node_pathauto_bulk_update_batch_process($context);
}

ccshannon’s picture

The PHP script returns immediately. Nothing happens.

Does anyone know how the function "node_pathauto_bulkupdate()" runs without any arguments? Does it pull the number from General Settings, or does it just run without a limit?

Anyway, there has to be a better way (in D6 at least) of bulk updating aliases across sites.

asb’s picture

Hi,

has anyone managed to fix this script for D6? (also still pending as a support request in #505042: Bulk generate path aliases for large sites)

Thanks & greetings, -asb

js’s picture

Hi asb,

The details on this page worked for me. I did adjust the path to pathauto for my installation.

include_once './includes/bootstrap.inc';
include_once './sites/default/modules/pathauto/pathauto.inc';
include_once './sites/default/modules/pathauto/pathauto_node.inc';

drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

_pathauto_include();
node_pathauto_bulkupdate();

I looped from my Mac which is missing some shell commands, so I used

for i in {1..250}; do echo $i; curl http://domain/script.php; done

to generate the paths. Doing this manually would have been painful.

I refreshed the "Delete aliases" page to monitor the progress.

Thanks to all for the help above!

dankohn’s picture

Although my suggestion (below) of using views bulk operations works in most cases, it failed last night with an out of memory error after re-aliasing 25,000 of my 40,000 nodes. But there is no way to start it again from where it left off.

By contrast, here is how to trigger pathauto from the command line, based on js's example. This is faster (I believe) than the VBO option.

First, in the pathauto settings at (/admin/build/path/pathauto), set "Maximum number of objects to alias in a bulk update:" to as high a number as you can that repeatedly works without timing out. I use 400. With 40,000 nodes, that means I need to run this script 100 times. Create a file called pathauto.php with these contents (note the different paths versus js's example):

include_once './includes/bootstrap.inc';
include_once './sites/all/modules/pathauto/pathauto.inc';
include_once './sites/all/modules/pathauto/pathauto_node.inc';

drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

_pathauto_include();
node_pathauto_bulkupdate();

Now, run the following from a bash shell:

for i in {1..100}; do echo $i; php pathauto.php; done

raffuk’s picture

or

<?php
include_once './includes/bootstrap.inc';
include_once './sites/all/modules/pathauto/pathauto.inc';
include_once './sites/all/modules/pathauto/pathauto_node.inc';

drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

variable_set('pathauto_max_bulk_update', 100000); // Or how many nodes you think you have :)
node_pathauto_bulkupdate();
?>

stefanhapper’s picture

If you run into a PHP timeout problem with the above script you can lower the number of new URL aliases to create to something like 100 and then add this code at the end of the script - and don't forget to stop it after a while :)

<html>

	<head>
		<meta http-equiv="refresh" content="5">
	</head>
	
	<body>
		Created new URL aliases at <strong><?php print date('r') ?></strong> :: Restarting in 5 seconds ...
	</body>

</html>

-enzo-’s picture

Well this is the combination of both ideas

<?php
include_once './includes/bootstrap.inc';
include_once './sites/all/modules/pathauto/pathauto.inc';
include_once './sites/all/modules/pathauto/pathauto_node.inc';

drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
?>
<html>

<head>
<meta http-equiv="refresh" content="40">
</head>

<body>
Created new URL aliases at <strong><?php print date('r') ?></strong> :: Restarting in 40 seconds ...
</body>

</html>
<?php
variable_set('pathauto_max_bulk_update', 500); // Or how many nodes you think you have :)
node_pathauto_bulkupdate();
?>

At least works for me.

Enjoy It.

enzo

--
enzo - Eduardo Garcia
weKnow - http://www.weknowinc.com
Please use the git author option: --author="enzo " for any patch I did and used in a new module release

jerome72’s picture

Hey, this script is very nice! Thank you! I needed to have the "Transliterate prior to creating alias" option checked, so I added this to the script: variable_set('pathauto_transliterate', TRUE);

Here's my code (with a 10 sec page refresh, and 1000 aliases created per refresh):

<?php
  include_once './includes/bootstrap.inc';
  include_once './sites/all/modules/pathauto/pathauto.inc';
  include_once './sites/all/modules/pathauto/pathauto_node.inc';

  drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
?>

<html>

<head>
<meta http-equiv="refresh" content="10">
</head>

<body>
Created new URL aliases at <strong><?php print date('r') ?></strong> :: Restarting in 10 seconds ...
</body>

</html>
<?php
  variable_set('pathauto_max_bulk_update', 1000); // Or how many nodes you think you have :)
  variable_set('pathauto_transliterate', TRUE);
  node_pathauto_bulkupdate();
?>

kbk’s picture

To clarify, the line below sets up how many nodes to alias in a given run and isn't really related to how many nodes you have.

variable_set('pathauto_max_bulk_update', 500); // Or how many nodes you think you have :)
node_pathauto_bulkupdate();

From what I can tell, this script will run for as long as you have a web browser pointed at http://yoursite.com/pathauto.php, so you will need to monitor the aliasing progress and remove the script when aliasing is complete. Your Drupal site should flash this message when aliasing is complete:

Bulk generation of nodes completed, 0 aliases generated.

newswatch’s picture

This is a fantastic way out. Works just fine :)

Cheers.

-----------------------------
Subir Ghosh
www.subirghosh.in

dankohn’s picture

The best way to update pathauto aliases on Drupal 6 is to use the incredibly powerful Views Bulk Operations. Install it, then create a view of all of your nodes with style VBO. The key step is that VBO supports the Batch API, which allows Drupal to do thousands of tasks without timing out, and with a nice status page showing you its progress. Anyway, the pathauto option is near the bottom. You need to create separate views for terms and users as well.

asb’s picture

> The best way to update pathauto aliases on Drupal 6 is to use the incredibly powerful Views Bulk
> Operations. Install it, then create a view of all of your nodes with style VBO. [...] You need to create
> separate views for terms and users as well.

Using VBO is a really great idea. However, I wasn't yet able to build a view to bulk update all term and vocabulary aliases (I don't want to update node paths). Any ideas?

Thanks, -asb

dankohn’s picture

Yep, VBO doesn't support the pathauto function for taxonomy terms. So, I would use this solution instead.

Anonymous’s picture

node_pathauto_bulkupdate() updates non-aliased nodes, but what if you need to refresh all nodes? I.e. create new aliases, replace existing on cron runs?

ahabman’s picture

Are you referring to pathauto_node.inc (~line#100) where the query does a "WHERE alias.src IS NULL"? I'd also like to know if this is an actual limitation and how to get around it safely.

Sylvain_G’s picture

With drush you can run

drush sql-query "TRUNCATE {url_alias}"

to delete all URL aliases.

By the way, the various scripts do not work for me:

drush php-eval "module_load_include('inc', 'pathauto'); module_load_include('inc', 'pathauto', 'pathauto_node'); node_pathauto_bulkupdate()"

Fatal error: Call to undefined function node_pathauto_bulkupdate() in /usr/local/bin/drush3/commands/core/core.drush.inc(610) : eval()'d code on line 1
Drush command could not be completed.

I'm running pathauto-6.x-1.5.

Anyone got a solution?

--
Open is Better

gausarts’s picture

In my installation profile, with Pathauto 2, I want to bulk alias all term menus, so I use the code below at the end of the profile tasks, along with other useful functions (menu_rebuild, drupal_cron_run, etc.):

<?php
  module_load_include('inc', 'pathauto', 'pathauto'); 
  module_load_include('inc', 'pathauto', 'pathauto.pathauto');  
  taxonomy_pathauto_bulk_update_batch_process();
?>

I don't do bootstrap, and aliases are bulk updated.

love, light n laughter

yecarrillo’s picture

This script will create user and taxonomy aliases:

include_once './includes/bootstrap.inc';
include_once './sites/all/modules/pathauto/pathauto.inc';
include_once './sites/all/modules/pathauto/pathauto_node.inc';
include_once './sites/all/modules/pathauto/pathauto_user.inc';
include_once './sites/all/modules/pathauto/pathauto_taxonomy.inc';

drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
_pathauto_include();

node_pathauto_bulkupdate();
user_pathauto_bulkupdate();
taxonomy_pathauto_bulkupdate();
blog_pathauto_bulkupdate();
tracker_pathauto_bulkupdate();
contact_pathauto_bulkupdate();

Gabriel R.’s picture

Thanks for the script, I improved on it by adding this bit to make it call itself continuously:

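// Default to 0 on the first request, when no batch parameter is present yet.
$batch = (isset($_GET['batch']) ? (int) $_GET['batch'] : 0) + 100;
print $batch . "<br />"; // Just so we have an idea of how many nodes were parsed.
print '<script type="text/javascript">window.location.href="?batch=' . $batch . '";</script>';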

But is there a way to get to the results in $messages? It would be great to be able to show these here.

zazinteractive’s picture

Worked great. Thanks for helping us out

tfo’s picture

To speed up a node_save()-based content migration I recently worked on, I disabled Pathauto during import. Afterward, I tried both the scripted node_pathauto_bulkupdate() version as well as the drush version of updating url_alias. Unfortunately, afterward, Global Redirect isn't working, and it's possible to visit the /node form of URLs as well as the full path URLs. Why would the manual bulk update process not allow Global Redirect to work? Is there a table other than url_alias involved?

asb’s picture

See #825006: It removes aliases. Have lots of fun reading it. This doesn't just apply to users of the 'Scanner' module.

dman’s picture

drush php-eval "_pathauto_include(); node_pathauto_bulkupdate();"

micropony’s picture

The drush command looked a little stale so I thought I'd try this for bulk updating all node aliases:

drush php-eval " module_load_include('inc', 'pathauto', 'pathauto'); module_load_include('inc', 'pathauto', 'pathauto.pathauto'); node_pathauto_bulk_update_batch_process(); "

However, I'm getting this error:

Missing argument 1 for node_pathauto_bulk_update_batch_process(), called in /usr/local/Cellar/drush/4.5/commands/core/core.drush.inc(637) : eval()'d code on line 1 and defined pathauto.pathauto.inc:86

I'm dealing with ~9000 node aliases that need updating, and the in-browser bulk update doesn't work (of course), any thoughts on how I can clean this up?

Eugene Fidelin’s picture

Here is my script for bulk updating node aliases.
It works well during the update process, but it can't handle more than several thousand nodes.

  // Regenerate aliases for nodes and taxonomies.
  module_load_include('inc', 'pathauto');
  // The second argument is the module name; the third is the file name.
  module_load_include('inc', 'pathauto', 'pathauto.pathauto');

  // Delete all node aliases. The doubled %% is a literal % in db_query().
  db_query("DELETE FROM {url_alias} WHERE src LIKE 'node/%%'");

  // Get all nodes that need to be updated.
  $concat = _pathauto_sql_concat("'node/'", 'n.nid');
  $query = db_query("SELECT n.nid FROM {node} n LEFT JOIN {url_alias} ua ON $concat = ua.src WHERE ua.src IS NULL");
  $nids = array();
  while ($nid = db_result($query)) {
    $nids[] = $nid;
  }

  pathauto_node_update_alias_multiple($nids, 'bulkupdate');

micropony’s picture

Hey Eugene, I think that would be fine if I didn't need pathauto to play nicely with the Redirect module, which would automatically create redirects (massively important for SEO) when URL aliases are changed.

It's important for me not to wipe the old URL aliases, which your script is doing.

It did give me an idea though. I'm going to stuff all the current URL aliases into Redirect's redirect.source table in the db. Then I'll try and collect the output of pathauto_node_update_alias_multiple() into an array and stuff it into Redirect's redirect.redirect table.

(I sense a module in the works B-) )

jooplaan’s picture

Great idea.. I was looking for a solution for the same.

Did you create a custom module for this? Can you share the code you used?

colan’s picture

All: Rather than discussing the Drush command code here, please help with #867578: Add drush commands for bulk alias updating/deleting instead. Directing our energy over there will get this done faster. Thanks.