I schedule an article and then submit it. The Pathauto module generates the URL alias for the article, and it is correct: it has the form "http://site/category/subcategory/nid/title".

Then the node gets published by Scheduler and the path changes! The new path is "http://site/nid/title", so I lose the category part of the URL alias completely. For some reason, running node_save() in scheduler_cron() causes this. I know the same function is called when the node is first submitted, and $node->path itself DOES contain the right path when it is handled in scheduler_cron().

I'm not quite sure why node_save() is called at all. Why not just run a database query that sets the status directly?

Another thing I was wondering about: it would be great if the publish_on timestamp were used as the creation time of the node, or at least if there were an option for it. We have journalists who write articles today and schedule them to be published in two days. When the article is published, it shows the date it was written, two days earlier, rather than the time it was published.

In case you want to see how I've done this, here's the whole function as I have it now. (P.S. $node = node_load($node->nid) overwrites $node, which held the database row object, so that row's fields can't be used afterwards. That's why I named the row object $scheduled_node.) I didn't do anything for the unpublishing option, though.

With the alterations below, the path remains unchanged, and when the node is published its date is set to the time of publication.

/**
 * Implementation of hook_cron().
 */
function scheduler_cron() {
  $clear_cache = FALSE;
  
  //if the time now is greater than the time to publish a node, publish it
  $nodes = db_query('SELECT *, (publish_on - timezone) AS utc_publish_on FROM {scheduler} s LEFT JOIN {node} n ON s.nid = n.nid WHERE n.status = 0 AND s.publish_on > 0 AND s.publish_on < %d + s.timezone', time());
  
  while ($scheduled_node = db_fetch_object($nodes)) {
    $node = node_load($scheduled_node->nid);

    //set the status to published and the timestamps to match the publishing time
    db_query('UPDATE {node} SET status = 1, changed = %d, created = %d WHERE nid = %d', $scheduled_node->utc_publish_on, $scheduled_node->utc_publish_on, $node->nid);

    //if this node is not to be unpublished, then we can delete the record
    if ($scheduled_node->unpublish_on == 0) {
      db_query('DELETE FROM {scheduler} WHERE nid = %d', $node->nid);
    }
    //we need to unpublish this node at some time so clear the publish on since it's been published
    else {
      db_query('UPDATE {scheduler} SET publish_on = 0 WHERE nid = %d', $node->nid);
    }
    
    //invoke scheduler API
    _scheduler_scheduler_api($node, 'publish');
    
    watchdog('content', t('%type: scheduled publishing of %title.', array('%type' => theme('placeholder', t($node->type)), '%title' => theme('placeholder', $node->title))), WATCHDOG_NOTICE, l(t('view'), 'node/'. $node->nid));
    $clear_cache = TRUE;
  }
  
  //if the time is greater than the time to unpublish a node, unpublish it
  $nodes = db_query('SELECT *, (unpublish_on - timezone) AS utc_unpublish_on FROM {scheduler} s LEFT JOIN {node} n ON s.nid = n.nid WHERE n.status = 1 AND s.unpublish_on > 0 AND s.unpublish_on < %d + s.timezone', time());
  
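  while ($scheduled_node = db_fetch_object($nodes)) {
    //if this node is to be unpublished, we can update the node and remove the record since it can't be republished
    //keep the query row in $scheduled_node: the object returned by node_load() has no utc_unpublish_on field
    $node = node_load($scheduled_node->nid);
    $node->changed = $scheduled_node->utc_unpublish_on;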
    $node->status = 0;
    node_save($node);

    db_query('DELETE FROM {scheduler} WHERE nid = %d', $node->nid);
    
    //invoke scheduler API
    _scheduler_scheduler_api($node, 'unpublish');
    
    watchdog('content', t('%type: scheduled unpublishing of %title.', array('%type' => theme('placeholder', t($node->type)), '%title' => theme('placeholder', $node->title))), WATCHDOG_NOTICE, l(t('view'), 'node/'. $node->nid));
    $clear_cache = TRUE;
  }
  
  if ($clear_cache) {
    // clear the cache so an anonymous poster can see the node being published or unpublished
    cache_clear_all();
  }
}

Comments

AjK

Status: Active » Closed (works as designed)

"I'm not quite sure why node_save() is called at all. Why not just run a database query that sets the status directly?"

This is how it used to work, until someone pointed out that no other module gets a chance to hook into a change of state if you update the database directly. Using node_save() is the correct way to do it; otherwise, why have an API at all? It would just be a database free-for-all, with modules not having a clue what other modules do.

If a node_load() / alter the status / node_save() sequence changes things other than the status, then this isn't a problem with Scheduler. Something else is not using the API correctly. So I'd ask "why does Pathauto not handle node_load() / node_save() properly?". That's the right place to fix it, rather than working around it everywhere else it causes a problem.
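
To make the point about hooks concrete, here is a toy sketch in plain PHP. It is not Drupal code: ToyNodeStore, toy_alias_listener() and everything else in it are hypothetical names invented for this illustration, and the real node_save() does far more than this. It only shows why a direct update is invisible to other modules, while a save that goes through one API function gives each registered listener a chance to react.

```php
<?php
// Toy model only. Nothing here is Drupal's real implementation.
class ToyNodeStore {
  // In-memory stand-in for the {node} table, keyed by nid.
  public static $rows = array(1 => array('nid' => 1, 'status' => 0));
  // Callbacks that want to be told when a node is saved.
  public static $listeners = array();
  // Record of what the listeners saw, so the effect is observable.
  public static $log = array();

  // Stand-in for loading a node: returns a copy of the stored row as an object.
  public static function load($nid) {
    return (object) self::$rows[$nid];
  }

  // Stand-in for an API-level save: writes the row, then notifies listeners.
  public static function save($node) {
    self::$rows[$node->nid] = (array) $node;
    foreach (self::$listeners as $callback) {
      call_user_func($callback, $node);
    }
  }

  // Stand-in for a direct UPDATE query: writes the row and tells nobody.
  public static function setStatusDirectly($nid, $status) {
    self::$rows[$nid]['status'] = $status;
  }
}

// A hypothetical module that reacts to saves (for example, to refresh an alias).
function toy_alias_listener($node) {
  ToyNodeStore::$log[] = 'saw node ' . $node->nid . ' with status ' . $node->status;
}
ToyNodeStore::$listeners[] = 'toy_alias_listener';

// Direct update: the status changes, but no listener runs.
ToyNodeStore::setStatusDirectly(1, 1);
$calls_after_direct_update = count(ToyNodeStore::$log);

// Load / alter / save: the status changes and the listener is notified.
$node = ToyNodeStore::load(1);
$node->status = 0;
ToyNodeStore::save($node);
$calls_after_api_save = count(ToyNodeStore::$log);
```

In this sketch the listener runs only for the second change. The trade-off raised in the original post is the other side of the same mechanism: once listeners run on every save, one that misbehaves can change more than the field that was edited.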