I have started work on my own cron initializer, but it is largely a hack of this existing code. Rather than me releasing a new mod that is mostly derived from poormanscron, it would be a simple update for you to make, and you could take the credit for improving an already great mod.

Instead of using hook_exit, I am using hook_nodeapi, testing for the submit $op, and running cron if an authorised update is made (i.e. one that is NOT pending approval). I have no support at the moment for moderated cron runs (or at least it might work; it is just not tested).

Anyway, if you incorporate this feature there is no need for my mod.

regards

Dave

Comments

robloach’s picture

Version: 5.x-1.1 » 7.x-2.x-dev

You're running the cron job in hook_nodeapi?

  1. Under what conditions would this be run?
  2. How would it look in the user administration interface?
  3. Would it be run when content is submitted, saved, inserted, etc?
  4. What about taxonomy?
  5. It seems to me that running the cron job when a node is updated would bring a lot of overhead to the system. Not only would the cron run every hour or so, it would also run whenever someone updated any content. This would really slow down the system....

madivad’s picture

1. and 3.

Instead of using the hook_exit, I am using the hook_nodeapi and testing for the submit $op and running cron if an authorised update is made...

Basically I was looking to act only when someone submits something new. When you update a page, it also fires the submit (I will retest this, but I'm pretty sure that's what I found). AFAIK, when something is 'saved' or 'inserted' it is actually just submitted, isn't it? Could you give me an example to look at so I can see the difference?

2. User administration would not need to change, I wouldn't think. All poormanscron options suit this concept as well.

4. What about it? I wouldn't think there would be any impact due to taxonomy, unless creating new taxonomy on-the-fly creates a "submit" condition... Something for me to look at. But on that, read on...

The idea is that it does run when content is updated (unless you are on a large-volume site, this wouldn't be a problem, and maybe that could be an additional option in the administration section). But the way poormanscron is now, it can also run every hour, or more often. The idea of running on "submit" is to have the search index updated to reflect recent changes. But there is no need to update after EVERY submit. There could still be an {insert timelimit} condition... This does introduce a problem, however: if you submit a page and then submit a second page within the time limit, the second page would not be indexed until a new page was submitted after the time limit. This could be overcome by creating a flag that records content that has been "submitted" but not yet "cron'd" or indexed. If the time limit expires and someone then 'views' a page, the cron could run (which is the way I believe it works at present, i.e. on the 'viewing' of a page).
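To make the time limit and flag idea concrete, here is a minimal standalone sketch in plain PHP. It is not poormanscron code and does not touch any Drupal API: the class name ThrottledCron, its methods, and the callback are all hypothetical, and it reflects just one reading of the proposal (a submit sets a "pending" flag, and cron fires on the next submit or view once the time limit has expired).

```php
<?php
// Hypothetical sketch only: these names do not exist in poormanscron or Drupal.
class ThrottledCron {
    private $interval;        // minimum number of seconds between cron runs
    private $lastRun = 0;     // timestamp of the most recent cron run
    private $pending = false; // TRUE when content was submitted but not yet cron'd
    private $runCron;         // callback that performs the actual cron run

    public function __construct($interval, callable $runCron) {
        $this->interval = $interval;
        $this->runCron = $runCron;
    }

    // Called when content is submitted: flag it, then run cron if allowed.
    public function onSubmit($now) {
        $this->pending = true;
        return $this->maybeRun($now);
    }

    // Called when a page is viewed: picks up anything still flagged.
    public function onView($now) {
        return $this->maybeRun($now);
    }

    // Runs cron only if something is pending and the time limit has expired.
    private function maybeRun($now) {
        if (!$this->pending || ($now - $this->lastRun) < $this->interval) {
            return false;
        }
        call_user_func($this->runCron);
        $this->lastRun = $now;
        $this->pending = false;
        return true;
    }
}
```

With a one-hour limit, a second submit made shortly after the first is only flagged, and it gets picked up by the first submit or view that happens after the hour is up. In this reading, views with nothing pending never trigger cron at all, which is a simplification of the proposal.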

I think my point has been lost in the conveying of the idea. The way I see poormanscron is that it runs every {time limit} that pages are simply viewed. My proposal would see the cron being run a lot less, not a lot more.

hakimapia’s picture

Hi,

Can I have a copy of your mod?
I really need it so that newly added or modified content is automatically indexed for my website.

Thanks.

gpk’s picture

Status: Active » Closed (won't fix)

@#2, #3:
As you know, in Drupal the idea behind the cron hook is that it allows modules to define "maintenance tasks" that are performed periodically at a frequency set up by the site admin. Examples of such tasks include trimming stale data from some tables, refreshing aggregator feeds, and updating search indexes (which may include indexes for things that are not nodes). What you want to do is update just the node search index as soon as a node is submitted (and perhaps also when a comment is submitted). Firing all the cron hooks when a node is submitted isn't the best way of going about this and IMO wouldn't be an appropriate mod to poormanscron, which only exists to provide a way of actually running cron without having to configure a cron job on the server.

A better way of doing this would be to use a tiny custom module to invoke http://api.drupal.org/api/function/node_update_index from the nodeapi hook (or even invoke http://api.drupal.org/api/function/_node_index_node on the node in question). Another approach would be to use Actions/Triggers to fire a custom action on node/comment submit.

Ariesto’s picture

Great solution, gpk. I'm in the process of learning how to use custom modules with Drupal. If you have some time, what is the exact PHP syntax to invoke the node_update_index function? I plan on hardwiring the PHP call into a custom PHP rule that fires after new content is published. I'll keep looking for an example of invoking other modules that I can imitate. Thanks.

gpk’s picture

The exact PHP syntax would be just node_update_index() (see the link above: no arguments are shown). This would cause the entire search index for nodes to be updated. Probably there would only be the one node to do, but possibly it might take a few seconds to run if other updates are needed for some reason.

A better way might be to use _node_index_node($object) where $object is the node you have just created/edited and is passed in to your custom action by the trigger/actions system (you will have set the action's 'type' = 'node').

HTH,

Ariesto’s picture

Priority: Normal » Minor

Thanks for your help gpk. I successfully wired the node_update_index() function to the action module. I don't have time right now, but hopefully sometime in the future I'll do the more precise solution you suggested above.

gpk’s picture

For reference, here is the code I am currently using to force nodes to be indexed immediately on node creation/update:

immediate_index.module

/**
 * Implementation of hook_action_info()
 */
function immediate_index_action_info() {
  return array(
    'immediate_index_index_node_action' => array(
      'description' => t('Index post'),
      'type' => 'node',
      'configurable' => FALSE,
      'hooks' => array(
        // We can't use the 'presave' $op since the action of indexing a node
        // retrieves the node from the database, i.e. the action needs to take
        // place after saving the node.
        'nodeapi' => array('insert', 'update'),
        // Could in principle add any/all of the hook_comment() actions, i.e.
        // insert, update, delete, publish, unpublish.
      )
    )
  );
}

/**
 * Implementation of a Drupal action.
 * Indexes a node.
 *
 * This actually gets invoked via the trigger_nodeapi (and potentially
 * trigger_comment) hook implementation, i.e. after search_nodeapi has
 * marked the node as needing to be reindexed. 
 */
function immediate_index_index_node_action($node, $context = array()) {
  // If search.module is not enabled then do nothing. Could alternatively
  // put this functionality in a separate module with search as a dependency.
  if (module_exists('search')) {
    // Need to reset the static node cache!
    node_load(NULL, NULL, TRUE);
    global $user;
    // Save current user
    $current_user = $user;
    // Carry out indexing as anonymous user, for compatibility with CCK content
    // permissions module.
    // Avoid logging the user out in the event something goes wrong (e.g. a
    // drupal_goto() in a PHP node).
    session_save_session(FALSE);
    $user = drupal_anonymous_user();
    _node_index_node($node);
    watchdog('action', 'Indexed @type %title.', array('@type' => node_get_types('name', $node), '%title' => $node->title));
    // This is an essential final step of the indexing process.
    search_update_totals();
    // Restore current user.
    $user = $current_user;
    session_save_session(TRUE);
  }
}

This would also need a .info file of course, which I leave as an exercise for the reader ;-)
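For anyone who wants a starting point, a minimal .info file might look like the sketch below. It assumes Drupal 6, which calls such as node_get_types() and session_save_session() in the code above suggest. The name and description strings are placeholders, and the dependency on trigger is an assumption based on the action being fired through trigger_nodeapi. Search is deliberately not listed, because the action checks module_exists('search') itself.

```ini
; immediate_index.info (hypothetical sketch, not the file from the live site)
name = Immediate index
description = Provides an action that indexes a post as soon as it is saved.
core = 6.x
; Assumption: the action is assigned to node events through the core Trigger module.
dependencies[] = trigger
```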

Also note that there is a risk (albeit tiny) of typos in the code above because I've renamed the module and therefore have slightly edited the code from what I have running live.