Add reasonable fallback to cron_semaphore

voxpelli - May 6, 2009 - 12:31
Project:FeedAPI
Version:6.x-1.x-dev
Component:Code feedapi (core module)
Category:feature request
Priority:normal
Assigned:Unassigned
Status:closed
Description

I'm currently working with some custom cron code that runs cron jobs in parallel and that didn't, not surprisingly, work very well with the FeedAPI.

Looking at the time estimating code in the FeedAPI I see that you first make an estimate based on the max_execution_time and feedapi_cron_percentage and then you go to the cron_semaphore variable, which would hold the starting time of a traditional cron job, and adds the max_execution_time to it and then chooses the lowest of the two - either the maximum time for the feedapi or the maximum time of the entire cron job.

If cron_semaphore doesn't contain a sensible timestamp you're fallbacking to zero. I think you instead should ignore cron_semaphore if it doesn't exist because that should be a clear signal that someone doesn't want you to take that into account. A fallback to zero is essentially the same as not running the cron job when cron_semaphore isn't set - and there's no need to not run it in such a situation as far as I can see.

I'm attaching a patch with one possible solution to this.

AttachmentSize
feedapi_cron_semaphore.patch681 bytes

#1

Aron Novak - June 29, 2009 - 10:15
Title:Add reasonable fallback to cron_semaphore» Run feed refreshes in a parelell way
Status:needs review» fixed

For paralell execution of feed refresh, i suggest you to use drush and feedapi together. This is a tested solution and it works reliably.
You also need a shell script

#! /bin/bash
#
# File:   feedapi.sh
# Author: Aron Novak
# Contact: aron@novaak.net

# begin config

# if all the feeds are refreshed and min_iteration does not elapsed from the beginning of the iteration
# wait until then
min_iteration=900

# start to refresh $paralell feed at one round
paralell=4

## end config, do not modify below unless you know what to do

# check for sane parameter
if [ -z $1 ]; then
  echo "Usage: ./feedapi.sh /path/to/drupal_instance/"
  exit 1
fi
cd $1
command_avail=`drush  | grep "feedapi refresh" | wc -l`
if [ ! $command_avail -eq 1 ]; then
  echo "The directory is not a Drupal instance or FeedAPI Drush commands are not available."
  exit 1
fi

# become a daemon
if [ "x$2" != "x--" ]; then
  $0 $1 -- 1> /dev/null 2> /dev/null &
  exit 0
fi

while [ 1 ]
  do
  start=`date +%s`
  i=0
  for nid in `echo "SELECT f.nid FROM feedapi f NATURAL JOIN node n WHERE n.status = 1" |drush sql cli| grep ^[0-9]`
    do
    already=`ps aux | grep drush | grep $nid`
    if [ -z "$already" ]; then
      drush feedapi refresh $nid 2> /dev/null 1> /dev/null &
      pids[$i]=$!
      i=`expr $i + 1`
    fi
    if [ $i -eq $paralell ]; then
      end=`expr $paralell - 1`
      for p in `seq 0 $end`
        do
        wait ${pids[$p]}
      done;
      i=0
    fi
  done;
  end=`date +%s`
  elapsed=`expr $end - $start`
  if [ $elapsed -lt $min_iteration ]; then
    sleep `expr $min_iteration - $elapsed`
  fi
done;

#2

voxpelli - June 29, 2009 - 11:08
Title:Run feed refreshes in a parelell way» Add reasonable fallback to cron_semaphore
Status:fixed» needs review

I'm sorry but I think you've misunderstood me - it's the cron jobs that I'm running in parallel - not the refreshing of feeds.

The core issue for me is that the fallback for the variable "cron_semaphore" isn't sensible - my patch makes it sensible.

Since I think you've misunderstood my issue I'm resetting it back to the old issue - please take a look at my patch to see my point or ask me any questions you might have.

#3

Aron Novak - June 29, 2009 - 11:39
Status:needs review» reviewed & tested by the community

Hm, you're right. I just missed your original purpose, sorry.
Otherwise the patch seems to be fine.

#4

Plazmus - June 30, 2009 - 09:50

@Aron I'm glad that you missed it as it is probably the way I should go with my problem posted here #359184: How to split the feed updates in multiple cron jobs to decrease server load Thanks!

#5

alex_b - July 2, 2009 - 15:16

RTBC from my point of view, too. Should go into next release.

#6

Aron Novak - July 2, 2009 - 16:25
Status:reviewed & tested by the community» fixed

committed.

#7

System Message - July 16, 2009 - 16:30
Status:fixed» closed

Automatically closed -- issue fixed for 2 weeks with no activity.

 
 

Drupal is a registered trademark of Dries Buytaert.