Add reasonable fallback to cron_semaphore
| Project: | FeedAPI |
| Version: | 6.x-1.x-dev |
| Component: | Code feedapi (core module) |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | closed |
Jump to:
I'm currently working with some custom cron code that runs cron jobs in parallel and that didn't, not surprisingly, work very well with the FeedAPI.
Looking at the time estimating code in the FeedAPI I see that you first make an estimate based on the max_execution_time and feedapi_cron_percentage and then you go to the cron_semaphore variable, which would hold the starting time of a traditional cron job, and adds the max_execution_time to it and then chooses the lowest of the two - either the maximum time for the feedapi or the maximum time of the entire cron job.
If cron_semaphore doesn't contain a sensible timestamp you're fallbacking to zero. I think you instead should ignore cron_semaphore if it doesn't exist because that should be a clear signal that someone doesn't want you to take that into account. A fallback to zero is essentially the same as not running the cron job when cron_semaphore isn't set - and there's no need to not run it in such a situation as far as I can see.
I'm attaching a patch with one possible solution to this.
| Attachment | Size |
|---|---|
| feedapi_cron_semaphore.patch | 681 bytes |

#1
For paralell execution of feed refresh, i suggest you to use drush and feedapi together. This is a tested solution and it works reliably.
You also need a shell script
#! /bin/bash
#
# File: feedapi.sh
# Author: Aron Novak
# Contact: aron@novaak.net
# begin config
# if all the feeds are refreshed and min_iteration does not elapsed from the beginning of the iteration
# wait until then
min_iteration=900
# start to refresh $paralell feed at one round
paralell=4
## end config, do not modify below unless you know what to do
# check for sane parameter
if [ -z $1 ]; then
echo "Usage: ./feedapi.sh /path/to/drupal_instance/"
exit 1
fi
cd $1
command_avail=`drush | grep "feedapi refresh" | wc -l`
if [ ! $command_avail -eq 1 ]; then
echo "The directory is not a Drupal instance or FeedAPI Drush commands are not available."
exit 1
fi
# become a daemon
if [ "x$2" != "x--" ]; then
$0 $1 -- 1> /dev/null 2> /dev/null &
exit 0
fi
while [ 1 ]
do
start=`date +%s`
i=0
for nid in `echo "SELECT f.nid FROM feedapi f NATURAL JOIN node n WHERE n.status = 1" |drush sql cli| grep ^[0-9]`
do
already=`ps aux | grep drush | grep $nid`
if [ -z "$already" ]; then
drush feedapi refresh $nid 2> /dev/null 1> /dev/null &
pids[$i]=$!
i=`expr $i + 1`
fi
if [ $i -eq $paralell ]; then
end=`expr $paralell - 1`
for p in `seq 0 $end`
do
wait ${pids[$p]}
done;
i=0
fi
done;
end=`date +%s`
elapsed=`expr $end - $start`
if [ $elapsed -lt $min_iteration ]; then
sleep `expr $min_iteration - $elapsed`
fi
done;
#2
I'm sorry but I think you've misunderstood me - it's the cron jobs that I'm running in parallel - not the refreshing of feeds.
The core issue for me is that the fallback for the variable "cron_semaphore" isn't sensible - my patch makes it sensible.
Since I think you've misunderstood my issue I'm resetting it back to the old issue - please take a look at my patch to see my point or ask me any questions you might have.
#3
Hm, you're right. I just missed your original purpose, sorry.
Otherwise the patch seems to be fine.
#4
@Aron I'm glad that you missed it as it is probably the way I should go with my problem posted here #359184: How to split the feed updates in multiple cron jobs to decrease server load Thanks!
#5
RTBC from my point of view, too. Should go into next release.
#6
committed.
#7
Automatically closed -- issue fixed for 2 weeks with no activity.