Cron script for multi source, multi site setup

Last updated March 22, 2008. Created by asciikewl on December 17, 2007.
Edited by add1sun.

I've just hacked up this little script because I have multiple installations of Drupal, each with multiple sites, and more get added every day. It should figure out which sites cron needs to run for.

#!/bin/bash

SITESROOT=/var/www/sites
MYIPRANGE=41.204.221

# get the base installs
cd $SITESROOT
for drupaldir in $(find . -maxdepth 2 -name INSTALL.mysql.txt |  awk -F/ '{print $2}')
do
  cd $SITESROOT/$drupaldir/sites
  for site in $(find  -L . -maxdepth 1 -type d  -iregex "./[a-z].*\.[a-z].*" | awk -F/ '{print $2}')
  do
    IP=$(dig $site  | sed '/AUTHORITY SECTION/,$d' | grep -v "^;" | grep  "IN[[:space:]]*A" | sed 's/.*A\W*//' )
    if echo $IP | grep -q $MYIPRANGE
    then
      #echo "Doing cron for $site"
      wget -O - -q http://$site/cron.php
    else
      a=1
      #echo "Skipping cron for $site"
    fi
  done
done

Comments

Why did you add the IP-range

Why did you add the IP range that you use? Is it because every Drupal site (domain) in your multi-site setup uses its own unique IP address?

If I have only one IP address for all the sites in the multi-site setup, what do I enter in the IP-range line? The last three digits of the IP address or the entire IP address?

Why the IP?

Because we develop some sites on this server that subsequently get moved to other servers, and I only want to run cron for the ones that should still be active on this server.
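
On the question of what to put in MYIPRANGE: the original script only checks whether the looked-up address contains that text, so either the leading octets (as in 41.204.221) or a complete address will match. A stricter check is sketched below. The function name ip_has_prefix is made up for illustration and is not part of the original script.

```shell
# Hypothetical helper: succeeds when the address equals the given value
# or starts with it followed by a dot (whole octets only).
ip_has_prefix() {
  local ip="$1" prefix="$2"
  case $ip in
    "$prefix"|"$prefix".*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use inside the loop (not executed here):
#   if ip_has_prefix "$IP" "$MYIPRANGE"; then ...; fi
```

Unlike a plain grep, this does not treat the dots as wildcards and does not match in the middle of an address, so 141.204.221.7 would not pass for the prefix 41.204.221.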

A simpler bash script

The script above works, but is a little over-engineered for an environment where you mostly trust the contents of your 'sites' directory to be accurate. It also depends on 'dig' which is not available on every server (I tend to not install it on mine).

This version of the script does a few things differently.

  • It assumes that all directories in 'sites', with the exception of 'default' and 'all', are accurate domain names. I consider this a safe assumption, because a site would never work in a Drupal multi-site if the directories were not named after the domain name of the site.
  • It statically handles the 'sites/default' case.
  • It uses ping, instead of dig, to do a comparison of the IP that you set to the domain in the sites directory. If the two IPs match, the script will run cron.php. This should work even if your ISP or router squelches pings - since we are looking for a successful lookup and not a successful ping.
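
To illustrate the last point, here is a rough sketch of pulling the resolved address out of ping's first output line, so it can be compared exactly instead of by substring. The function name ip_from_ping is an assumption for this example, and it presumes the usual "PING host (address) ..." header that Linux and OS X ping print.

```shell
# Hypothetical helper: reads ping output on stdin and prints the address
# found in parentheses on the first line, e.g.
#   PING example.com (93.184.216.34) 56(84) bytes of data.
# Prints nothing when the first line does not look like that header.
ip_from_ping() {
  sed -n '1s/^PING [^(]*(\([0-9.]*\)).*/\1/p'
}

# Example use inside the loop (not executed here):
#   ip=$(ping -c 1 "$site" 2>/dev/null | ip_from_ping)
#   [ "$ip" = "$MYIPRANGE" ] && wget -O - -q "http://$site/cron.php"
```

Because only the header line is parsed, it should not matter whether the ping itself gets a reply, as long as the name lookup succeeds.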

The one case where I'm unsure how the script will behave looks like:

The domain (foo.example.com) points at your server and there is a directory named 'foo.example.com' in your sites directory, but you have not configured Apache to route incoming foo.example.com requests to the Drupal multi-site installation. This should not stop cron from running for the other sites; I give the warning only because I have not tested this case. The expected cases work: sites whose domains point at the server get their cron run, and sites whose domains point elsewhere are skipped.

Put this in a file with a .sh extension, and call that from your crontab.

#!/bin/bash

# This script will iterate through the sites directory of a multi-site install
# and run the cron.php for each named site in the directory.
# NOTE: the site defined in 'sites/default' must have its URL set statically here.

# set the domain of the site defined in 'sites/default'
# comment these lines out if you don't need or use them.
# DEFAULTDOMAIN=www.example.com
# wget -O - -q http://$DEFAULTDOMAIN/cron.php

# set the system path for the multi-site sites directory
SITESROOT=/var/www/drupal-5.10/sites
# set the IP of your server
MYIPRANGE=192.168.1.101

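cd "$SITESROOT" || exit 1 # work in the right dir
# list the site directories, skipping exactly 'all' and 'default'
for site in $(ls | grep -Ev '^(all|default)$')
do
  # the lookup printed by ping must contain the server IP
  if ping -c 1 "$site" | grep -q "$MYIPRANGE"
  then
    wget -O - -q "http://$site/cron.php"
  fi
done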
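
For calling the script from crontab, something along these lines may help. The path /home/drupal/drupalcron.sh and the hourly schedule are only assumptions for the example; use wherever you saved the file and whatever interval suits your sites.

```shell
# Hypothetical location of the saved script; adjust to your own path.
SCRIPT=/home/drupal/drupalcron.sh

# Print a crontab line that runs the given script at minute 0 of every
# hour and discards its output.
cron_entry() {
  printf '0 * * * * %s >/dev/null 2>&1\n' "$1"
}

cron_entry "$SCRIPT"
# Paste the printed line into the editor opened by "crontab -e".
# The script also needs to be executable: chmod +x "$SCRIPT"
```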

Video Walkthrough

I just want to give a shout-out to gnat's solution. It worked great for me! Just be sure to change the value of MYIPRANGE to your server's IP address. After several moments of mucking around with the code and trying to get my cron job to run, I realized I hadn't entered my own IP address.

I found gnat's solution through a helpful video by Matt Petrowsky. The video is located here and gives a good overview of cron for new users: http://gotdrupal.com/videos/setting-up-drupal-cron

Also useful for multi-server setup

Thanks for the reference to my video.

One thing I've done to update this script is to make it work only on the current server.

Since I use a version controlled multi-site setup, I need the script to automatically account for whatever server it is on.

This is what I changed MYIPRANGE to

MYIPRANGE=$(/sbin/ifconfig | /bin/sed -n -e 's/:127\.0\.0\.1 //g' -e 's/ *inet addr:\([0-9.]\+\).*/\1/gp')

This collects the IPv4 addresses listed on all interfaces of the machine, leaving out the loopback address 127.0.0.1. If you're on something other than a Linux variant, you'll likely need to change inet addr: in the sed expression above to match the format of your ifconfig output.

Hope this helps those who are running multi-server, multi-site setups when using version control.
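
As a rough, more portable take on the same idea, the sketch below parses ifconfig-style output with awk. It is meant to cope with both the older Linux "inet addr:1.2.3.4" lines and the "inet 1.2.3.4" lines printed by BSD, OS X and newer Linux tools. The function name list_ipv4 is invented for this example.

```shell
# Hypothetical helper: reads ifconfig-style output on stdin and prints
# one non-loopback IPv4 address per line.
list_ipv4() {
  awk '$1 == "inet" {
         ip = $2
         sub(/^addr:/, "", ip)   # strip the old Linux "addr:" prefix
         sub(/\/.*/, "", ip)     # strip a /24-style suffix if present
         if (ip !~ /^127\./) print ip
       }'
}

# Example use (not executed here):
#   MYIPRANGE=$(/sbin/ifconfig | list_ipv4)
```

If the machine has more than one address, MYIPRANGE ends up holding several lines. Quoting it and using fixed-string matching, as in grep -qF "$MYIPRANGE", makes grep treat each line as a separate pattern.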


Script hanging w/out finish

When I run this script from crontab, it doesn't finish.

I went to a bash irc room and they said this about the script:

Please never parse, pipe, grep, capture, read, or loop over the output of 'ls' or 'find'. Despite popular belief, 'ls' is not designed to enumerate files or parse their statistics. Using 'ls' this way is dangerous (word splitting) and there's always a better way; eg. globs, find -exec, etc.

Is the script not finishing because of the ls command? thanks
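
Whether or not ls is the cause of the hang, the advice about globs can be followed with a loop like the sketch below. The function name list_sites is made up for illustration; it prints the names of the directories under a given sites directory, skipping exactly 'all' and 'default'.

```shell
# Hypothetical helper: lists site directories using a glob instead of
# parsing the output of ls.
list_sites() {
  local root="$1" dir name
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue    # the glob matched nothing
    name=${dir%/}                # drop the trailing slash
    name=${name##*/}             # drop the leading path
    case $name in
      all|default) continue ;;   # not real site names
    esac
    printf '%s\n' "$name"
  done
}

# Example use (not executed here):
#   list_sites /var/www/drupal-5.10/sites | while IFS= read -r site; do
#     wget -O - -q "http://$site/cron.php"
#   done
```

The trailing slash in the glob restricts matches to directories, so stray files in the sites directory are ignored.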

This works for VHosts with static IPs

Under Mac OS X Leopard server I have my Apache's virtual hosts set up with static IP addresses (for SSL certs, etc.), so setting the MYIPRANGE variable to one IP address failed for me. Here is my adjusted script (based on gnat's) that works for the IP range of the domains and uses curl, since wget isn't on OS X by default.

#!/bin/bash

# This script will iterate through the sites directory of a multi-site install
# and run the cron.php for each named site in the directory.
# NOTE: the site defined in 'sites/default' must have its URL set statically here.

# set the domain of the site defined in 'sites/default'
# comment these lines out if you don't need or use them.
# DEFAULTDOMAIN=www.example.com
# curl --silent --compressed http://$DEFAULTDOMAIN/cron.php

# IP range of vhost domains on server, written as an extended regular expression.
# This pattern matches 192.168.1.110 through 192.168.1.122.
MYIPRANGE='192\.168\.1\.1(1[0-9]|2[0-2])'

# set the system path for the multi-site sites directory
SITESROOT=/home/drupal/html/sites

cd "$SITESROOT" || exit 1 # work in the right dir
# list the site directories, skipping exactly 'all' and 'default'
for site in $(ls | grep -Ev '^(all|default)$'); do
    # get IP of your vhost domain, if need be - comment MYIPRANGE above if used
    #MYIPRANGE=$(host "$site" | grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$')
    # verify that the domain resolves to an address in the range
    if ping -c 1 "$site" | grep -Eq "$MYIPRANGE"; then
        # run cron.php for vhost domain
        curl --silent --compressed "http://$site/cron.php"
    fi
done
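
Numeric ranges are easy to get wrong in a regular expression, because a bracket expression such as [10-22] is a set of characters, not a range of numbers. One alternative is to compare the last octet numerically, as in the sketch below. The function name ip_in_octet_range and its arguments are assumptions for this example, not part of the script above.

```shell
# Hypothetical helper: succeeds when the address starts with the given
# three-octet prefix and its last octet lies between low and high.
ip_in_octet_range() {
  local ip="$1" prefix="$2" low="$3" high="$4" last
  case $ip in
    "$prefix".*) last=${ip#"$prefix".} ;;
    *) return 1 ;;
  esac
  case $last in
    ''|*[!0-9]*) return 1 ;;     # last part must be digits only
  esac
  [ "$last" -ge "$low" ] && [ "$last" -le "$high" ]
}

# Example use (not executed here):
#   if ip_in_octet_range "$ip" 192.168.1 110 122; then ...; fi
```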

And here is the 'localhost.user.drupalcron.plist' for launchd to run the 'drupalcron.sh' script above at an interval (in seconds) when placed in /Library/LaunchDaemons. Cron is replaced by launchd on OS X server, though cron is still available.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>localhost.user.drupalcron</string>
    <key>ProgramArguments</key>
    <array>
        <string>/home/drupal/drupalcron.sh</string>
    </array>
    <key>LowPriorityIO</key>
    <true/>
    <key>Nice</key>
    <integer>1</integer>
    <key>StartInterval</key>
    <integer>86400</integer>
</dict>
</plist>

Then load the .plist job on demand with the following command in Terminal, or reboot...

sudo launchctl load -w /Library/LaunchDaemons/localhost.user.drupalcron.plist