Last updated February 13, 2014. Created by mathieso on August 20, 2013.
Edited by quietone.

How?

Please, please, for the LOVE OF DRUPAL, if you're a Unix wizard, PLEASE CHECK THIS! Your karma bucket will overflow with karma juice. Tasty, tangy, sweet smelling karma juice. Ahhhhh!

Goal

The goal is to create a demonstration site that is automatically restored to its initial state every so often. Restoration is done automatically, without any manual intervention.

Assumptions

  • *nix on the server
  • Command shell access
  • MySQL on localhost
  • User account that has permission to create directories, run cron, and other things, and is in a group with the Apache user.
  • Drush is installed
  • Patience, and a sense of humor. Not essential, but they help.
  • Great caution. The mammal who started this page is not a Unix wizard. The procedure works for him/her/it. YMMV.

Some tutorials. Understanding a little *nix is essential if you want to do this kind of work. And, as the yogi said:

Knowing you don't know is the first step towards the karma fountain. Ah, tangy karma juice. The Red Bull of the soul, you know?

Background

There are two types o' data chunks in a Drupal installation: (1) files, and (2) the database.

Files

A typical Drupal site has thousands of files. Let's see how they are arranged, starting at the directory that contains the Drupal root. We'll call this the "project directory."

<Note>

The project directory is not the one that contains index.php, install.php, the sites directory, and such. That's the Drupal root. The project contains the Drupal root. The project directory contains the directory that contains index.php, install.php, the sites directory, and such.

Brain
Och aye, I have it noo, laddie.

</Note>

The project directory contains not only the Drupal root, but also other files that are part of the project, like the private file system, shell scripts, and other things.

The project directory is somewhere in the *nix file tree. Maybe it's at /home/abandonhope/, or /var/www/vhosts/. For our purposes, it doesn't matter where the project directory is.

Here's a project directory:

|-- anon_ftp
|-- cgi-bin
|-- drupal_private >>>  PRIVATE FILES
...
|-- public_html  >>>  SITE ROOT (http://yoursite.com) MAIN DRUPAL TREE
|   |-- .htaccess >>> HIDDEN FILE (NAME STARTS WITH .)
...
|   |-- sites
|   |   |-- all
|   |   |   |-- libraries
...
|   |   |   |-- modules
...
|   |   |   |-- themes
...
|   |   |-- default
|   |   |   |-- settings.php >>>  DATABASE CONNECTION INFO &c.
|   |   |   |-- files >>>  PUBLIC FILES, USER UPLOADS
...
|   |-- themes
...

Some of the files and directories have special permissions. For example:

  • sites/default/settings.php must not be writable by the Web server.
  • sites/default/files/ should be writable, though most directories should not be.

So the site reset procedure has to:

  • Handle files in subdirectories, subsubdirectories, subsubsubdirectories...
  • Restore hidden and visible files.
  • Restore the permissions of files and directories.
  • Restore files in the main Drupal tree (under public_html above, that Drupal rooty thing), as well as files in the directory used by the private file system (drupal_private above).

Wise Bison says:

Buffalo Bernice
There are more files in a Drupal project, Horatio, than are in the Drupal root.

Database

A MySQL database is a set of files as well. However, you don't need to mess with them directly. Command line tools like mysqldump simplify database backup and restore.

Your user account

A user account has attributes like user name, password, groups, etc. The groups are particularly important, because of the way they interact with file permissions.

The Apache Web server runs under a user account. Not yours, but a system account, like nobody, www-data or apache. Let's assume it's apache.

Users belong to groups. So, apache belongs to one or more *nix groups. Groups are like roles in Drupal. Sysadmins put users in groups, so they can work with the same set of files.

Each file belongs to both a user and a group. When a user uploads a file to sites/default/files through Drupal, it's actually the Web server that writes the file, so the file is owned by the user apache. Which group does the OS set as the group owner of the file? On Linux, apache's primary group, unless the directory has its setgid bit set, in which case new files take the directory's group. So if the user apache's primary group is psacln, then the uploaded file's group owner is psacln.

<Note>

psacln is a group created by Plesk, a Web control panel like cPanel.

</Note>

By default, when users upload files, Drupal sets their group permissions to read/write. From Drupal's includes/file.inc:

$mode = variable_get('file_chmod_file', 0664);
...
chmod($uri, $mode);

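That 0664 is an octal permission mask: read/write for the owner, read/write for the group, and read-only for everyone else. The group read/write part is what matters here.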
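To see how the digits of a mode like 0664 turn into the rw- flags that ls shows, here's a small shell sketch. mode_to_symbolic and _mode_bit are made-up helper names, and they only handle a plain three-digit mode (no setuid, setgid, or sticky digit).

```shell
#!/bin/sh
# Sketch: turn a plain three-digit octal mode (like 664) into ls-style flags.
# Digits are owner, group, others; within each digit, 4 = read, 2 = write, 1 = execute.

_mode_bit() {
  # Print flag letter $3 if bit $2 is set in digit $1, otherwise print "-".
  if [ $(( $1 & $2 )) -ne 0 ]; then printf '%s' "$3"; else printf '%s' '-'; fi
}

mode_to_symbolic() {
  mode=${1#0}   # drop one leading 0, so 0664 and 664 both work
  out=""
  for i in 1 2 3; do
    d=$(printf '%s' "$mode" | cut -c"$i")
    out="$out$(_mode_bit "$d" 4 r)$(_mode_bit "$d" 2 w)$(_mode_bit "$d" 1 x)"
  done
  printf '%s\n' "$out"
}
```

For example, mode_to_symbolic 0664 prints rw-rw-r--, which is what ls -l shows (after the leading file-type character) for a file Drupal has just chmodded with its default.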

When your restoration script runs, it runs under a user account, like lisabeth. lisabeth has a primary group. Here's the important thing:

If lisabeth and apache have the same primary group, they can read and write each other's files (in sites/default/files, at least).

So, your scripts will work best if your user and apache are in the same group. If you have root access, you can make sure of that, with a command like:

usermod -g psacln lisabeth
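Before (or after) running usermod, you can check whether the two accounts already share a primary group. A sketch; same_primary_group is a hypothetical helper, and lisabeth and apache are just the example names from this page.

```shell
#!/bin/sh
# Sketch: report whether two accounts share the same primary group.
# Returns 0 if they do, 1 if they don't, 2 if either account can't be looked up.
same_primary_group() {
  g1=$(id -gn "$1") || return 2   # id -gn prints the name of the primary group
  g2=$(id -gn "$2") || return 2
  if [ "$g1" = "$g2" ]; then
    echo "same primary group: $g1"
  else
    echo "different primary groups: $1 is in $g1, $2 is in $g2"
    return 1
  fi
}
```

Usage: same_primary_group lisabeth apache. After usermod, log out and back in; a new primary group only applies to new login sessions.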

<Note>

There are other ways of setting up permissions and groups so that everything works, but this mammal doesn't know what they are. Can somebody explain? Remember, there's sweet sweet karma in it. Plus, you get to ride around on a centaur for a day.

Your centaur awaits.</Note>

Besides groups, the second user account attribute we care about is the home directory. When you log in as lisabeth, you start out in lisabeth's home directory; on *nix this is typically /home/lisabeth. Commands you type run in that directory, unless you change directories or specify otherwise.

You can refer to your home directory with the tilde (~). So...

ls ~

... shows the files in your home directory, while ...

cd ~

... makes your home directory the current directory.

Let's assume that your home directory is the same as the project directory. That's a common arrangement. If there's a directory called public_html or www in your home directory, it's likely that your home directory and project directory are the same. (Unless you have multiple Drupal sites on the same account.)

Procedure overview

We'll need two scripts:

  • Create a snapshot of the files and database
    • Put the site offline.
    • Make a snapshot of the files in the main Drupal tree.
    • Make a snapshot of the files in the private file system.
    • Make a snapshot of the database.
    • Put the site online.
  • Restore from the snapshot
    • Put the site offline.
    • Erase the files in the main Drupal tree.
    • Restore the files in the main Drupal tree from a snapshot.
    • Erase the files in the private file system.
    • Restore the files in the private file system from a snapshot.
    • Restore the database from a snapshot.
    • Put the site online.

Then you can:

  • Set up cron to run the shell script.
  • Celebrate your achievement.
    • Get another Drupal tattoo.
    • Watch the musical episode from season 6 of Buffy.
    • Play with your dog.
    • Ride about in your Drupal chariot, waving to the users.

Your chariot

A directory for everything related to site restoration

Let's make a new directory for the site restoration function, and move the data files into it. The scripts will live there, as well.

The directory will be a child of the project directory. Recall that the project directory contains all of the files needed for your Drupal site, including the Drupal tree itself, and the private file system. Here is the project directory again:

|-- anon_ftp
|-- cgi-bin
|-- public_html  >>>  SITE ROOT (http://yoursite.com) MAIN DRUPAL TREE
|   |-- .htaccess >>> HIDDEN FILE (NAME STARTS WITH .)
...
|   |-- sites
|   |   |-- all
|   |   |   |-- libraries
...
|   |   |   |-- modules
...
|   |   |   |-- themes
...
|   |   |-- default
|   |   |   |-- settings.php >>>  DATABASE CONNECTION INFO &c.
|   |   |   |-- files >>>  PUBLIC FILES, USER UPLOADS
...
|   |-- themes
...
|-- drupal_private >>>  PRIVATE FILES
...

To set up the new directory, run this in the project directory:

mkdir restore_site

Now the tree should be:

|-- anon_ftp
|-- cgi-bin
|-- drupal_private >>>  PRIVATE FILES
...
|-- public_html  >>>  SITE ROOT (http://yoursite.com) MAIN DRUPAL TREE
...
|-- restore_site

The new directory is a sibling of the site root.

Commands and stuff

Let's go through *nix commands you'll need.

File snapshots

To make a file snapshot:

tar -cvpzf drupal_tree_snapshot.tgz public_html

  • tar - combines a bunch o' files into one archive file.
  • c - create an archive file
  • v - verbose output, so you can watch what happens
  • p - preserve the permissions of the files
  • z - compress the archive with gzip
  • f - the name of the archive file comes next
  • drupal_tree_snapshot.tgz - the name of the archive file
  • public_html - the files to grab (includes subdirectories)

(Credit: adapted from a post by Gareth Alexander.)

<Note>

Often, the directory isn't public_html. Mayhap you have a subdomain called demo, mapped to the directory demo, and put a demo site there. Use whatever directory is appropriate for your project.

</Note>

Run this command in the directory containing public_html, or whatever your Drupal site root is.

You can check the results. To see one screen at a time:

tar -tvf drupal_tree_snapshot.tgz | more

Notice that "public_html" is included in the path of each file. We need to know that, to extract everything correctly.

Press the space bar to go to the next page. Press q when you get bored.

If you want to check, say, whether the hidden file .htaccess is in the archive, try this:

tar -tvf drupal_tree_snapshot.tgz | grep .htaccess

The | (pipe) character connects two *nix commands: it sends the output of the first command into the second. grep is an oft-used text pattern matching utility. It shows just the lines matching the pattern.
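grep treats the . in .htaccess as "any character," and matches the pattern anywhere in a line. For a stricter check, this sketch lists just the names (-t without v) and matches a whole path exactly, with grep -x (whole line) and -F (fixed string, no wildcards). in_archive is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: succeed only if ARCHIVE contains exactly the path PATH_IN_ARCHIVE.
# Paths are as tar stores them, e.g. public_html/.htaccess
# Usage: in_archive ARCHIVE PATH_IN_ARCHIVE
in_archive() {
  tar -tzf "$1" | grep -qxF -- "$2"
}
```

For example: in_archive drupal_tree_snapshot.tgz public_html/.htaccess && echo "found it"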

<Note>

BTW, this command...

tree -ap

... will show you a directory tree. The a switch means "all" (like hidden files), and p means "show permissions."

This...

tree -ap > tree.txt

... will put the output into the file tree.txt. Print it out, and casually show it to your boss, to prove how geeky you are.

The tree utility isn't installed by default on all *nixen. For CentOS and similar, enter...

yum install tree

... to install it.

</Note>

To grab a snapshot of your private files, if you have any:

tar -cvpzf drupal_private_snapshot.tgz drupal_private

At the end of this step, you have two shiny new files:

  • drupal_tree_snapshot.tgz
  • drupal_private_snapshot.tgz

To move the files into the right directory:

mv drupal_tree_snapshot.tgz restore_site/
mv drupal_private_snapshot.tgz restore_site/

Then:

cd restore_site
ls

You should see your two snapshot files.

Database snapshots

Gareth Alexander to the rescue again! This command is adapted from his post at http://www.garethalexander.co.uk/tech/mysql-backup-and-restore:

mysqldump -u <db_username> --password=<db_password> <database_name> > drupal_db_snapshot.sql

For example, if the database was called mydbofdoom, and the username was supergabe, with a password of crushbadguys:

mysqldump -u supergabe --password=crushbadguys mydbofdoom > drupal_db_snapshot.sql

"mysqldump" means what you think: dump the database. The command outputs the SQL statements needed to recreate the database, exactly what we want. > drupal_db_snapshot.sql means to store the dump in the file drupal_db_snapshot.sql. If you forget this bit, the SQL statements will flash by on your screen. Flashy flash, hello, goodbye.

If you peek inside drupal_db_snapshot.sql, you'll see commands like this:

DROP TABLE IF EXISTS `actions`;

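This is why we don't have to erase the old data in the database before loading drupal_db_snapshot.sql. The DROP TABLE commands in drupal_db_snapshot.sql will erase the existing data for us. W00tful! (One catch: only tables that are in the dump get dropped. A table created after the snapshot, for example by a newly enabled module, survives the restore.)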
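To check that a dump really does contain a DROP TABLE for every table it creates, you can count the statements. A rough sketch: it assumes mysqldump's usual output, where each table gets a DROP TABLE IF EXISTS line followed by a CREATE TABLE line (mysqldump's --add-drop-table option, which is on by default). SQL views are written differently and would throw the counts off. dump_drops_all_tables is a hypothetical name.

```shell
#!/bin/sh
# Sketch: rough check that a mysqldump file drops every table it creates.
# Succeeds if there is at least one CREATE TABLE, and as many DROP TABLE IF EXISTS lines.
dump_drops_all_tables() {
  [ -r "$1" ] || { echo "cannot read $1" >&2; return 2; }
  # grep -c prints 0 (and exits 1) when nothing matches; || true keeps the 0.
  drops=$(grep -c '^DROP TABLE IF EXISTS' "$1") || true
  creates=$(grep -c '^CREATE TABLE' "$1") || true
  echo "$drops DROP TABLE, $creates CREATE TABLE"
  [ "$creates" -gt 0 ] && [ "$drops" -eq "$creates" ]
}
```

Usage: dump_drops_all_tables restore_site/drupal_db_snapshot.sql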

Now you have the third data file you need to regenerate your site: the database. Move it into the right directory:

mv drupal_db_snapshot.sql restore_site/

Onward!

A script for making the snapshot

You'll restore your site from three files:

  • Archive file of the Drupal tree
  • Archive file of the private files directory (if you have one)
  • Database dump

It's a good idea to automate the creation of those files. If you want to change your snapshot, then you simply rerun the creation script.

Put the commands for making the files into a file called, for example, take_snapshots.sh:

#!/bin/sh
#Take file and database snapshots for the site restoration script.
#Adjust the drush path and the Drupal root (~/public_html) for your installation.
#Make the project directory current. Stop if that fails.
cd ~ || exit 1
#Site to maintenance mode.
~/drush/drush -r ~/public_html vset maintenance_mode 1
#Erase the existing snapshot files.
rm -f restore_site/drupal_tree_snapshot.tgz
rm -f restore_site/drupal_private_snapshot.tgz
rm -f restore_site/drupal_db_snapshot.sql
#Take the new snapshots.
tar -cpzf restore_site/drupal_tree_snapshot.tgz public_html/
tar -cpzf restore_site/drupal_private_snapshot.tgz drupal_private/
#The dump is taken while maintenance_mode is 1, so a restored site starts out
#offline until the restore script puts it back online.
mysqldump -u supergabe --password=crushbadguys mydbofdoom > restore_site/drupal_db_snapshot.sql
#Site out of maintenance mode.
~/drush/drush -r ~/public_html vset maintenance_mode 0

The file won't be executable until you set its execute permission:

chmod u+x take_snapshots.sh

To check that this command worked, look at the file's permission flags:

ls -al take_snapshots.sh

Result:

-rwxr--r-- 1 stuff stuff stuff take_snapshots.sh

The x in the fourth position tells you that the file owner can execute the file. Yes! Your chipmunk is pleased.

Your chipmunk

<Note>

Before you run the snapshot script, make sure the database is the way you want it. Put the site online, set the caching, log out of all sessions.

</Note>

Run the snapshot script. E.g.:

./take_snapshots.sh

Look at your directory, showing the file sizes:

ls -al

Here's what I see:

stuff  5558538  stuff  drupal_db_snapshot.sql
stuff  248420 stuff  drupal_private_snapshot.tgz
stuff  43599491  stuff  drupal_tree_snapshot.tgz

The database snapshot is about 5.5M, which seems about right for my project. The private files are about 1/4M. Not much. Again, about right. The Drupal tree is about 43.5M. Lots o' stuff. About right.
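Eyeballing sizes works, but a script can also do a quick sanity check before you let cron loose. A failed mysqldump (a wrong password, say) can leave behind an empty drupal_db_snapshot.sql, because the shell creates the output file before mysqldump even runs, and restoring from that would be bad. A sketch using the file names from this page; check_snapshots is a made-up helper, and it assumes the snapshots live in ~/restore_site unless you pass another directory. Drop drupal_private_snapshot.tgz from the list if you don't use private files.

```shell
#!/bin/sh
# Sketch: make sure each snapshot file exists and is not empty.
# Usage: check_snapshots [DIR]   (DIR defaults to ~/restore_site)
check_snapshots() {
  dir=${1:-$HOME/restore_site}
  rc=0
  for f in drupal_tree_snapshot.tgz drupal_private_snapshot.tgz drupal_db_snapshot.sql; do
    if [ -s "$dir/$f" ]; then    # -s: file exists and is larger than zero bytes
      echo "ok       $f"
    else
      echo "PROBLEM  $f is missing or empty"
      rc=1
    fi
  done
  return "$rc"
}
```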

So, we have a script that will take a snappy snap snapshot of our site.

The restore shell script

Now to write the *nix script that cron will call every so often, to restore the snapshot.

Here's what the script should do:

  1. Put the site offline.
  2. Erase the files in the main Drupal tree.
  3. Restore the files in the main Drupal tree from a snapshot.
  4. Erase the files in the private file system.
  5. Restore the files in the private file system from a snapshot.
  6. Restore the database from a snapshot.
  7. Put the site online.

You may be asking yourself: "Self, why erase the existing files? Why not just write over them?" Because users may have uploaded new files while messing with your site. You need to erase everything in sites/default/files and in the private files directory, to make sure you kill those uploaded files.

(I found this tip in a blog post by Phil Taylor, at http://www.phil-taylor.com/2008/08/18/creating-a-joomla-demo-site-the-ri....)

Let's see how we might write the script.

1. Put the site offline

Drush will put the site into maintenance mode for us. The command is:

drush -r site-path vset maintenance_mode 1

For example:

drush -r ~/public_html vset maintenance_mode 1

  • r - the root of the Drupal site follows.
  • vset - set a variable.
  • maintenance_mode - the variable to set.
  • 1 - variable value. 1 means "on" for maintenance_mode.

You can test this command. Run it, and then check to see if your site is offline.

2. Erase the files in the main Drupal tree.

These commands will erase all the files in the directory killthesefiles:

rm -rf killthesefiles/*
rm -rf killthesefiles/.*

The r switch means recursive, erasing all subdirectories, and their files and subdirectories. f means force. "Don't argue, OS, just do it."

Why two commands? The first one removes the regular files. The * wildcard skips names that start with a dot, so the second one removes the hidden files.

<Note>

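Sometimes I get an error message from the second command, but it still works. (In many shells, .* also matches . and .., the current and parent directories, which rm refuses to delete. That refusal is the error.) This *nix wizard knows a better way: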

Dog

Unfortunately, being a dog, the Wizard can't speak or, more importantly, type. Can you? Can you give a better way?

</Note>

The commands will erase the contents of killthesefiles, but will not erase the directory killthesefiles itself. That's what we want.
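Woof. Here's one approach the Dog might type, offered as a sketch rather than the Wizard's final word: let find delete everything below the directory, in place of the two rm commands. -mindepth 1 keeps the directory itself, and -delete removes files and subdirectories, hidden or not, deepest first. It assumes GNU find or BSD find (both have -mindepth and -delete); empty_dir is a hypothetical helper name. The same caution as for rm -rf applies: point it at the wrong directory and everything in it is gone.

```shell
#!/bin/sh
# Sketch: delete everything inside a directory, hidden files included,
# but keep the directory itself.
# Usage: empty_dir DIR
empty_dir() {
  [ -d "$1" ] || { echo "empty_dir: $1 is not a directory" >&2; return 1; }
  # -mindepth 1 skips DIR itself; -delete works depth-first, so
  # directory contents are removed before the directories that hold them.
  find "$1" -mindepth 1 -delete
}
```

In a restore script this would look like empty_dir public_html, run from the project directory.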

Here's how the commands will look in our script:

rm -rf public_html/*
rm -rf public_html/.*

<WARNING! Danger, Will Robinson!>

These commands will erase every public Web file. If you serve more than one Web site from your account, you will need to adjust them. For example, suppose you have two sites:

  • terrorseals.net under ~/public_html/terrorseals/
  • introspectiveunicorns.com under ~/public_html/introspectiveunicorns/

The commands...

rm -rf public_html/*
rm -rf public_html/.*

... will destroy both sites.

Suppose you only want to reset introspectiveunicorns.com. Change the commands to:

rm -rf public_html/introspectiveunicorns/*
rm -rf public_html/introspectiveunicorns/.*

</WARNING! Danger, Will Robinson!>

3. Restore the files in the main Drupal tree from a snapshot.

Recall that the files are in the archive file ~/restore_site/drupal_tree_snapshot.tgz. Here's the command:

tar -zpxf restore_site/drupal_tree_snapshot.tgz

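  • z: decompress (gzip)
  • p: restore permissions (the mode bits; file ownership is only restored when tar runs as root)
  • x: extract
  • f: use the file given

Recall that the archive added "public_html" to the path of every file. So, when you run the command in the project directory, that's where the files will be restored. Just what we want!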

4. Erase the files in the private file system.

rm -rf drupal_private/*
rm -rf drupal_private/.*

5. Restore the files in the private file system from a snapshot.

tar -zpxf restore_site/drupal_private_snapshot.tgz

5.5. Testing

Test what you know so far.

Make a directory in your project directory, called demotest. Pretend that this is your project directory. Inside it, create three directories:

  • public_html - to simulate your Drupal tree.
  • drupal_private - to simulate your private file directory.
  • restore_site - to simulate your script directory.

You can skip drupal_private if you don't have any private files.

Add some files and directories inside the fake public_html. Here's the file tree I made:

public_html/
|-- [-rw-r--r--]  .ghost
|-- [drwxr-xr-x]  d1
|   |-- [drwxr-xr-x]  d1.1
|   |   |-- [-rwxr-xr-x]  anchor.png
|   |   `-- [-rw-r--r--]  f1.txt
|   `-- [drwxr--r--]  d1.2
|       |-- [-rwx------]  arrow_tiny_right.png
|       `-- [-rw-r--r--]  f2.txt
`-- [drwxr-xr--]  d2
    `-- [-rw-r--r--]  settings.php

It has a hidden file: .ghost. There are files and directories with different permissions, like d1.2 and arrow_tiny_right.png. Drupal uses hidden files and files with various permissions, so having some in the fake tree will make for a better test.

Set the fake project directory as your current directory. E.g.:

cd ~/demotest

Now, make an archive of public_html using a command we used above:

tar -cvpzf drupal_tree_snapshot.tgz public_html

You should now have a file called drupal_tree_snapshot.tgz. Move it into your script directory:

mv drupal_tree_snapshot.tgz restore_site/

Frosty! If you use the private file system, add some fake files to drupal_private, and tar that as well:

tar -cvpzf drupal_private_snapshot.tgz drupal_private

Move it to the script directory:

mv drupal_private_snapshot.tgz restore_site/

Here's the file tree:

demotest/
|-- [drwxr-xr-x]  drupal_private
|   `-- [-rw-r--r--]  an_angel.txt
|-- [drwxr-xr-x]  public_html
|   |-- [-rw-r--r--]  .ghost
|   |-- [drwxr-xr-x]  d1
|   |   |-- [drwxr-xr-x]  d1.1
|   |   |   |-- [-rwxr-xr-x]  anchor.png
|   |   |   `-- [-rw-r--r--]  f1.txt
|   |   `-- [drwxr--r--]  d1.2
|   |       |-- [-rwx------]  arrow_tiny_right.png
|   |       `-- [-rw-r--r--]  f2.txt
|   `-- [drwxr-xr--]  d2
|       `-- [-rw-r--r--]  settings.php
`-- [drwxr-xr-x]  restore_site
    |-- [-rw-r--r--]  drupal_private_snapshot.tgz
    `-- [-rw-r--r--]  drupal_tree_snapshot.tgz

Now - scripting time! Waahoo!

Try this, with ~/demotest/restore_site as the current directory:

nano restore_site.sh

<Note>

If "nano" doesn't work, try typing "pico". Yes, really.

</Note>

Type (or copy-and-paste) this:

#!/bin/sh
#Testing site restoration.
#Make the project directory current. Stop if that fails, so rm never runs elsewhere.
cd ~/demotest/ || exit 1
#Erase current Drupal tree files.
rm -rf public_html/*
rm -rf public_html/.*
#Restore original Drupal tree files.
tar -zpxf restore_site/drupal_tree_snapshot.tgz
#Erase current private files.
rm -rf drupal_private/*
rm -rf drupal_private/.*
#Restore the original private files.
tar -zpxf restore_site/drupal_private_snapshot.tgz

Save (ctrl+O in nano) and exit (ctrl+X).

To make the file executable, you need to give yourself permission to run it:

chmod u+x restore_site.sh

"u+x" means "give the user who owns the file execute permission."

To check:

ls -al restore_site.sh

You should see something like:

-rwxr--r-- 1 user group date restore_site.sh

The x in the fourth character position means "the user who owns the file can execute it."

<Note>

You should know about *nix file permissions. Check Dave Eisenberg's chmod Tutorial.

</Note>

Now, gird your loins, and prepare for action! Type...

./restore_site.sh

You might get this error...

rm: cannot remove `.' or `..'

... which you can ignore.

Check the contents of public_html and drupal_private. Got the right stuff in them? Good!

Now, let's really put it to the test. Randomly delete some files in public_html and drupal_private, and add some new ones. Run your script again. You should have your original files back, and the new ones should be gone. Yahoo!
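If you'd rather not shuffle files by hand, here is a sketch of the same test as a throwaway script. It builds a tiny fake tree in a temporary directory (from mktemp -d, so nothing real is touched), snapshots it, simulates a visitor, restores, and compares the file lists. It assumes GNU or BSD tools, and it uses find ... -delete instead of the two rm commands: with set -e the script stops at the first failing command, and in many shells rm -rf public_html/.* returns an error (the . and .. refusal above), which would stop it early.

```shell
#!/bin/sh
# Sketch: automated restore test in a throwaway directory.
set -e
work=$(mktemp -d)
cd "$work"
mkdir -p public_html/d1 restore_site

# A tiny fake Drupal tree: one hidden file, one file with unusual permissions.
echo ghost > public_html/.ghost
echo secret > public_html/d1/f1.txt
chmod 700 public_html/d1/f1.txt

# Take the snapshot, and record what the tree looked like.
tar -cpzf restore_site/drupal_tree_snapshot.tgz public_html
find public_html | sort > before.txt

# Simulate a visitor: delete the hidden file, upload a new one.
rm public_html/.ghost
echo junk > public_html/upload.txt

# Restore: empty public_html (hidden files too), then extract the snapshot.
find public_html -mindepth 1 -delete
tar -zpxf restore_site/drupal_tree_snapshot.tgz

# The file list must match the snapshot, and f1.txt must still be rwx------.
find public_html | sort > after.txt
cmp -s before.txt after.txt
[ "$(ls -l public_html/d1/f1.txt | cut -c1-10)" = "-rwx------" ]
echo "restore test passed in $work"
```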

6. Restore the database from a snapshot.

The command is:

mysql -u <db_username> --password=<db_password> <database_name> < dumpfilename.sql

For example:

mysql -u supergabe --password=crushbadguys mydbofdoom < ~/restore_site/drupal_db_snapshot.sql
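Putting the password on the command line works, but other users on the same server may be able to see it in process listings (with ps, for example). The mysql and mysqldump clients can read credentials from an option file instead; ~/.my.cnf is one they read by default, and its [client] section applies to both. A sketch that writes one, readable only by you. write_mysql_option_file is a hypothetical helper; supergabe and crushbadguys are the example credentials from this page.

```shell
#!/bin/sh
# Sketch: write a MySQL option file so mysql and mysqldump don't need --password.
# Usage: write_mysql_option_file "$HOME/.my.cnf" supergabe crushbadguys
write_mysql_option_file() {
  path=$1; user=$2; password=$3
  # Create (or truncate) the file with owner-only permissions before writing secrets.
  ( umask 077 && : > "$path" )
  chmod 600 "$path"
  # [client] options are read by both mysql and mysqldump. The double quotes let
  # the password contain #, which would otherwise start a comment. Passwords with
  # quotes or backslashes need extra escaping, which this sketch doesn't do.
  printf '[client]\nuser=%s\npassword="%s"\n' "$user" "$password" > "$path"
}
```

Afterwards, the commands shrink to mysqldump mydbofdoom > restore_site/drupal_db_snapshot.sql and mysql mydbofdoom < ~/restore_site/drupal_db_snapshot.sql. Cron jobs run with HOME set to your home directory, so they should find the file too.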

7. Put the site online.

The Drush command is:

drush -r ~/public_html vset maintenance_mode 0

Same as before, but with 0 instead of 1, to turn maintenance_mode off.

The final script

Save it as ~/restore_site/restore_site.sh, and make it executable with chmod u+x, as before. Here it is:

#!/bin/sh
#Site restoration.
#Under cron, drush may not be on the PATH; if so, use its full path (e.g. ~/drush/drush).
#Put site offline.
drush -r ~/public_html vset maintenance_mode 1
#Make the project directory current. Stop if that fails, so rm never runs elsewhere.
cd ~ || exit 1
#Erase current Drupal tree files.
rm -rf public_html/*
rm -rf public_html/.*
#Restore original Drupal tree files.
tar -zpxf restore_site/drupal_tree_snapshot.tgz
#Erase current private files.
rm -rf drupal_private/*
rm -rf drupal_private/.*
#Restore the original private files.
tar -zpxf restore_site/drupal_private_snapshot.tgz
#Restore the database (the < and the file name must be on the same line).
mysql -u supergabe --password=crushbadguys mydbofdoom < ~/restore_site/drupal_db_snapshot.sql
#Put site online.
drush -r ~/public_html vset maintenance_mode 0

Back up your site

Drush to the rescue! Again.

drush ard -v -r ~/public_html

  • ard: tell drush to create an archive dump
  • v: verbose. Lets you watch.
  • r: tells Drush which Drupal installation to back up.

Drush creates an archive file with the files and the database in it. Drush will tell you where it put the archive file. You can check it, to make sure it contains what you expect. For example, the command...

tar -tvzf ard_file_name.tar.gz | grep sql

... will show you the names of all of the files in the archive that have "sql" in their names. (Of course, replace "ard_file_name" with the name of the file that Drush created on your system.) Because of tar's v (verbose) option, you should also see the size of each file.

One of the files should be an export of your database, with the extension .sql. It will probably be larger than most of the other files.

Try it

OK - are you ready? Do or die time. Let's run the final script.

<Note>

Being a Nervous Nellie, I made extra backups with my site's control panel, before proceeding. Yes, I know, I'm not bold. You know the saying:

There are old Webbers,
And there are bold Webbers,
But there are no old, bold Webbers.

(Adapted from Some Mothers Do 'Ave 'Em. The learning-to-fly episode.)

Not fancy</Note>

Make a few changes to your site, so you'll know whether the script works. Add some files, change some content, whatever takes your fancy.

Then:

~/restore_site/restore_site.sh

Check it. Check it good.

Cue Devo.

If a site has been restored, you must check it.
When you're making restore scripts, you must check it.
So check it.
Check it good.

Cron it

Cron runs programs at times you specify. The tutorial at http://www.cyberciti.biz/faq/how-do-i-add-jobs-to-cron-under-linux-or-un... explains the syntax, and has some cute penguins to boot.

This mammal uses his/her/its Web control panel to set up cron jobs. It's easier.

<Note>

If you want to do it at the command line, consider using the nano editor, rather than the default vi. To use nano for a single crontab edit:

EDITOR=nano crontab -e

To make nano the editor for the rest of your shell session:

export EDITOR=nano

From http://osxdaily.com/2011/03/07/change-set-the-default-crontab-editor/.

</Note>

Here's how it looks in Plesk.

From a site control panel, choose Scheduled Tasks under Advanced:

Scheduling

Now select the user account used to run the process. This will normally be the same one you used to do the work above.

Choose a user

Add a task:

Schedule a task

Now you need to tell cron two things:

  • The command to execute.
  • When to execute it.

The command is the one that runs the w00ty script you wrote:

~/restore_site/restore_site.sh

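What about the when? Usually, you want to restore the site every couple of hours. Type */2 in the hours field and 0 in the minutes field, to run the script every two hours, on the hour. (With * in the minutes field, cron would run it every minute of every second hour.)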

Cron settings

How do you know this works? You can get an email sent to you when the task runs. It's worth doing this for a day or so, so you can make sure that it all works. Then you can turn the email off. Unless you like getting lots of emails.
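If you set up cron at the command line rather than in a control panel, this sketch appends the same every-two-hours schedule to your crontab, leaving existing entries alone and skipping the job if it's already there. The five fields are minute, hour, day of month, month, and day of week, so "0 */2 * * *" means minute 0 of every second hour. It assumes the restore script is at ~/restore_site/restore_site.sh; add_cron_job and JOB are made-up names.

```shell
#!/bin/sh
# Sketch: add the restore job to the current user's crontab, once.
# Assumption: the restore script lives at $HOME/restore_site/restore_site.sh.
JOB="0 */2 * * * $HOME/restore_site/restore_site.sh"

add_cron_job() {
  # crontab -l fails if you have no crontab yet; treat that as an empty one.
  current=$(crontab -l 2>/dev/null) || current=""
  if printf '%s\n' "$current" | grep -qxF -- "$1"; then
    echo "already installed: $1"
    return 0
  fi
  # Write back the existing entries, plus the new one.
  { [ -n "$current" ] && printf '%s\n' "$current"; printf '%s\n' "$1"; } | crontab -
}
```

Run add_cron_job "$JOB", then crontab -l to see the result. Cron normally mails any output of a job to you, which doubles as the "did it run?" email; once you trust it, you can append > /dev/null 2>&1 to the command to quiet it.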

You geek, you!

Get all this working, and treat yourself to a new mechanical pencil.

It helps users to show a timer, so they know when the next reset happens. That will be in a future post.


Comments

Just in case, the Demo module (https://drupal.org/project/demo) has already implemented this feature.

Drupion.com — Drupal-centric, Drupal-specific, Drupal-optimized hosting company.

Aye. But, at the moment, bugs mean the module can't be used.

Kieran