Last updated August 30, 2011. Created by ghoti on April 7, 2009.
Edited by christefano, emmajane.
When running large Drupal installations, you may find yourself with a web server cluster that lives behind a load balancer. The pages here contain tips for configuring Drupal in this setup, as well as example configurations for various load balancers.
In addition to a large selection of commercial options, various open source load balancers exist: Pound, Varnish, ffproxy, tinyproxy, etc. Web servers (including Apache and NGINX) can also be configured as reverse proxies.
The basic layout you can expect in most high-availability environments will look something like this:
Browser ──→ HTTP Reverse Proxy ──┬─→ Web server 1 ──┐
                                 ├─→ Web server 2 ──┼─→ Database
                                 └─→ Web server 3 ──┘
By way of explanation:
- Browsers will connect to a reverse proxy using HTTP or HTTPS. The proxy will in turn connect to web servers via HTTP.
- Web servers will likely be on private IP addresses. Use of a private network allows web servers to share a database and/or NFS server that need not be exposed to the Internet on a public IP address.
- If HTTPS is required, it is configured on the proxy, not the web server.
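The flow described in the points above can be sketched in a few lines of Python. This is only an illustration of round-robin distribution, not the configuration of any particular load balancer; the backend addresses and the function names are hypothetical.

```python
from itertools import cycle

# Hypothetical private addresses for the three web servers in the diagram.
BACKENDS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]


def make_backend_picker(backends):
    """Return a function that hands out backends in round-robin order."""
    rotation = cycle(backends)
    return lambda: next(rotation)


def backend_url(backend, path):
    """Build the URL the proxy requests from a web server.

    The scheme is always plain HTTP: HTTPS, if used, ends at the proxy.
    """
    return f"http://{backend}{path}"


pick = make_backend_picker(BACKENDS)
# Four consecutive requests; the fourth wraps around to the first server.
first_four = [backend_url(pick(), "/node/1") for _ in range(4)]
```

Real load balancers usually add health checks and session affinity on top of this, which the sketch leaves out.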
Most HTTP reverse proxies will also "clean" requests in some way. For example, they'll require that a browser include a valid User-Agent string, or that the requested URL contain standard characters or not exceed a certain length.
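As a rough illustration of this kind of request "cleaning", the Python sketch below applies the three checks mentioned above. The length limit, the function name, and the assumption that headers arrive as a plain dictionary are all hypothetical; actual proxies implement their own rules and defaults.

```python
import re

# Hypothetical limit; real proxies choose their own.
MAX_URL_LENGTH = 2048

# Characters that RFC 3986 allows in a URL (unreserved, reserved, and "%").
ALLOWED_URL = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]*")


def rejection_reason(url, headers):
    """Return why a request would be refused, or None if it passes.

    `headers` is assumed to be a plain dict keyed by canonical header name.
    """
    if not headers.get("User-Agent", "").strip():
        return "missing User-Agent"
    if len(url) > MAX_URL_LENGTH:
        return "URL too long"
    if not ALLOWED_URL.fullmatch(url):
        return "URL contains non-standard characters"
    return None
```

For example, `rejection_reason("/node/1", {})` reports a missing User-Agent, while the same URL with a browser's User-Agent header passes.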
In the case of Drupal, it is highly recommended that all web servers share identical copies of the Drupal DocumentRoot in use, to ensure version consistency between themes and modules. This may be achieved using an NFS mount to hold your Drupal files, or by using a revision control system (CVS, SVN, git, etc.) to maintain your files.
High availability
In order to achieve the maximum uptime, a high-availability design should have no single points of failure. For network connectivity, this may mean using BGP with multiple upstream providers, as well as perhaps using Link Aggregation (LACP) to maintain multiple physical network paths in your LAN. In the diagram above, the two server elements that need attention are the load balancer and the database.
A load balancer cannot easily be "clustered" because a single IP address usually needs to apply to a single machine. To address this issue, you may wish to read up on CARP (OpenBSD, FreeBSD) and Heartbeat (Linux).
A database server generally needs access to a single repository of data. Various technologies exist to address this, including MySQL NDB and PgCluster. If you're willing to accept the possibility of less than 100% uptime while you recover from broken hardware, you should consider using transactional database replication to keep a live copy of your data on a secondary server. Read the documentation for your database server software to find out how to set this up.
Needless to say, always set up regular automated backups.
Note:
- If you plan to install Drupal 7 on a web server that browsers will reach only via HTTPS, there's an outstanding issue you'll want to check (#313145: Support X-Forwarded-Proto HTTP header). At this time, Drupal's AJAX callbacks use URLs based on the protocol used at the web server, regardless of the protocol used at the proxy. Your workaround is either this patch, or to set the "reverse_proxy" variable manually in your settings.php file. Unfortunately, as the Drupal installer relies on AJAX, your only other option is to install via HTTP instead of HTTPS.
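To make the X-Forwarded-Proto idea concrete, here is a minimal Python sketch of how a web server behind a proxy could work out the protocol the browser actually used. It is not Drupal's code (Drupal is written in PHP) and not the patch from the issue above; the proxy address and function name are hypothetical.

```python
# Hypothetical address of the reverse proxy on the private network.
TRUSTED_PROXIES = {"10.0.0.1"}


def original_scheme(remote_addr, headers, local_scheme="http"):
    """Return the scheme the browser used, as best the web server can tell.

    The X-Forwarded-Proto header is honoured only when the request comes
    from a trusted proxy, since any client can send the header itself.
    """
    if remote_addr not in TRUSTED_PROXIES:
        return local_scheme
    forwarded = headers.get("X-Forwarded-Proto", "").strip().lower()
    return forwarded if forwarded in ("http", "https") else local_scheme
```

With this logic, a request relayed by the proxy with `X-Forwarded-Proto: https` is treated as HTTPS when building URLs, even though the proxy reached the web server over HTTP.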
Comments
File Uploads
I have been searching for months (for version 6) but have not found a solution to this.
How does version 7 tackle this?
For example, a file could be uploaded to web server 1 but later it could be requested from web server 2 where it won't exist.
Use a file share
To solve the problem of file uploads landing on a single server, you could perhaps use a file share (NFS, NAS, ...) as the file destination and set Administer -> File System -> File System Path to point to this location. To eliminate the share as a single point of failure, I am sure there are many solutions available as well.
Still looking for solution on file uploads
Hello,
We have a static and very light site for a big corporation, with 2 load-balanced servers. I find it ugly to buy a third server only for a few images. Isn't there a way to make the upload function copy the files to both servers? Any module, maybe?
thanks anyway :)
use NFS
vincent,
Your issue is the same as the one above. Just as your database provides a consistent back-end for dynamically generated content, you need to use something like NFS to provide a consistent storage area for files.
It *might* be possible to set up your reverse proxy to save and fetch files only from one server and not the other, but that would reduce the usefulness of having a web server cluster. You get (most of) the load balancing, but not the redundancy in case of failure.
If you don't want to use a separate storage server, and you're okay with the loss of redundancy, you could optionally export your file storage from one of your servers to the other.
And one other hack might be to set up triggers after file uploads that would cause a process on the server to synchronize files using rsync, but there would be lots of opportunities for failure. Perhaps just force uploads to go to one server, then have a cron job run every minute to synchronize files....
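A minimal sketch of what such a cron-driven sync script might look like, in Python. The paths, the peer host name, and the function names are hypothetical, and it assumes rsync is installed with passwordless SSH access to the other server. It copies new and changed files only; deletions are not propagated.

```python
import subprocess


def rsync_command(source_dir, peer_host, dest_dir):
    """Build the rsync argument list for a one-way copy to the peer.

    The trailing slash on the source makes rsync copy the directory's
    contents rather than the directory itself.
    """
    src = source_dir.rstrip("/") + "/"
    dst = f"{peer_host}:{dest_dir.rstrip('/')}/"
    # -a preserves permissions and timestamps, -z compresses in transit.
    return ["rsync", "-az", src, dst]


def sync_uploads(source_dir, peer_host, dest_dir):
    """Run rsync once and report whether it exited successfully."""
    result = subprocess.run(rsync_command(source_dir, peer_host, dest_dir))
    return result.returncode == 0
```

A cron entry running every minute could call `sync_uploads("/var/www/files", "web2", "/var/www/files")`. A file requested from the second server before the next run would still be missing there, which is one of the failure modes mentioned above.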
I still like having a separate server. :-)
--
Paul Chvostek - it.canada - climbers.org