When running large Drupal installations, you may find yourself with a web server cluster that lives behind a load balancer. The pages here contain tips for configuring Drupal in this setup, as well as example configurations for various load balancers.
In addition to a large selection of commercial options, various open source load balancers exist: Pound, Varnish, ffproxy, tinyproxy, etc. Web servers (including Apache and NGINX) can also be configured as reverse proxies.
The basic layout you can expect in most high-availability environments will look something like this:
HTTP Reverse Proxy
Web server 1
Web server 2
Web server 3
By way of explanation:
- Browsers will connect to a reverse proxy using HTTP or HTTPS. The proxy will in turn connect to web servers via HTTP.
- Web servers will likely be on private IP addresses. Use of a private network allows web servers to share a database and/or NFS server that need not be exposed to the Internet on a public IP address.
- If HTTPS is required, it is configured on the proxy, not the web server.
Most HTTP reverse proxies will also "clean" requests in some way. For example, they'll require that a browser include a valid User-Agent string, or that the requested URL contain standard characters or not exceed a certain length.
In the case of Drupal, it is highly recommended that all web servers share identical copies of the Drupal DocumentRoot in use, to insure version consistency between themes and modules. This may be achieved using an NFS mount to hold your Drupal files, or by using a revision control system (CVS, SVN, git, etc) to maintain your files.
In order to achieve the maximum uptime, a high-availability design should have no single points of failure. For network connectivity, this may mean using BGP with multiple upstream providers, as well as perhaps using Link Aggregation (LACP) to maintain multiple physical network paths in your LAN. In the diagram above, the two server elements that need attention are the load balancer and the database.
A load balancer cannot easily be "clustered" because a single IP address usually needs to apply to a single machine. To address this issue, you may wish to read up on CARP (FreeBSD) and Heartbeat (Linux).
A database server generally needs access to a single repository of data. Various technologies exist to address this, including MySQL NDB and PgCluster. If you're willing to accept the possibility of less than 100% up-time while you recover from broken hardware, you should consider using transactional database replication to keep a live copy of your data on a secondary server. Read the documentation for your database server software to find out how to set this up.
Needless to say, always set up regular automated backups.
- If you plan to install Drupal 7 on a web server that browsers will reach only via HTTPS, there's an outstanding issue you'll want to check (#313145: Support X-Forwarded-Proto HTTP header). At this time, Drupal's AJAX callbacks use URLs based on the protocol used at the web server, regardless of the protocol used at the proxy. Your workaround is either this patch, or to set the "reverse_proxy" variable manually in your settings.php file. Unfortunately, as the Drupal installer relies on AJAX, your only other option is to install via HTTP instead of HTTPS.