Community Documentation

Using a load balancer or reverse proxy

Last updated August 30, 2011. Created by ghoti on April 7, 2009.
Edited by christefano, emmajane.

When running large Drupal installations, you may find yourself with a web server cluster that lives behind a load balancer. The pages here contain tips for configuring Drupal in this setup, as well as example configurations for various load balancers.

In addition to a large selection of commercial options, various open source load balancers exist: Pound, Varnish, ffproxy, tinyproxy, etc. Web servers (including Apache and NGINX) can also be configured as reverse proxies.

The basic layout you can expect in most high-availability environments will look something like this:

                                         ┌─→ Web server 1 ─┐
    Browser ──→ HTTP Reverse Proxy ──┼─→ Web server 2 ─┼─→ Database
                                         └─→ Web server 3 ─┘

By way of explanation:

  • Browsers will connect to a reverse proxy using HTTP or HTTPS. The proxy will in turn connect to web servers via HTTP.
  • Web servers will likely be on private IP addresses. Use of a private network allows web servers to share a database and/or NFS server that need not be exposed to the Internet on a public IP address.
  • If HTTPS is required, it is configured on the proxy, not the web server.

Most HTTP reverse proxies will also "clean" requests in some way. For example, they'll require that a browser include a valid User-Agent string, or that the requested URL contain standard characters or not exceed a certain length.

In the case of Drupal, it is highly recommended that all web servers share identical copies of the Drupal DocumentRoot, to ensure that every server runs the same versions of themes and modules. This may be achieved by keeping your Drupal files on an NFS mount, or by using a revision control system (CVS, SVN, Git, etc.) to maintain them.

High availability

To maximize uptime, a high-availability design should have no single points of failure. For network connectivity, this may mean using BGP with multiple upstream providers, and perhaps also Link Aggregation (LACP) to maintain multiple physical network paths in your LAN. In the diagram above, the two server elements that need attention are the load balancer and the database.

A load balancer cannot easily be "clustered" because a single IP address usually needs to apply to a single machine. To address this issue, you may wish to read up on CARP (FreeBSD) and Heartbeat (Linux).

A database server generally needs access to a single repository of data. Various technologies exist to address this, including MySQL NDB and PgCluster. If you're willing to accept less than 100% uptime while you recover from broken hardware, consider using transactional database replication to keep a live copy of your data on a secondary server. Read the documentation for your database server software to find out how to set this up.

Needless to say, always set up regular automated backups.

Note:

  • If you plan to install Drupal 7 on a web server that browsers will reach only via HTTPS, there's an outstanding issue you'll want to check (#313145: Support X-Forwarded-Proto HTTP header). At this time, Drupal's AJAX callbacks use URLs based on the protocol used at the web server, regardless of the protocol used at the proxy. Your workaround is either the patch from that issue, or to set the "reverse_proxy" variable manually in your settings.php file. Because the Drupal installer relies on AJAX, your only other option is to install via HTTP instead of HTTPS.
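A minimal settings.php sketch of the manual workaround follows, assuming Drupal 7. The proxy addresses are hypothetical placeholders. The X-Forwarded-Proto check is an additional, commonly used step that is not part of the patch in #313145; it assumes your proxy terminates HTTPS and sets that header, and that $base_url is not hard-coded in settings.php.

```php
// Excerpt for sites/default/settings.php (Drupal 7).
// The addresses below are hypothetical: use the REMOTE_ADDR values your
// web servers actually see when requests arrive through the proxy.

// Tell Drupal it sits behind a reverse proxy, so it reads the client IP from
// X-Forwarded-For (or from the header named in 'reverse_proxy_header').
$conf['reverse_proxy'] = TRUE;
$conf['reverse_proxy_addresses'] = array('10.0.0.2', '10.0.0.3');

// Assumption: the proxy terminates HTTPS and sends X-Forwarded-Proto.
// Trust that header only when the request came from a listed proxy, then
// mark the request as HTTPS so Drupal builds https:// URLs for it.
if (isset($_SERVER['HTTP_X_FORWARDED_PROTO'], $_SERVER['REMOTE_ADDR'])
    && $_SERVER['HTTP_X_FORWARDED_PROTO'] === 'https'
    && in_array($_SERVER['REMOTE_ADDR'], $conf['reverse_proxy_addresses'], TRUE)) {
  $_SERVER['HTTPS'] = 'on';
}
```

Checking REMOTE_ADDR against the proxy list matters because any client can send an X-Forwarded-Proto header; without the check, a request that bypassed the proxy could claim to be HTTPS.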

Comments

File Uploads

I have been searching for months (for version 6) but have not found a solution to this.

How does version 7 tackle this?

For example, a file could be uploaded to web server 1 but later it could be requested from web server 2 where it won't exist.

Use a file share

To solve the problem of uploads landing on only one server, you could use a file share (NFS, NAS, ...) as the file destination and set Administer -> File system -> File system path to point to this location. The share then becomes a single point of failure itself, though I am sure there are many ways to eliminate that as well.
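The same path can also be pinned in settings.php so that every web server is guaranteed to use it. This is a sketch under an assumed layout: it supposes the shared NFS/NAS export is mounted at sites/default/files (relative to the Drupal root) on every web server.

```php
// Excerpt for sites/default/settings.php.
// Assumption: sites/default/files is the same NFS/NAS mount on every web server.

// Drupal 7: public files directory, relative to the Drupal root.
$conf['file_public_path'] = 'sites/default/files';

// Drupal 6 equivalent (File system path):
// $conf['file_directory_path'] = 'sites/default/files';
```

Overriding the variable in settings.php keeps the path in the shared code base rather than in the database, so a server with a stale configuration cannot quietly write uploads to a local directory.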

Use lsyncd...

We use lsyncd to keep the files and code base synced across multiple servers. Whenever there is a change, all files in the Drupal root are pushed from a master server to the other servers.

Still looking for solution on file uploads

Hello,

We have a static, very light site for a large corporation running on two load-balanced servers. Buying a third server just for a few images seems excessive. Is there a way to make the upload function copy the files to both servers? Perhaps a module?

Thanks anyway :)

use NFS

vincent,

Your issue is the same as the one above. Just as your database provides a consistent back-end for dynamically generated content, you need to use something like NFS to provide a consistent storage area for files.

It *might* be possible to set up your reverse proxy to save and fetch files only from one server and not the other, but that would reduce the usefulness of having a web server cluster. You get (most of) the load balancing, but not the redundancy in case of failure.

If you don't want to use a separate storage server, and you're okay with the loss of redundancy, you could optionally export your file storage from one of your servers to the other.

Another hack might be to trigger a process after each file upload that synchronizes files to the other server using rsync, but there would be many ways for that to fail. A simpler variant: force uploads to go to just one server, then have a cron job run every minute to synchronize files.

I still like having a separate server. :-)

S3 is better

We use the amazons3 module. It works great.

Settings.php Reverse Proxy Configuration

Can you please go over the reverse proxy configuration in settings.php?

I know there's documentation there, but some things are hard for a newbie to understand.

Please let me know if this is right:

Uncomment: # $conf['reverse_proxy'] = TRUE;

It seems I need to list the IP addresses of all reverse proxies in $conf['reverse_proxy_addresses'].
Question: how do I find the IP address of all reverse proxies?
Question: According to the latest documentation: "If a complete list of reverse proxies is not available in your environment (for example, if you use a CDN) you may set the $_SERVER['REMOTE_ADDR'] variable directly in settings.php. Be aware, however, that it is likely that this would allow IP address spoofing unless more advanced precautions are taken." I'm using a CDN, so should I use just $_SERVER['REMOTE_ADDR'] in place of $conf['reverse_proxy_addresses'] completely? Also, what are the advanced precautions if I use $_SERVER['REMOTE_ADDR']?

Then I need to enable: $conf['reverse_proxy_header'] = 'HTTP_X_CLUSTER_CLIENT_IP';
Question: "Set this value if your proxy server sends the client IP in a header other than X-Forwarded-For." How do I find out if my proxy server sends the client IP in a header other than X-Forwarded-For?

Thanks
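One way to answer both questions is to look at what a web server actually receives. The throwaway page below is a sketch; its file name (proxy-headers.php) and the function proxy_headers() are hypothetical. It prints REMOTE_ADDR, which for proxied requests is the address of whatever connected to that web server (your proxy, or a CDN edge if nothing sits in between), plus any X-* headers such as X-Forwarded-For or X-Cluster-Client-IP. Request it through the proxy a few times, then delete it, since it exposes request details.

```php
<?php
// proxy-headers.php (hypothetical name): temporary diagnostic page.
// Put it in the DocumentRoot of one web server, request it through the
// proxy, read the output, then delete it.

/**
 * Returns REMOTE_ADDR plus request headers that proxies commonly use to
 * carry the original client IP or protocol, sorted by name.
 */
function proxy_headers(array $server): array {
  $interesting = [];
  foreach ($server as $key => $value) {
    // REMOTE_ADDR is the peer that opened the connection to this web
    // server: for proxied requests, that is the proxy itself.
    if ($key === 'REMOTE_ADDR'
        || $key === 'HTTP_FORWARDED'
        || $key === 'HTTP_CLIENT_IP'
        || strpos($key, 'HTTP_X_') === 0) {
      $interesting[$key] = $value;
    }
  }
  ksort($interesting);
  return $interesting;
}

header('Content-Type: text/plain');
foreach (proxy_headers($_SERVER) as $key => $value) {
  echo $key, ': ', $value, "\n";
}
```

If the output shows the client address in HTTP_X_FORWARDED_FOR, the default reverse_proxy_header needs no change; if it appears only under another name (for example HTTP_X_CLUSTER_CLIENT_IP), that name is what goes in $conf['reverse_proxy_header']. The REMOTE_ADDR values you see across requests are candidates for $conf['reverse_proxy_addresses'].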

About this page

Audience
Site administrators
Drupal’s online documentation is © 2000-2013 by the individual contributors and can be used in accordance with the Creative Commons License, Attribution-ShareAlike 2.0. PHP code is distributed under the GNU General Public License. Comments on documentation pages are used to improve content and then deleted.