I think the module is brilliantly simple in concept; it has fantastically improved performance and reduced load on my sites. But while debugging a problem I had with the D6 patch for Boost (http://drupal.org/node/276060), I found that the standard rewrite rules are processed for all requests (css, image, js, swf), not just for the PHP script that produces the HTML. The way the rules are laid out means that virtually every RewriteCond is evaluated for every file (especially for non-logged-in users), when those files should be served straight away without the rewrite rule tests. This causes extra, unnecessary load on Apache.

I understand that this is arguably not a bug, as the Boost code works fine; it is more of a configuration issue. However, this is a performance-related module after all, and with a few rule changes you can lose hundreds of regex matches and file/directory accesses per page request, which will improve performance even more :-D

Anyway, with a standard Drupal install with the Garland theme and aggregation on, my home page takes 22 requests to download, counting the css files, js files, a few pictures, and the associated theme gifs.
The 21 requests for css, jpg and js files all have to go through the rules, which include file existence and directory tests, when they should just be served straight from the files.

I basically chucked in an initial test for the request being formed like a normal file, i.e. it ends in a dot followed by 3 or 4 characters, like .php or .html. This might not work on some sites, so you can instead use a condition that lists the file extensions explicitly; it also captures pages with query strings, so you can lose the QUERY_STRING check. Using the skip flag, you can lose the doubled-up checks too.

Adding up the improvements in reqs/sec for all js, css and image files, on top of a slight improvement in the response for html files, I saw an increase of hundreds of reqs/sec.

        # Should work for most sites
        RewriteCond %{REQUEST_FILENAME} \.\w{3,4}$
        # Safer but a bit slower
        # RewriteCond %{REQUEST_FILENAME} \.(php|ico|png|jpg|mov|gif|css|js|swf|html|html\?)(\W.*)?
        RewriteRule .? - [L]
        RewriteCond %{HTTP_COOKIE} DRUPAL_UID
        RewriteRule .? - [S=6]
        RewriteCond %{QUERY_STRING} !^$
        RewriteRule .? - [S=5]
        RewriteCond %{REQUEST_METHOD} !^GET$
        RewriteRule .? - [S=4]
        RewriteCond %{REQUEST_URI} ^/$
        RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0/index.html -f
        RewriteRule ^(.*)$ cache/%{SERVER_NAME}/0/index.html [L]
        RewriteCond %{REQUEST_URI} ^/(cache|user|admin)
        RewriteRule .? - [S=2]
        RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0%{REQUEST_URI} -d
        RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0%{REQUEST_URI}/index.html -f
        RewriteRule ^(.*)$ cache/%{SERVER_NAME}/0/$1/index.html [L]
        RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0%{REQUEST_URI}.html -f
        RewriteRule ^(.*)$ cache/%{SERVER_NAME}/0/$1.html [L]
        RewriteCond %{REQUEST_FILENAME} !-f
        RewriteCond %{REQUEST_FILENAME} !-d
        RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]

Comments

wobbler’s picture

Simulating a non-logged-in user visiting a default Drupal 5.7 site without aggregation or caching, I ran an ab test on every component of the home page with both the standard rewrite rules and the updated ones. I did a single run per request and took that value, without any re-runs. This is a test site on a production PIII server that also runs email, Apache, etc., so the load may not have been consistent across all runs, but it demonstrates the improvement. In each pair below, the first value is with the updated rules and the second is with the standard rules as per boosted.txt. For some reason the updates don't make much difference to the js files; the gains are mainly on the image files and the css.

ab -n 1000 -c 5 http://www.example.com/
Requests per second:    646.77 [#/sec] (mean)
Requests per second:    629.15 [#/sec] (mean)

ab -n 1000 -c 5 http://www.example.com/modules/node/node.css
Requests per second:    801.01 [#/sec] (mean)
Requests per second:    721.17 [#/sec] (mean)

ab -n 1000 -c 5 http://www.example.com/modules/system/defaults.css
Requests per second:    797.33 [#/sec] (mean)
Requests per second:    704.68 [#/sec] (mean)

ab -n 1000 -c 5 http://www.example.com/modules/system/system.css
Requests per second:    787.56 [#/sec] (mean)
Requests per second:    675.70 [#/sec] (mean)

ab -n 1000 -c 5 http://www.example.com/modules/user/user.css
Requests per second:    795.63 [#/sec] (mean)
Requests per second:    721.05 [#/sec] (mean)

ab -n 1000 -c 5 http://www.example.com/sites/all/modules/thickbox/thickbox.css
Requests per second:    757.79 [#/sec] (mean)
Requests per second:    703.43 [#/sec] (mean)

ab -n 1000 -c 5 http://www.example.com/themes/garland/style.css
Requests per second:    747.01 [#/sec] (mean)
Requests per second:    675.12 [#/sec] (mean)

ab -n 1000 -c 5 http://www.example.com/misc/jquery.js
Requests per second:    685.26 [#/sec] (mean)
Requests per second:    679.73 [#/sec] (mean)

ab -n 1000 -c 5 http://www.example.com/misc/drupal.js
Requests per second:    721.32 [#/sec] (mean)
Requests per second:    715.38 [#/sec] (mean)

ab -n 1000 -c 5 http://www.example.com/sites/all/modules/thickbox/thickbox.js
Requests per second:    694.84 [#/sec] (mean)
Requests per second:    692.72 [#/sec] (mean)

ab -n 1000 -c 5 http://www.example.com/themes/garland/print.css
Requests per second:    794.37 [#/sec] (mean)
Requests per second:    715.88 [#/sec] (mean)

ab -n 1000 -c 5 http://www.example.com/themes/garland/logo.png
Requests per second:    753.72 [#/sec] (mean)
Requests per second:    721.09 [#/sec] (mean)

ab -n 1000 -c 5 http://www.example.com/misc/feed.png
Requests per second:    816.77 [#/sec] (mean)
Requests per second:    734.65 [#/sec] (mean)

ab -n 1000 -c 5 http://www.example.com/themes/garland/images/bg-navigation.png
Requests per second:    767.45 [#/sec] (mean)
Requests per second:    695.64 [#/sec] (mean)

ab -n 1000 -c 5 http://www.example.com/themes/garland/images/body.png
Requests per second:    809.31 [#/sec] (mean)
Requests per second:    726.30 [#/sec] (mean)

ab -n 1000 -c 5 http://www.example.com/themes/garland/images/bg-content.png
Requests per second:    803.39 [#/sec] (mean)
Requests per second:    721.43 [#/sec] (mean)

ab -n 1000 -c 5 http://www.example.com/themes/garland/images/bg-content-right.png
Requests per second:    796.39 [#/sec] (mean)
Requests per second:    713.65 [#/sec] (mean)

ab -n 1000 -c 5 http://www.example.com/themes/garland/images/bg-content-left.png
Requests per second:    785.43 [#/sec] (mean)
Requests per second:    714.15 [#/sec] (mean)

ab -n 1000 -c 5 http://www.example.com/themes/garland/images/menu-leaf.gif
Requests per second:    739.37 [#/sec] (mean)
Requests per second:    697.24 [#/sec] (mean)

ab -n 1000 -c 5 http://www.example.com/themes/garland/images/menu-expanded.gif
Requests per second:    773.11 [#/sec] (mean)
Requests per second:    695.27 [#/sec] (mean)

ab -n 1000 -c 5 http://www.example.com/themes/garland/images/menu-collapsed.gif
Requests per second:    768.41 [#/sec] (mean)
Requests per second:    693.18 [#/sec] (mean)
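
To put a few of the figures above in relative terms, here is a small Python sketch that computes the percentage gain for some of the quoted pairs (updated rules first, standard rules second). The numbers are copied from the runs above; the helper name `gain_percent` is made up for illustration, and Python is only being used as a calculator here.

```python
# Requests per second copied from the ab runs above:
# (updated rules, standard rules from boosted.txt)
RESULTS = {
    "/": (646.77, 629.15),
    "/modules/node/node.css": (801.01, 721.17),
    "/misc/jquery.js": (685.26, 679.73),
    "/misc/feed.png": (816.77, 734.65),
}


def gain_percent(updated, standard):
    """Relative improvement of the updated rules over the standard ones."""
    return (updated - standard) / standard * 100.0


if __name__ == "__main__":
    for path, (updated, standard) in RESULTS.items():
        print(f"{path:28s} {gain_percent(updated, standard):5.1f}% faster")
```

On these single runs the css and png files gain around 11%, while the home page gains under 3% and jquery.js under 1%. Given the noisy test server, treat these as rough indications only.
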
wobbler’s picture

Duh! The reason it doesn't work for the js files is that their extension is only two characters long, and the rule I wrote was for 3 or 4.
Updated rule to fix:
RewriteCond %{REQUEST_FILENAME} \.\w{2,4}$
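
A quick way to see the difference between the two patterns is to try them against a few paths. This is only a sketch in Python: Apache uses PCRE rather than Python's `re`, but for a pattern this simple the two should agree. The sample paths are taken from the benchmark list, plus a made-up clean URL `/node/1`.

```python
import re

OLD_RULE = re.compile(r"\.\w{3,4}$")  # original condition: 3 or 4 characters
NEW_RULE = re.compile(r"\.\w{2,4}$")  # updated condition: 2 to 4 characters

PATHS = [
    "/misc/jquery.js",
    "/themes/garland/style.css",
    "/themes/garland/logo.png",
    "/node/1",  # hypothetical clean URL, should match neither pattern
]


def matches(rule, path):
    """True if the path ends in a dot followed by an allowed extension."""
    return rule.search(path) is not None


if __name__ == "__main__":
    for path in PATHS:
        print(path, matches(OLD_RULE, path), matches(NEW_RULE, path))
```

Only the `.js` path changes between the two patterns; the css and png paths match both, and the clean URL matches neither.
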

alanburke’s picture

Interesting stuff.
[Note, I'm not actually running Boost module myself yet, just checking out my options]

You seem to know the ins and outs of .htaccess very well.
Are there improvements that can be made to the standard Drupal .htaccess file to improve performance, even without the Boost module?

http://cvs.drupal.org/viewvc.py/drupal/drupal/.htaccess?revision=1.93&vi...

is the latest. There was a big debate about missing favicon.ico lately that caused the latest update.

Regards
Alan

wobbler’s picture

Alan, There isn't much in the htaccess file to be honest. Without the Boost module, there isn't anything in the htaccess file that will make any significant difference compared to the overhead of starting php and db connections etc.

The config changes I am suggesting are only significant as the time taken for apache to serve a static html is so low in comparison to generating the page that config tweaking can make a big difference.

wobbler’s picture

OK, here is what I have now come up with. On my hardware it is about 20-30 reqs/sec better than what I had before, and up to 100+ reqs/sec better than the standard rules from boosted.txt. The gains show up especially for non-logged-in users, on requests for css, image, js files, etc.

        # If the REQUEST_URI ends in . then 2-4 word chars then leave request unchanged,
        # and stop processing rewrite rules. This benefits logged in and non-logged in users.
        RewriteRule \.\w{2,4}$ - [L]

        # If DRUPAL_UID cookie is set in request headers skip the next 4 rewrite rules, which
        # goes to the normal rewrite rules for clean urls.
        RewriteCond %{HTTP_COOKIE} DRUPAL_UID
        RewriteRule .? - [S=4]

        # If QUERY_STRING is not blank or REQUEST_METHOD is not GET skip 3 rules.
        RewriteCond %{QUERY_STRING},%{REQUEST_METHOD} !=,GET
        RewriteRule .? - [S=3]

        # Test if the REQUEST_URI html file is in the cache, if so use it.
        RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0%{REQUEST_URI}.html -f
        RewriteRule .? cache/%{SERVER_NAME}/0%{REQUEST_URI}.html [L]

        # If REQUEST_URI is / test if index.html exists in cache, if so use it.
        RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0/index.html -f
        RewriteRule ^/$ cache/%{SERVER_NAME}/0/index.html [L]

        # Test if REQUEST_URI directory exists with an index.html file, if so serve it.
        RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/0%{REQUEST_URI}/index.html -f
        RewriteRule .? cache/%{SERVER_NAME}/0%{REQUEST_URI}/index.html [L]

        # Standard clean url rewrite rules.
        RewriteCond %{REQUEST_FILENAME} !-f
        RewriteCond %{REQUEST_FILENAME} !-d
        RewriteRule ^(.+) index.php?q=$1 [L,QSA]
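
The combined condition `%{QUERY_STRING},%{REQUEST_METHOD} !=,GET` is the least obvious line in the rules above, so here is a rough Python model of it. It assumes mod_rewrite's documented behaviour that a CondPattern starting with `=` is a plain string comparison and that `!` negates the result; the function name `skips_cache_lookup` is invented for this sketch.

```python
def skips_cache_lookup(query_string, request_method):
    """Model of: RewriteCond %{QUERY_STRING},%{REQUEST_METHOD} !=,GET

    The test string is the two variables joined by a comma. "!=,GET" is a
    negated exact string comparison against ",GET", so the condition is
    false only for an empty query string combined with the GET method.
    """
    test_string = f"{query_string},{request_method}"
    return test_string != ",GET"


if __name__ == "__main__":
    for query, method in [("", "GET"), ("page=2", "GET"), ("", "POST")]:
        print(repr(query), method, skips_cache_lookup(query, method))
```

So one string comparison stands in for the two separate regex conditions on QUERY_STRING and REQUEST_METHOD in the earlier rules.
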

drupdrips’s picture

Why not just check %{REQUEST_URI} -f? That should catch all the non-generated types such as css, jpg, etc., which is basically what you are trying to do with the "." test. Then follow with the other conditions, most restrictive first: start with DRUPAL_UID in the cookie (this immediately forks out the logged-in users and lets them begin the uphill battle of PHP invocation and db connections), then the query string check (even anonymous users can have one), and finally whether the request method is the most common one, GET, or not. That way, for most sites, the evaluation forks at the least-travelled path, I'd think.

creyes123’s picture

The rewritten rules were working great for me until I discovered that the Drupal-generated sitemap.xml file was not working. After much head scratching, I realized that the fix was to add a condition to the first rule that skipped it if the file did not exist on the file system.

        # If the REQUEST_URI ends in . then 2-4 word chars then leave request unchanged,
        # and stop processing rewrite rules. This benefits logged in and non-logged in users.
        # Note: added condition to skip the rule if the file does not exist (needed for sitemap.xml)
        RewriteCond %{REQUEST_URI} -f
        RewriteRule \.\w{2,4}$ - [L]
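
The sitemap.xml problem can be reproduced in a few lines of Python. This is a model of the idea, not of Apache itself: it checks the extension pattern and then, optionally, whether a file really exists under a temporary document root. The function name `served_directly` and the `check_exists` switch are made up for the illustration, and the existence test is done against the filesystem path.

```python
import os
import re
import tempfile

EXTENSION = re.compile(r"\.\w{2,4}$")


def served_directly(docroot, uri, check_exists=True):
    """Return True if the first rule would stop rewriting for this URI."""
    if not EXTENSION.search(uri):
        return False
    if not check_exists:
        return True
    return os.path.isfile(os.path.join(docroot, uri.lstrip("/")))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as docroot:
        # style.css is a real static file; sitemap.xml is generated by
        # Drupal, so there is no such file on disk.
        open(os.path.join(docroot, "style.css"), "w").close()
        for uri in ("/style.css", "/sitemap.xml"):
            print(uri,
                  served_directly(docroot, uri, check_exists=False),
                  served_directly(docroot, uri))
```

Without the existence check, /sitemap.xml is treated as a static file and never reaches Drupal; with it, the request falls through to the later rules.
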
wobbler’s picture

drupdrips and creyes123 - The original reason I didn't just check for file existence is that I was trying to take any form of file checking out of the rules unless completely necessary, to improve speed. The few CPU cycles needed to test a regex are a lot quicker than the disk seeks involved in searching for a file in a directory. However, if it breaks some Drupal functionality then it is obviously no good.

In the suggested update, the RewriteRule pattern doesn't need to test anything, as the condition has already established that we want to serve the file. Also, I think it should be REQUEST_FILENAME rather than REQUEST_URI:

RewriteCond %{REQUEST_FILENAME} -f
RewriteRule .? - [L]
drupdrips’s picture

@wobbler: I think both our approaches agree on checking for dot file extensions first and serving those files without any further regexp evaluation. When serving a single page, all but one of the requests (the actual page) will be for dot-extension types (.css/.jpg/.png/.gif/.js), so there is no need to run the other regexps for them. But I am still of the opinion that we should check whether the file exists before trying to serve it. So I guess what you are saying is: if the request ends in a dot followed by 3 to 4 letters, you try to serve the file without even checking whether it is there? Are there any risks in doing so? Since Boost did not come with rewrite rules for lighttpd, which is what I use, I had to come up with my own, here: http://drupal.org/node/150909#comment-997460. I am interested in anything that will help improve performance, but I'm not sure that skipping the file existence check is a good idea. If you or anyone else has a comment about this, please share.

wobbler’s picture

You're right, not checking for file existence is a risk.

For absolute flat-out performance, working with in-memory data is as fast as you can get: you are talking nanoseconds to evaluate a regex rather than milliseconds for a disk seek (several orders of magnitude faster). That is why I initially made the assumption about file extensions, but if it causes stability or functionality issues then the assumption isn't valid.

mikeytown2’s picture

Version: 5.x-1.0 » 6.x-1.x-dev
Status: Active » Needs review

This has been tested with the latest version of boost running with this patch #141954: Gzipping of static html files (*.html.gz)

  # BOOST START
  <IfModule mod_headers.c>
    Header add Expires "Sun, 19 Nov 1978 05:00:00 GMT"
    Header add Cache-Control "store, no-cache, must-revalidate, post-check=0, pre-check=0"
  </IfModule>
  <IfModule mod_mime.c>
    AddCharset utf-8 .html
  </IfModule>
  #serve file IF it exists on server
  RewriteCond %{REQUEST_FILENAME} -f
  RewriteRule .? - [L]
  #skip boost IF not get request OR uri has wrong dir OR cookie is set
  RewriteCond %{REQUEST_METHOD} !^GET$ [OR]
  RewriteCond %{REQUEST_URI} ^/admin|^/cache|^/misc|^/modules|^/sites|^/system|^/themes|^/user/login [OR]
  RewriteCond %{HTTP_COOKIE} DRUPAL_UID
  RewriteRule .? - [S=6]
  #Skip next 3 rules IF uri contains a query string
  RewriteCond %{QUERY_STRING} !^$
  RewriteRule .? - [S=3]

  # root
  RewriteCond %{REQUEST_URI} ^/$
  RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}/index.html -f
  RewriteRule ^(.*)$ cache/%{SERVER_NAME}/index.html [S=3]
  # subdir root
  RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}%{REQUEST_URI} -d
  RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}%{REQUEST_URI}/index.html -f
  RewriteRule ^(.*)$ cache/%{SERVER_NAME}/%{REQUEST_URI}/index.html [S=2]
  # non root
  RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}%{REQUEST_URI}.html -f
  RewriteRule ^(.*)$ cache/%{SERVER_NAME}/%{REQUEST_URI}.html [S=1]
  # url variables
  RewriteCond %{DOCUMENT_ROOT}/cache/%{SERVER_NAME}%{REQUEST_URI}_%{QUERY_STRING}.html -f
  RewriteRule ^(.*)$ cache/%{SERVER_NAME}/%{REQUEST_URI}_%{QUERY_STRING}.html [S=0]

  #serve gzipped content if it exists
  <FilesMatch "\.(html.gz)$">
    AddEncoding x-gzip .gz
    ForceType text/html
  </FilesMatch>
  RewriteCond %{HTTP_USER_AGENT} !".*Safari.*"
  RewriteCond %{HTTP:Accept-encoding} gzip
  RewriteCond %{REQUEST_FILENAME}.gz -f
  RewriteRule ^(.*)\.html $1.html.gz [L]
  # BOOST END
mikeytown2’s picture

Status: Needs review » Needs work

These rules need to be revised: if a request should be a POST and its URL contains a query variable, it doesn't work correctly as it did before.

mikeytown2’s picture

RewriteCond %{REQUEST_METHOD} !^GET$
OR
RewriteCond %{REQUEST_METHOD} !=,GET
OR
RewriteCond %{REQUEST_METHOD} !^GET

???
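
For what it's worth, here is a rough Python model of how the three candidates behave when the test string is `%{REQUEST_METHOD}` on its own. It assumes mod_rewrite's documented semantics: `!` negates, a pattern starting with `=` is a plain string comparison, and anything else is a regex. The function names and the method "GETX" are made up for the illustration.

```python
import re

METHODS = ["GET", "POST", "HEAD", "GETX"]  # "GETX" is a hypothetical method


def not_get_anchored(method):
    """Model of !^GET$ : true for every method except exactly GET."""
    return re.search(r"^GET$", method) is None


def not_get_prefix(method):
    """Model of !^GET : true unless the method merely starts with GET."""
    return re.search(r"^GET", method) is None


def not_equal_comma_get(method):
    """Model of !=,GET : string comparison against the literal ",GET"."""
    return method != ",GET"


if __name__ == "__main__":
    for m in METHODS:
        print(m, not_get_anchored(m), not_get_prefix(m), not_equal_comma_get(m))
```

In this model `!^GET$` is the strict one, `!^GET` also lets through anything that merely starts with GET, and `!=,GET` is true for every method. That form only seems to make sense when the test string is `%{QUERY_STRING},%{REQUEST_METHOD}`, as in wobbler's rules.
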

mikeytown2’s picture

Status: Needs work » Postponed

Going to wait until we get a clear set of rules for each type of install (single, subdir, multi-site) before optimizing more. Right now I will be integrating this at the top of boost's rules as it seems fairly safe and effective.

  #serve file IF it exists on server
  RewriteCond %{REQUEST_FILENAME} \.\w{2,4}$
  RewriteCond %{REQUEST_FILENAME} -f
  RewriteRule .? - [L]
mikeytown2’s picture

Category: bug » feature
jumoke’s picture

Can someone help me here please?

I have a multi-site setup with Drupal in a subdirectory of the root, so I have:
public_html
--drupal
--newsite (subdomain symlinks to the drupal folder)
--index.html and other static files of root.

I'm working on this on the subdomain, and will change to the main domain when I go live.

I am using the boosted2.txt setup, as pasted below, in my newsite/.htaccess, not in the drupal folder. When I tried putting the Boost .htaccess code in the drupal folder, the status report complained. Now it's in newsite/.htaccess and the status report is fine. With the cache in the newsite/cache folder, I can see the files being created in there fine, but they do not get served.

Can someone please help me? I have been on this for 2 agonizing days :(

Settings are:

  ### BOOST START ###

  # Gzip Cookie Test
  RewriteRule ^(.*)boost-gzip-cookie-test\.html cache/perm/boost-gzip-cookie-test\.html\.gz [L,T=text/html,E=no-gzip:1]

  # GZIP - Cached css & js files
  RewriteCond %{HTTP_COOKIE} !(boost-gzip)
  RewriteCond %{HTTP:Accept-encoding} !gzip
  RewriteRule .* - [S=2]
  RewriteCond %{DOCUMENT_ROOT}/cache/perm/%{HTTP_HOST}%{REQUEST_URI}_\.css\.gz -s
  RewriteRule .* cache/perm/%{HTTP_HOST}%{REQUEST_URI}_\.css\.gz [L,QSA,T=text/css,E=no-gzip:1]
  RewriteCond %{DOCUMENT_ROOT}/cache/perm/%{HTTP_HOST}%{REQUEST_URI}_\.js\.gz -s
  RewriteRule .* cache/perm/%{HTTP_HOST}%{REQUEST_URI}_\.js\.gz [L,QSA,T=text/javascript,E=no-gzip:1]

  # NORMAL - Cached css & js files
  RewriteCond %{DOCUMENT_ROOT}/cache/perm/%{HTTP_HOST}%{REQUEST_URI}_\.css -s
  RewriteRule .* cache/perm/%{HTTP_HOST}%{REQUEST_URI}_\.css [L,QSA,T=text/css]
  RewriteCond %{DOCUMENT_ROOT}/cache/perm/%{HTTP_HOST}%{REQUEST_URI}_\.js -s
  RewriteRule .* cache/perm/%{HTTP_HOST}%{REQUEST_URI}_\.js [L,QSA,T=text/javascript]

  # Caching for anonymous users
  # Skip boost IF not get request OR uri has wrong dir OR cookie is set OR https request
  RewriteCond %{REQUEST_METHOD} !^(GET|HEAD)$ [OR]
  RewriteCond %{REQUEST_URI} (^/(admin|cache|misc|modules|sites|system|openid|themes|node/add|comment/reply))|(/(edit|user|user/(login|password|register))$) [OR]
  RewriteCond %{HTTP_COOKIE} DRUPAL_UID [OR]
  RewriteCond %{HTTPS} on
  RewriteRule .* - [S=7]

  # GZIP
  RewriteCond %{HTTP_COOKIE} !(boost-gzip)
  RewriteCond %{HTTP:Accept-encoding} !gzip
  RewriteRule .* - [S=3]
  RewriteCond %{DOCUMENT_ROOT}/cache/normal/%{HTTP_HOST}%{REQUEST_URI}_%{QUERY_STRING}\.html\.gz -s
  RewriteRule .* cache/normal/%{HTTP_HOST}%{REQUEST_URI}_%{QUERY_STRING}\.html\.gz [L,T=text/html,E=no-gzip:1]
  RewriteCond %{DOCUMENT_ROOT}/cache/normal/%{HTTP_HOST}%{REQUEST_URI}_%{QUERY_STRING}\.xml\.gz -s
  RewriteRule .* cache/normal/%{HTTP_HOST}%{REQUEST_URI}_%{QUERY_STRING}\.xml\.gz [L,T=text/xml,E=no-gzip:1]
  RewriteCond %{DOCUMENT_ROOT}/cache/normal/%{HTTP_HOST}%{REQUEST_URI}_%{QUERY_STRING}\.json\.gz -s
  RewriteRule .* cache/normal/%{HTTP_HOST}%{REQUEST_URI}_%{QUERY_STRING}\.json\.gz [L,T=text/javascript,E=no-gzip:1]

  # NORMAL
  RewriteCond %{DOCUMENT_ROOT}/cache/normal/%{HTTP_HOST}%{REQUEST_URI}_%{QUERY_STRING}\.html -s
  RewriteRule .* cache/normal/%{HTTP_HOST}%{REQUEST_URI}_%{QUERY_STRING}\.html [L,T=text/html]
  RewriteCond %{DOCUMENT_ROOT}/cache/normal/%{HTTP_HOST}%{REQUEST_URI}_%{QUERY_STRING}\.xml -s
  RewriteRule .* cache/normal/%{HTTP_HOST}%{REQUEST_URI}_%{QUERY_STRING}\.xml [L,T=text/xml]
  RewriteCond %{DOCUMENT_ROOT}/cache/normal/%{HTTP_HOST}%{REQUEST_URI}_%{QUERY_STRING}\.json -s
  RewriteRule .* cache/normal/%{HTTP_HOST}%{REQUEST_URI}_%{QUERY_STRING}\.json [L,T=text/javascript]

  ### BOOST END ###

mikeytown2’s picture

@Excalibur
Please create a new issue