| Project: | Boost |
| Version: | 7.x-1.x-dev |
| Component: | Code |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | needs review |
Issue Summary
It would be nice if the 7.x port had very basic gzip capability soon.
The ability to have separate path structures, as was possible in D6, wouldn't even be necessary (in the beginning). The default behavior could simply be to write an html.gz file to the same directory for every html file, if compression is enabled.
If you're using nginx, like we are, that's all it takes to make use of its gzip_static capability. Once you turn on this setting, the server will always check for a .gz version of each file in the same directory and deliver it when the browser's request indicates that it can handle gzip.
After all, this is also the way that the D7 core js/css aggregator is doing it.
I made a very simple mod for myself, by appending the following to boost_exit():
<?php
boost_write_file($_boost['filename'].'.gz', gzencode($data, 9));
?>
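For anyone who wants to try the same-directory layout without touching the module, the idea can be sketched in shell: walk a cache directory and write a .html.gz sibling next to each .html file. This is only an illustration; the function name precompress_html and the cache path in the usage note are assumptions, not part of Boost.

```shell
# Hypothetical helper (not part of Boost): write FILE.html.gz next to every
# FILE.html found under the given directory, at maximum compression.
precompress_html() {
    dir=$1
    find "$dir" -type f -name '*.html' | while IFS= read -r file; do
        # gzip -c writes to stdout, so the original .html stays untouched.
        gzip -9 -c "$file" > "$file.gz"
    done
}
```

Usage would be something like `precompress_html cache/normal`. Unlike the PHP one-liner above, this does not run when a page is cached, so the .gz copies would have to be regenerated whenever the .html files change.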
Comments
#1
I second this. I'm not sure why this is considered such a low priority... so low that it's marked in the 7.x handbook as "deprecated?"... when so many sites I see use gzip actively... but I find small file sizes really boost my site load speed. I know, shocking. Pardon the sarcasm... but deprecated?
Anyway, it's my understanding that checking for a header and enabling gzip doesn't take much code so I hope someone will add this feature in the near future. I'm going to tackle it myself on my site for my needs, but I don't know what's optimal for boost really (very new to it). So awesome module, can't wait for this "little" feature to come back.
#2
I'm not sure why the handbook mentioned that (it may be my bad). Fixed it to link to this issue.
#3
Can you please tell me which particular file you modified with the code above? Thanks.
#4
If you take a look at the module, you will find that there is only one file ("boost.module") where all the functionality is implemented.
#5
Any patches possible?
#6
Add it to the end of boost_exit(), AND, don't forget to modify your .htaccess file so that \.html in the 'gzip' section actually reads, \.html\.gz
At least, that worked for me.
#7
subscribing
#8
If you want this to move forward, please provide a patch, it saves us maintainers a lot of time in figuring out exactly the details.
It takes just a few minutes:
* clone the git repo:
git clone --recursive --branch 7.x-1.x http://git.drupal.org/project/boost.git
cd boost
* apply change to boost.module file
* git diff > /tmp/boost-gzip-1416214.patch
* attach patch file to this post.
Once we have a basic patch, we can move it forward from that point and solve the small details.
If you're working under Windows and installing git or using the command line is not obvious (although there are GUI solutions, but I don't know them much), just send a copy of your modified boost.module and we will do a diff/patch.
Thanks in advance :)
#9
Inspired by the code from ralf, I've created the attached patch.
It creates the gz version in the cache folder and changes the htaccess generator.
Take a look please.
Thanks!
(using 7.x-1.x version)
#10
After applying the patch, the boost admin area was gone. I just receive an HTTP Error 500.
Thanks
#11
pmichelazzo, that's probably a PHP error. If you can give us more details, it would help! :D
#12
Committed to 7.x-1.x. Thanks for the patch!
Added the following:
* in admin/config/system/boost, added an option to disable gzip compression (enabled by default).
* compress only if the settings say so.
TODO:
* please test!
* test whether the cron will delete/expire the gz files as well.
* test 'expire' hook to see if it expires gz files.
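One way to check the two expiry items above by hand is to look for .gz files that outlived their uncompressed sibling after a cron run or an expire. A minimal sketch, assuming the .gz files sit next to the .html files; the function name list_orphan_gz is made up for illustration and is not part of Boost:

```shell
# Hypothetical check (not part of Boost): print every *.html.gz file whose
# uncompressed sibling is gone, i.e. a .gz that expiry left behind.
list_orphan_gz() {
    dir=$1
    find "$dir" -type f -name '*.html.gz' | while IFS= read -r gz; do
        html=${gz%.gz}   # strip the trailing .gz to get the sibling path
        [ -f "$html" ] || printf '%s\n' "$gz"
    done
}
```

If this prints nothing after cron or an expire has removed some pages, the .gz files are being cleaned up together with the .html files.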
#13
There are problems with this update/patch when it is installed and updated.
#14
I think this patch needs to be removed until further investigation. Tests enabling and disabling mod_mime indicate that something is not quite correct; mod_headers is disabled for the test. ForceType is therefore serving zip files to the browser as text.
Tested on two files where the uncompressed version was removed to confirm that only the gz file was served; the files are 3494 and 3793 bytes respectively.
with mod_mime (serves a web page)
Accept-Ranges bytes
Cache-Control max-age=5
Connection Keep-Alive
Content-Encoding gzip
Content-Length 3793
Content-Type text/html; charset=utf-8
Date Wed, 31 Oct 2012 08:24:14 GMT
Expires Wed, 31 Oct 2012 08:24:19 GMT
Keep-Alive timeout=5, max=100
Last-Modified Wed, 31 Oct 2012 08:07:28 GMT
Server Apache/2.2.22 (Ubuntu)
without (serves garbled zipped data that is not decompressed)
Accept-Ranges bytes
Cache-Control max-age=5
Connection Keep-Alive
Content-Encoding gzip
Content-Length 3816
Content-Type text/html; charset=utf-8
Date Wed, 31 Oct 2012 08:24:40 GMT
Expires Wed, 31 Oct 2012 08:24:45 GMT
Keep-Alive timeout=5, max=100
Last-Modified Wed, 31 Oct 2012 08:07:28 GMT
Server Apache/2.2.22 (Ubuntu)
Vary Accept-Encoding
with mod_mime (serves a web page)
Accept-Ranges bytes
Cache-Control max-age=5
Connection Keep-Alive
Content-Encoding gzip
Content-Length 3494
Content-Type text/html; charset=utf-8
Date Wed, 31 Oct 2012 08:28:19 GMT
Expires Wed, 31 Oct 2012 08:28:24 GMT
Keep-Alive timeout=5, max=100
Last-Modified Wed, 31 Oct 2012 08:03:22 GMT
Server Apache/2.2.22 (Ubuntu)
without (serves garbled zipped data that is not decompressed)
Accept-Ranges bytes
Cache-Control max-age=5
Connection Keep-Alive
Content-Encoding gzip
Content-Length 3517
Content-Type text/html; charset=utf-8
Date Wed, 31 Oct 2012 08:28:53 GMT
Expires Wed, 31 Oct 2012 08:28:58 GMT
Keep-Alive timeout=5, max=100
Last-Modified Wed, 31 Oct 2012 08:03:22 GMT
Server Apache/2.2.22 (Ubuntu)
Vary Accept-Encoding
#15
Checking this through, I continually see the ForceType directive being applied and then no decompression taking place.
#16
Thanks for the quick feedback. I temporarily commented out the "gzencode" line in boost.module until we have more tests done.
You still need to clear the boost cache to delete the .gz files, however.
Were the rules in your main .htaccess (in the Drupal root directory) updated? (Yep, I had forgotten to mention this rather important detail.) We will need a way to remind people upgrading that their htaccess needs to be updated.
There's also the fact that the cache is not expired by cron/expire, I presume, so gzip could show old content.
If it can help debugging, the most important rules for gzip are these:
# GZIP
RewriteCond %{HTTP:Accept-encoding} !gzip
RewriteRule .* - [S=1]
RewriteCond %{DOCUMENT_ROOT}/cache/%{ENV:boostpath}/%{SERVER_NAME}%{REQUEST_URI}_%{QUERY_STRING}\.html\.gz -s
RewriteRule .* cache/%{ENV:boostpath}/%{SERVER_NAME}%{REQUEST_URI}_%{QUERY_STRING}\.html\.gz [L,T=text/html,E=no-gzip:1]
So the first part should read: "if client does not support gzip, skip the next rule".
The second part has "E=no-gzip:1", which disables mod_deflate to avoid double-compression.
Although, to be honest, I haven't dived into mod_rewrite in a while, and most of this was already implemented by mikeytown2, I only added the .gz suffix.
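To make the intent of those rules easier to follow, here is a rough shell model of the decision they encode. It is an illustration only and not how Apache evaluates anything; pick_cached_file is a made-up name, and the fallback to the plain .html file mirrors the NORMAL rule in the generated boost rules.

```shell
# Hypothetical model of the rewrite decision (illustration only).
# $1 = value of the Accept-Encoding request header
# $2 = path of the cached .html file
# Prints the file that would be served, or nothing if no cached copy exists.
pick_cached_file() {
    accept_encoding=$1
    html=$2
    case $accept_encoding in
        *gzip*)
            # Client accepts gzip: use the .gz sibling if it exists and is
            # non-empty (this is what the "-s" test in the RewriteCond checks).
            if [ -s "$html.gz" ]; then
                printf '%s\n' "$html.gz"
                return 0
            fi
            ;;
    esac
    # Otherwise the gzip rule is skipped and the plain .html rule applies.
    if [ -s "$html" ]; then
        printf '%s\n' "$html"
    fi
}
```

When nothing is printed, the request would fall through to Drupal itself.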
fwiw, I often use "wget" with the "-S" option (show response headers) to do tests and see headers:
wget -S -O /dev/null http://example.org/
Or to request a page with gzip enabled:
wget -S --header="Accept-Encoding: gzip" -O index.html.gz http://example.org/
(or rename the resulting page to foo.gz, so that it's then possible to open with "vim" or gunzip the file)
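Once a response body has been saved to disk, a quick way to tell whether it really is gzip data is to look at its first two bytes, which are 1f 8b for any gzip stream. A small sketch; the helper name is_gzip is made up for illustration:

```shell
# Hypothetical helper: succeed if the file starts with the gzip magic bytes.
is_gzip() {
    # od prints the first two bytes as hex; tr removes spaces and newlines.
    magic=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}
```

For example, `is_gzip index.html.gz && echo compressed` on the file saved above. If the file is gzip data but the response headers did not include Content-Encoding: gzip, a browser would show the raw bytes instead of the page.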
#17
Yes, the .htaccess files were updated, and I also manually deleted the unzipped files from the cache to make sure that only the cached .gz files were being served, which is when I spotted the errors. Because I'd been testing the .htaccess for mod_mime and mod_headers for my previous patches, my test server by chance had them disabled, so I got junk or a file to download. Possibly there could be a test to fix this, but it would involve a lot of servers and a series of long-winded if statements to decide whether the ForceType directive is placed inside the if mod_headers... directive. The issue appears to stem from apache itself: ForceType states definitively that this is a text file even though gzip headers are being sent, and the number of bytes sent is altered by a fractional amount.
#18
As per bgm's suggestion on http://drupal.org/node/1888736#comment-6939904
I installed the latest dev version of boost 7.x, dated 01/07/2013.
After uncommenting the line at 330, I could see .gzip files generated in the cache/normal/... folder.
But when I tried fetching them in other browsers (FF & Safari) as an anonymous user, I didn't see the boost comment line at the end indicating that the page delivered is a cached page.
Secondly, how do I make sure that the gzip version of a file is delivered for an http request?
I can do some more tests if you want.
#19
Silly question, but have you altered your .htaccess file ?
#20
Not at all silly because I missed it. :)
I kept .htaccess settings from my earlier boost installation.
I am using localhost. Do you think it will still matter to change the .htaccess settings?
#21
Yes and you'll need to check your RewriteBase setting.
#22
Hi,
I added the .htaccess changes generated from the default settings under boost.
Didn't have to change any RewriteRule.
I see the .gz file generated in the cache. I also tested it on PageSpeed. I observed 2 things:
1. In FF, if I enable the cache, the page speed rank is ~78/100. Under the Resources tab, it shows File size = 48k and Transfer size = 8k.
2. In FF, if I disable all caches, the page speed score increases to 98/100. Same values under the Resources tab.
Can anybody explain why it's giving better results without the browser cache?
#23
#24
Page speed should list the items that make up its score, but that's not boost relevant. Apart from that, you may have boost working under chrome.
#25
I didn't test it under Chrome.
But doing it on Safari now.
Will keep you all posted.
#26
Similar functioning on Safari as well.
Clearing the cache under the performance page cleans all caches, which is good.
I am still not sure if this is ready for a production site yet.
#27
Normally the boost clear cache is split out from the performance page by ticking the box in boost under the cache expiration tab.
#28
Though I noticed some unusual behavior.
After turning boost caching ON, I sign off, close the browser and then restart the browser.
Clear browser cookies and cache.
So now there's no cache generated yet in boost and the browser is also clean.
Now, if I start opening pages on the site, it caches 2 pages only and the rest are ignored, irrespective of how many times I try, even in a different browser.
All this time I was browsing as an anonymous user, but still no cached pages are created under the boost dirs.
The only change I made apart from Boost is that I turned Blog comments on. But I haven't visited Blog content yet as an anonymous user.
Using FF and Safari. Not sure if this is Boost related or something else.
#29
Almost certainly something has triggered the DRUPAL_UID cookie, and you should check for it. That will disable both page generation and serving of cached pages; it is probably a module assigning you a user id even if anonymous.
#30
I do not see the DRUPAL_UID cookie set for the anonymous user. As soon as I log in, I see it, and it goes away when I log off. I don't think this is the root of the above problem.
Besides this, I just installed HTTPRL & Cache Expire & enabled Boost Crawler.
As per the instructions, it should generate the cache on cron run, but it doesn't seem to be doing so.
I also tried changing a node and saving it, then running cron manually, but no luck.
Is this a known issue?
#31
No, the crawler doesn't ever generate a cache; it only crawls pages that have been edited.
#32
You mean it will only crawl pages whose visible content has changed, or can we just click edit and save without modifying the actual content?
Does the crawler checksum the old and new files for differences?
Moreover, normal boost isn't caching the user/login & user/register pages. I tried many times but it's not.
Is it intended functionality or a bug?
#33
The crawler is hooked into most of the functions that edit a page, like posting a comment, deleting content etc., so there is no need for a checksum. Upon any of these actions there is an entry in the cron queue table, and httprl is used to generate the page as an anonymous user.
#34
from the boost .htaccess rules
RewriteCond %{REQUEST_URI} (^/(admin|cache|misc|modules|sites|system|openid|themes|node/add|comment/reply))|(/(edit|user|user/(login|password|register))$) [OR]
It's intentional. Wouldn't want a cached version with someone's username popping up, if by some coincidence the cache was built as they entered the wrong password.
#35
Ah, I see, this totally makes sense. Thanks.
#36
I faced an unusual issue today with Boost.
I have the last recommended release installed with BOTCHA 7.14x
There are almost a hundred files with names starting with "boostMxseY", "boostYgRkt", etc.
Any idea what could be causing it?
#37
Never used the module myself, but what's in the files? Having a read through, it appears to be javascript spambot protection for forms, and I'm guessing there's some element of xml or ajax requests being made. As well as that, if it's adding unique fields to forms that differ from the normal ones that are ignored (like login/registration), then there's quite a high possibility that the form will not work (BOTCHA's description appears to say that it will ignore duplicate form registrations, and boost could easily cache a comment/login form on every page, which we've had here before in #1616356: Boost Module problem in logged in user and the need to use a register link to post comments, to bypass this).
This should have been a new thread.
#38
The problem with junk BoostZWRse files is resolved here http://drupal.org/node/1890836#comment-6957440.
It was an issue with Botcha module
Thanks.
#39
I installed the latest Dev version, uncommented line 330 in boost.module and set the .htaccess rules. But when I test with Pagespeed Insights in Chrome it's still complaining about uncompressed javascripts and CSS files.
Am I missing something?
Cheers,
Bram.
#40
p.s. You can see for yourself at http://intermin.bramdeleeuw.nl/
#41
Boost would not compress js or css; that would be down to the settings of your apache (or whichever server you are using), as those files are not running through php. The gzip part of the module had to be commented out because it only worked in certain server configurations, depending on the modules loaded, and so broke a few browsers including chrome, resulting in blank pages as the first few bytes of the zip file were not sent out. I did the testing and could not find a way around the issue as it's server based, although I may re-evaluate it for apache 2.4 when it becomes more commonplace.
#42
Ah, apparently I misunderstood. So should I comment out line 330 again?
#43
If your particular server is working then there is no need to comment it out, as long as it functions cross-browser. But it applies only to html files, and you will need to look at your rewrite rules and the logs to check that the pages are being served correctly. To compress javascript and css files, I suggest that you look at mod_deflate for apache; with it you can also achieve compression of html files without boost, although at a slight increase in cpu usage. You need to pay attention to the configuration, as otherwise it will try to compress jpgs, pngs and other files which do not compress well.
#44
Thanks Philip for your info. I will get into it!
#45
@Philip_Clarke: I've reproduced this issue, especially your findings from #14. It is correct that the currently generated htaccess rules depend on mod_mime being present in the apache installation.
If I'm not mistaken, mod_mime is very important for most setups. I cannot think of any case where this module needs to be disabled; rather, I think that an apache configuration without this module will result in all sorts of trouble. IMO we could count on this module being enabled by default.
That said, I suggest we just document the issue (i.e. serving compressed content with apache requires mod_mime). It would also be possible to move the AddEncoding gzip .gz directive out of the IfModule condition in the cache .htaccess. That way users would get error messages when things are not configured properly, instead of gzip data displayed in a browser window.
#46
Another solution would be to do what core does and implement the same checks as the core .htaccess, based on mod_headers.
#47
In the current dev version gzipping was disabled because "we can't predict which modules would be installed across many servers". Now that you've confirmed the modules, I suggest we just throw the idea away. I've tried various configurations and differing .htaccess setups with IfModule statements, but removed them all because it's an apache module issue rather than a boost one.
Lighttpd and Nginx can do their own gzip caching, so they reduce cpu cycles. For apache, it's easy enough to turn gzip on in .htaccess for php, html, js and css files (though at the cost of more cpu cycles and the possibility of making an error with the configuration and trying to recompress jpegs). So I don't believe that gzipping has a place in boost any more because of the module problem: better that boost works consistently across differing servers than trying to force it to work with apache for a reduction in cpu cycles that appears to constantly put delivery of pages, to chrome in particular, at risk. A substantial amount of server and browser share is at risk of corrupted delivery or blank pages, not because of "boost" but because of the variance in apache setups. I just cannot see why we should bend over backwards to compensate for a bug in apache and then field support requests about why boost is not zipping files, when a few lines in .htaccess would work and increase delivery speed for other resources on a site.
#48
The drupal-core approach is not too difficult. I propose the following solution:
This is .htaccess in the drupal root. Note that we simply set the variable boosttrygzip when both of the following conditions are met: the server has mod_headers and the client requests gzip.
### BOOST START ###
RewriteRule .* - [E=boosttrygzip:0]
<IfModule mod_headers.c>
# Try to serve gzip compressed files if they exist and the client accepts
# gzip.
RewriteCond %{HTTP:Accept-encoding} gzip
RewriteRule .* - [E=boosttrygzip:1]
</IfModule>
# Allow for alt paths to be set via htaccess rules; allows for cached variants (future mobile support)
RewriteRule .* - [E=boostpath:normal]
# Caching for anonymous users
# Skip boost IF not get request OR uri has wrong dir OR cookie is set OR request came from this server OR https request
RewriteCond %{REQUEST_METHOD} !^(GET|HEAD)$ [OR]
RewriteCond %{REQUEST_URI} (^/(admin|cache|misc|modules|sites|system|openid|themes|node/add|comment/reply))|(/(edit|user|user/(login|password|register))$) [OR]
RewriteCond %{HTTPS} on [OR]
RewriteCond %{HTTP_COOKIE} DRUPAL_UID [OR]
RewriteCond %{ENV:REDIRECT_STATUS} 200
RewriteRule .* - [S=3]
# GZIP
RewriteCond %{ENV:boosttrygzip} 0
RewriteRule .* - [S=1]
RewriteCond %{DOCUMENT_ROOT}/cache/%{ENV:boostpath}/localhost%{REQUEST_URI}_%{QUERY_STRING}\.html\.gz -s
RewriteRule .* cache/%{ENV:boostpath}/localhost%{REQUEST_URI}_%{QUERY_STRING}\.html\.gz [L,T=text/html,E=no-gzip:1]
# NORMAL
RewriteCond %{DOCUMENT_ROOT}/cache/%{ENV:boostpath}/localhost%{REQUEST_URI}_%{QUERY_STRING}\.html -s
RewriteRule .* cache/%{ENV:boostpath}/localhost%{REQUEST_URI}_%{QUERY_STRING}\.html [L,T=text/html]
### BOOST END ###
The cache .htaccess then fixes the Content-Encoding and Vary headers when compressed content is to be delivered:
AddDefaultCharset utf-8
FileETag MTime Size
<FilesMatch "\.(html)(\.gz)?$">
<IfModule mod_expires.c>
ExpiresDefault A5
</IfModule>
<IfModule mod_headers.c>
Header set Expires "Sun, 19 Nov 1978 05:00:00 GMT"
Header unset Last-Modified
Header append Vary Accept-Encoding
Header set Cache-Control "no-store, no-cache, must-revalidate, post-check=0, pre-check=0"
Header set X-Cached-By "Boost"
</IfModule>
</FilesMatch>
<IfModule mod_mime.c>
AddCharset utf-8 .html
</IfModule>
<FilesMatch "\.html(\.gz)?$">
ForceType text/html
</FilesMatch>
<IfModule mod_headers.c>
<FilesMatch "\.gz$">
Header set Content-Encoding gzip
Header append Vary Accept-Encoding
</FilesMatch>
</IfModule>
SetHandler Drupal_Security_Do_Not_Remove_See_SA_2006_006
Options None
Options +FollowSymLinks
This setup works with any combination of mod_mime and mod_headers. If the latter is missing, no gzip content will be served, but that's exactly what drupal core is doing anyway.
#49
I'll have to test that later, but I assume that you've tested using chrome, as that gave the main blank page problem? Also, how are you gzipping the files? Have you uncommented the section from the last dev build?
#50
Yes.
Yes, patch attached.
#51
#52
I've checked this and I'm afraid it does not work. The good news is that the original bug I reported about gzip not working with mod_headers disabled (#1416214-14: Basic gzip support for 7x) seems to have been solved in either chrome or apache 2.22 (although that does not mean that the bug would not recur in real life across many servers).
I think there is a very simple solution to this, as the mod_mime bug is still very much apparent in the latest versions of firefox and chrome: simply disable gzip serving in the .htaccess if neither module is present, which I am currently testing.
@zernol it may be that your FORCE_GZIP is the key to the mod_headers issue. The other way around the mod_mime issue may be to add your ForceType directive, but this may interfere with the Content-Type header. Sorry for the delay; I have a lot of tests to run on various configurations, and not being able to reproduce the original mod_headers issue has been a large delaying factor.
#53
I've decided to reverse the logic: if mod_mime or mod_headers is not installed, then gzip is disabled. I've also commented the sections with the reason why. The only other modification was to remove Vary Accept-Encoding, as it appears in the <IfModule mod_headers.c> code block above regardless of whether zipped encoding is chosen or not.
#54
@Philip_Clarke, thank you for the review.
The reason that Chrome chokes on gzip content when mod_headers is disabled is not a bug in Chrome or Apache. If mod_headers is missing, Content-Encoding cannot be added to the response and the browser is not notified that it needs to decompress the data before attempting to render it. Drupal core .htaccess implements the exact same check when determining whether gzip-compressed CSS and JavaScript files should be served to the browser, i.e. it checks whether the browser accepts gzip and also whether mod_headers is present. Only when both of these conditions are met is an attempt made to serve compressed content (with appropriate headers). If there were bugs in Chrome / Apache in this area, Drupal core would be affected too. I'm sure that the patch fixes the issue. IMO there is not much risk that this problem will crop up again on production systems when this fix gets in.
Oh, good catch. It is actually very important that this header is added, regardless of whether the content is compressed or not.
I think the patch from #53 should get in.
#55
I've submitted it to bgm to go into dev or production. I'll be adding some documentation to explain that if mod_mime or mod_headers is not installed, then gzip is disabled.
Thank you for pointing us in the right direction, I'd got too bogged down in testing across so many set ups and browsers that I could not see the wood for the trees.