After reading this:
http://www.facebook.com/note.php?note_id=307069903919
It made me realize that the Drupal core behavior of aggregating libraries jQuery (and in D7 jQuery UI) together with all the other JS is potentially a significant performance suck for Drupal core. The reason - if I change one byte of JS that's in the aggergated JS (or add a new query string), all users have to re-download jQuery and jQuery UI again. Further if my home page and inner page have slightly different JS aggregated together, I end up downloading the libraries multiple times in aggregated form.
I propose that JS added as a library should (at least by default) never be aggregated and not get the query string added. We must, though, put the library version number in the file name.
Current status
The original issue has already been sorted, and the current patch merely changes variable names for consistency with core and sets the default stale file threshold to three days.
Comment | File | Size | Author |
---|---|---|---|
#213 | 721400-variable-names-213.patch | 4.46 KB | mcjim |
#211 | 721400-variable-names-211.patch | 3.06 KB | mcjim |
#210 | 721400-variable-names-210.patch | 2.91 KB | dstol |
#191 | 721400-variable-names-158-retest.patch | 2.99 KB | bleen |
#189 | 721400-variable-names-158-retest.txt | 2.99 KB | tom_o_t |
Comments
Comment #1
bleen CreditAttribution: bleen commentedAssuming you agree with the premise of this issue (which I'm not sure I do yet...maybe) wouldn't it be better to aggregate jQuery & jQueryUI (and other js added as a library) into one aggregated file and then all the contrib/theme js into a second aggregated file ...
Comment #2
catchAlso if I read that article right, any inline js which only depends on jquery as opposed to other files which are aggregated would be able to run instantly if jquery itself is cached by the browser. So it's not just the extra downloads, it'd also be rendering performance.
i don't think we add Jquery UI on every single page, at least I hope not, so that should be either aggregated with the code that uses it, or also not aggregated, but not lumped with jquery IMO.
Comment #3
pwolanin CreditAttribution: pwolanin commentedBad performance is a bug.
Comment #4
pwolanin CreditAttribution: pwolanin commentedI'll try to roll a patch tonight - I think the simplest approach is to make any library JS non-aggregated.
Comment #5
pwolanin CreditAttribution: pwolanin commentedHere's a fist pass - tries to use the information about version included in hook_library().
Comment #6
pwolanin CreditAttribution: pwolanin commentedThis patch is a little better - but it still kills the overlay.
Also - I notice the JS and CSS aggregation logic works differently for no readily apparent reason.
Comment #7
pwolanin CreditAttribution: pwolanin commentedok, we avoid killing the overlay by making drupal.js also type 'library'
Comment #9
catchApproach looks solid to me. We originally added the query string at the very end of the D6 cycle because that was the first time we'd upgraded a jQuery version in core and people were getting cached javascript on update.php. However that was a very aggressive approach and we knew that at the time. Appending the version string to the file name fixes the same root cause, without needlessly causing refreshes.
With #attached_js and #attached_css, there's going to be a lot less of modules adding their stuff in hook_init() like in D6 - i.e. you have to do that for blocks, for example and more loading as and when things are needed.
Comment #10
cbrookins CreditAttribution: cbrookins commentedSubscribe
Comment #11
pwolanin CreditAttribution: pwolanin commentedThinking more about this - why do we alter the last letter of the aggregated file names? This makes no sense. The file name will change if the content changes. The current system just needlessly inhibits the desired caching.
Comment #12
pwolanin CreditAttribution: pwolanin commentedOk - I see now, we don't actually MD5 the contents, just the info array contents when constructing the file name. Still, I think we can prevent failed requests via cached pages by using a query string for the aggregated files too, rather than changing their names.
Comment #13
pwolanin CreditAttribution: pwolanin commentedfix title
Comment #14
bleen CreditAttribution: bleen commentedjust a thought ... (and I recognize the extreme rareness of this) but without the query string, what happens if there is a patch to Drupal core that includes a new version of jQuery (lets say their was a security release for jq for the sake of argument)... in that case wont we need the query string to ensure proper cache busting?
(I also realize that JQ might be a bad example because there will likely be a new file name in that case, but I think my question is clear)
Comment #15
pwolanin CreditAttribution: pwolanin commentedDid you try the patch? It does add a query string which is the jQuery version as returned from hook_library() - for exactly the reason you suggest. What I want to avoid is the aggregation and/or random query string that force re-downloading an unchanged jQuery library too often.
Comment #16
kkaefer CreditAttribution: kkaefer commentedComment #17
pwolanin CreditAttribution: pwolanin commenteddrupal.js is loaded on every page where we have JS and is required for the other files to function. In that sense it's a library. It would possibly be preferable to add a version query string instead of the random query string, but that's less important since it's not a huge file.
I switched to the two character query string to increase the cycle time - in particular since it's being added to the preprocessed JS file instead of altering the filename of the preprocessed file. It's absolutely necessary to add the query string to the preprocessed file since the filename hash is only based on things like file name that do not change if the contents change.
Comment #18
RobLoachI like the idea of using the version number of the library instead of a random character for the query string. Just want to be careful not to make a crazy amount of HTTP requests. jquery.js, drupal.js and jquery.once.js could be aggregated together just fine as they are added to every page with JavaScript.
Comment #19
pwolanin CreditAttribution: pwolanin commentednote - there is a conflict due to recent commits (no PIFR retesting?). I will try to reroll soon.
Also - note that Drupal 7 has instructions in .htaccess to set 2 week expire headers for all static content if possible:
Comment #20
pwolanin CreditAttribution: pwolanin commentedNarrowing the scope of the patch here, since Rob pointed out that it's not really a clear win to not aggregate each and every library-type JS file.
Comment #21
pwolanin CreditAttribution: pwolanin commentednote - WP already uses this method to control the refreshing of JS and CSS - e.g.:
Comment #22
pwolanin CreditAttribution: pwolanin commentedWe ought to add a version for drupal.js too - note that this number will need to be bumped whenever it's changed, so it will not be the same as the core version.
Comment #23
pwolanin CreditAttribution: pwolanin commentedOops - had an error of re-using a variable name in a loop that was also a function param.
Comment #24
mfer CreditAttribution: mfer commentedThe performance issue with the way we do things has been known about for some time. IMHO, with jquery ui added to the mix more often this issue will become more noticeable in D7. Trying to come up with better caching strategies is where modules like sf cache came from.
Dealing with this issue came up with earlier in the D7 dev cycle but a better default solution for Drupal could not be agreed on. I had hoped to make the preprocessing pluggable but that issue was pushed off to D8.
All that background context being said here are some specifics to the current patch...
This is a change in the expected behavior. Drupal expects to have teh preprocessed files before the non-preprocessed files at all times. Is this change in expected behavior allowed at this point?
This could be simplified with:
Should the jQuery UI library elements be preprocessed? This is the area that could cause some bloat in preprocessed files.
A nice side effect of the changes here would be that jQuery could easily be swapped out for its version from Google or Microsofts CDN.
This is just a first pass at the patch. I didn't have time to install and test it. I wanted to get something up first.
139 critical left. Go review some!
Comment #25
pwolanin CreditAttribution: pwolanin commentedWe must change the aggregation/ordering behavior to make any progress on this since we need jquery.js and drupal.js to come before the aggregated JS.
Comment #26
sunHow do we handle JavaScript file weights in here? Does it really make sense to special-case jquery.js + drupal.js only? Any benchmarks?
Don't we have all necessary capabilities in place to allow this optimization from contrib?
IMHO, the proper solution is to build "bundles" of page requisites, like http://drupal.org/project/sf_cache does.
That said, I also like the idea of appending the version number - as long as the added files are derived from hook_library(). However, when doing that, there also needs a way to disable this functionality.
In the end, this rather sounds like D8 material for me.
Comment #27
pwolanin CreditAttribution: pwolanin commented@sun - look at the code. There is no special casing at all. They are just marked as non-aggregated. Instead the code adapts from what we use (or used) for CSS aggregation so that we can fully respect the specified weight.
I agree that sf_cache is a more "ideal" solution for people knowledgeable enough to tune those knobs - but this is frankly a pretty trivial change to improve Drupal performance out of the box.
Comment #28
mfer CreditAttribution: mfer commentedI agree with @sun that grouping is the most ideal solution. Having a default grouping for core js would be a place to start. jquery.js, drupal.js, and jquery.once.js could be in one bundle.
@pwolanin do you have any benchmarks?
Typically, the slowest part of loading JavaScript is the blocking in some browsers. That is when one js file is being downloaded the other js files are not acted upon. Most modern browsers do not have this problem but some of the browsers drupal (or more directly, the people who build sites with Drupal) supports do.
So, by moving to 3 non-preprocessed files there is a performance loss when those are loaded the first time. On sites with many returning visitors that is ok. They take the hit on the first page request and then don't later. But, site with a high percentage of new visitors the performance hit for blocking would show up often.
So, we would have to make some assumptions about the default Drupal usage in deciding this issue.
Comment #29
pwolanin CreditAttribution: pwolanin commentedI was trying to avoid the much more substantial changes needed to do grouping. An alternative variant of this patch would just leave jquery out of aggregation.
Comment #30
mfer CreditAttribution: mfer commentedIf we leave jQuery out of preprocessing and order it correctly site builders can use jquery from Google or Microsofts CDN. This is a bonus.
One question is, how do we handle it when someone weights a non-jquery.js external file in the middle of preprocessed files? Does this break up the preprocessed grouping into two of them? is jquery.js special cased? If so, what if someone wants to include prototype or dojo? Is there a way to special case them as well?
I don't mean to be a pain or to even take credit for these questions. They are the questions or type of question asked every time this issue comes up.
Comment #31
sunI can't help, but this change bears too many questions to qualify as performance quick-fix, sorry. Additionally, there are already contributed modules that easily allow you to do this and more. We need a discussion about what's a library and what's not, a discussion about proper grouping (the same applies to CSS files) that depends on the library discussion, a discussion about proper query strings, depending on the result of the library and grouping discussions. Ideally, we should try to prepare sf_cache or another module in a joint effort, to move its base functionality with sane defaults into core for D8, while the contrib module can stay for advanced users to tweak the defaults when needed.
Comment #32
pwolanin CreditAttribution: pwolanin commentedI disagree - this code is broken and needs to be fixed in core to at least behave reasonably. The proposed patch is pretty minor code change, and really affects no APIs.
Comment #33
pwolanin CreditAttribution: pwolanin commentedRe-roll for: http://drupal.org/node/243251 and to respect weighting of external JS per mfer's suggestion that we need to be able to load something like jquery.js from a CDN. Only jquery.js is excluded by default from aggregation in this patch, not the other 2.
The changes I'm making actually make the behavior of the code more consistent with the doxygen for drupal_add_js()
Comment #34
pwolanin CreditAttribution: pwolanin commentedone more fix for http://drupal.org/node/243251
Comment #36
pwolanin CreditAttribution: pwolanin commentedLooks like testbot or HEAD is broken?
Comment #37
pwolanin CreditAttribution: pwolanin commented#34: 721400-no-aggregate-jquery-34.patch queued for re-testing.
Comment #38
bgm CreditAttribution: bgm commentedsubscribing
This is, imho, an important performance issue which has often caused me problems in D6. Redownloading a big CSS or JS file, just because of a small change in it (between two pages who include different CSS/js files), can be a major annoyance for users.
Comment #39
pwolanin CreditAttribution: pwolanin commentedfixing title.
Note that changing the filenames of aggregated files is a real failure if you have a page cache or external (http) cache that may serve a page that references CSS and JS files that have magically disappeared in the background.
Comment #40
pwolanin CreditAttribution: pwolanin commentedComment #41
catchI've seen problems with filename changes before, serving a 404 is a lot worse than serving a stale file. Also just because sfcache offers a more finely tuned contrib solution doesn't mean we shouldn't try to improve what core does, and the CDN trick is a nice touch. Don't have time to do a review of the latest patch at the moment but previous ones seemed pretty sane.
Comment #42
pwolanin CreditAttribution: pwolanin commentedDavid Strauss commented on this linked issue: http://drupal.org/node/243251#comment-2680098 that the query string should not interfere with mod_deflate if it's properly configured.
Comment #43
mfer CreditAttribution: mfer commentedHow you aggregate js files depends on project context. 2 examples to illistrate.
A blog with over 80% of its traffic coming from new people is common. The typical js bundle would include the js for commenting and maybe a few interface things. One bundle that includes jQuery could be very good here because of the blocking issue that many browsers have.
By blocking I mean that on these sites, if they had 2 js files, the browser would request the first file, it would transfer it back, and then execute it. Then it would request the second file, transfer that, and execute it. The time spent talking on the network to manage two files would be longer than the time to download one larger file.
For caching to be of a benefit you need to assume characteristics about the usage of the site.
A site that people regularly return to that has different JS file combinations on different pages could benefit from some bundling/separation. I suspect gardens falls into this. There would be additional blocking but the caching / grouping strategies would work best if they were tailored to the site.
IMHO the optimal solution would be a pluggable preprocess system. Drupal would provide a good default. Then site builders could swap that out for something tailored to their context. This isn't going to happen before D8.
The problem of this being a performance burden in D7 is going to need a little justification and testing. What site context is it a performance burden to based on what style of site?
Comment #44
mbutcher CreditAttribution: mbutcher commentedmfer is 100% correct.
The only right way of doing this right it to modify the drupal_*_js code to have a pluggable backend. The standard medium-sized site can certainly get away with standard aggregation. Meanwhile, those of us who have special requirements can build more sophisticated solutions (which don't clutter core with tuning for out-of-the-norm usage patterns).
Sun rightly pointed to a way that is much more appropriate for large sites that may deploy certain chunks on 90%+ of the site, while other scripts only get served on a small fraction of pages. Most of us want actual control over how those chunks get composed ('cuz we actually know which files are being requested at what frequency). A layer that purports to give us "what we want" while actually not giving much control at all is just frustrating.
Further, I'd suggest that if this is all done "in the name of optimization," somebody should supply some comparative numbers -- and base them on actual use cases, rather than assumptions about how JavaScript files *might* be distributed. Nothing speaks louder than numbers when talking about performance.
We did a huge amount of testing on our site, and clearly arrived at the conclusion that a single minified JS file was more efficient by a long shot. We went from four JS files down to one and effectively cut off 200+ msec of download time off of the average page load, and (of course) freed up web server resources. I strongly suspect that other use cases will be different, depending on how the JS is distributed over the site -- and that fact on its own implies that a single built-in solution is going to be sub-optimal.
Thus, if we are really talking about optimization, solution #1 should be to allow people to determine what they want to optimize and give them the freedom to write code to do that. Look at how that played out with caching, and you will see what appears to be a huge success: there are at least half-a-dozen successful cache modules out there, and probably countless custom ones.
Comment #45
EvanDonovan CreditAttribution: EvanDonovan commentedmbutcher makes a good point on the performance optimization being different for different sites. On my D6 site, I hate how many different css & js aggregated files there are, but we have many (probably too many) modules enabled.
I think that the best thing to do, since the performance issue is debateable, would be to put that in a different patch, and possibly push that to D8 (and Pressflow?).
But I think that keeping the aggregated filenames the same and changing on the basis of a query string is necessary for reverse proxy support. So that part is a bug fix and should definitely go in D7. I didn't actually realize at first that both those things were happening in this patch, because I was confused by the discussion and the fact that the API for adding JS has changed a lot from what I was familiar with in D6.
Comment #46
pwolanin CreditAttribution: pwolanin commentedDiscussing this in IRC with mfer - the block on getting/processing JS files is an issue in older IE (IE6/7).
Much of the reasoning about aggregating jquery.js or not by default depends on the assumed visitor profile and behavior. If they visit 2 pages with different JS, the 2nd page will be faster if jquery is excluded. If they are using IE6, the 1st page may be slower. In FF, etc, possibly all pages will be faster.
I think this is still pretty much rtbc - but agreed with mfer to split the 1 line de-aggregating jquery by default from the rest of this patch which is pretty much a pure bug fix.
Comment #47
pwolanin CreditAttribution: pwolanin commented@mbutcher - a pluggable backend isn't' happening for D7, but maybe you can outline what behaviors/files you cannot change as desired using the existing D7 _alter hooks?
Comment #48
pwolanin CreditAttribution: pwolanin commentedHere's the patch with the one line changing jquery aggregation removed - it does 4 things:
Comment #49
pwolanin CreditAttribution: pwolanin commentedFollow-up issue for de-aggregating jquery: #734080: By default, don't aggregate jquery.js
Comment #50
sunIf there is a technical reason for why we change this from 1 to 2 query string characters, then we are missing important documentation here.
If there is none, this change needs to be reverted.
Typo in "insead"
$query_string_separator already holds "?" or "&", so the additional "?" can only be wrong.
If this patch passed the tests, then we need tests for this functionality.
Why do we append a "\n" to one but not the other processed element?
This is not the same. It means that any call to drupal_get_library() will pollute and bloat the statically cached $libraries, regardless of whether $module exists or not.
API change?
148 critical left. Go review some!
Comment #51
pwolanin CreditAttribution: pwolanin commentedCurrently we can potentially cycle back to the same 1-char query string after ~20 times the cache is cleared and potentially some users may return and their browsers fail to refresh the files. Increasing to 2 characters greatly increases this to > 400.
re:
I know they are not the same - see the other changes in that function which always sets the value to at least an empty array instead of allowing it to sometimes be NULL.
The change simply fixes that line which was incorrect (see all the other entries in that function).
The other points suggest valid corrections that are neede.
Comment #52
pwolanin CreditAttribution: pwolanin commentedI think this corrects the problems.
Comment #54
pwolanin CreditAttribution: pwolanin commented#52: 721400-css-js-aggregation-fixes-52.patch queued for re-testing.
Comment #55
pwolanin CreditAttribution: pwolanin commentedThe failure is in the contact module test - unrelated to the patch.
Comment #56
mfer CreditAttribution: mfer commentedA summary of what I see the patch doing:
With regard to #1, we need to make an assumption here. That the js file we loading does not pull an argument from the url with the same name as the argument we are using to handle force the file to reload. If we purse this we need to allow the variable name to be swappable in case a site has problem.
With regard to #2, why are we trying to preserve the weighted order between preprocessed and non-preprocessed files? The current expected functionality is that weight is preserved within the groupings. Having multiple preprocessed files will have a negative performance impact for many browsers because of blocking. Can you, provide, a justification for this change?
With the exception of the variable change to #1 I am fine with the code. Its the justification for #2 I am still unclear on.
Comment #57
pwolanin CreditAttribution: pwolanin commentedRe: #2 - this change makes the behavior of the code match the documented behavior in the code comments - so I consider it a bug fix.
Also for #2 - Drupal 7 is supposed to support delivering files from a CDN or separate domain. Without this change, I can't ever deliver a file like jquery.js (or any other JS file that needs to come at the start) from an external URL.
Comment #58
mfer CreditAttribution: mfer commentedFor a little background on #2 from comment #56.
The proposed change allows for a non-preprocessed file or external file to be weighted differently. This would be useful for pulling jquery.js from a CDN and still keeping it positioned correctly.
The ordering @pwolanin proposes could be generated in contrib using a ModuleName_process_html callback. And this piece of functionality is something I could see as useful.
But, is it a feature change that is allowed at this late stage of the game?
Comment #59
pwolanin CreditAttribution: pwolanin commentedHere's the change suggested by mfer - he clarified in IRC " a js file can grab the arguments (get style) from the url and act on them."
Thus we can't assume that every JS file/library in existence will gracefully accept ?v=X in its URL and we should allow a way for a site to change that behavior to some other query parameter that won't conflict.
Comment #60
mfer CreditAttribution: mfer commentedThe code is RTBC. Whether it should go into D7 is another matter. :)
Comment #61
Dries CreditAttribution: Dries commentedCommitted to CVS HEAD. Thanks.
Comment #62
pwolanin CreditAttribution: pwolanin commentedGreat! I think at least the use of query string versus changing filenames should be backported to 6.x too.
Comment #63
RobLoachFor Drupal 6, it might be better done through http://drupal.org/project/javascript_aggregator and http://drupal.org/project/sf_cache .
Comment #64
sunComment #65
pwolanin CreditAttribution: pwolanin commentedThe small change I'm suggesting should be in 6.x core - it's a bug as it is now.
Comment #66
mfer CreditAttribution: mfer commented@pwolanin - making this change in D6 will affect sites and module that expect/depend the current behavior in D6. A behavior change like this should not happen in core. People will not just be able to update their sites. Some will have to change code. For people who are just applying updates to keep up with security this could be a problem.
This is a features change and not a bug since the bug could not be clearly articulated and shown (with data to support it).
Comment #67
pwolanin CreditAttribution: pwolanin commented@mfer - I don't think you understand what I'm suggesting for D6. Here's a patch. All I want is to not have filenames change for no good reason.
Comment #68
EvanDonovan CreditAttribution: EvanDonovan commentedI like this - didn't realize it would be so easy to fix. Will consider testing this on a Pressflow Drupal 6.16 install in the next few days.
Comment #69
philbar CreditAttribution: philbar commentedComment #70
RobLoachI created a follow up to the code that was just committed: #746676: New grouping messes with JavaScript weight
Comment #71
c960657 CreditAttribution: c960657 commentedThe logic WRT css_js_query_string was changed today in #751002: _drupal_flush_css_js() is too complex. The D6 patch should either incorporate this change or omit the substr+variable_get('css_js_query_string') change completely. I prefer the former.
Comment #72
Owen Barton CreditAttribution: Owen Barton commentedIt is not clear anywhere in this ticket what the actual reasoning is for changing the querystring rather than the filename. If you change the querystring, the browser will redownload the entire file, just as it would if you change the filename. Modern reverse proxies (e.g. Varnish or Squid >3.0) will happily cache the response and invalidate the cache correctly in either case also. Is there some benefit I am missing here?
In addition many proxies (of the ISP/corporate kind, not the reverse kind) will not cache any requests with a querystring in the URL, hence this change appears to actually reduce performance for these users.
Attached is a patch that reverses this specific change (but leaves all the other good stuff, such as JS versioning and the CSS order fixes), but I would like to hear back a bit more about the original reasoning before putting this on the table, so to speak.
Comment #73
Owen Barton CreditAttribution: Owen Barton commentedComment #74
pwolanin CreditAttribution: pwolanin commented@Owen - see #56. By changing the file name you can often get a 404 for CSS or JS (leaving a very ugly site) when your site is behind a reverse proxy.
Comment #75
yusufg CreditAttribution: yusufg commented@pwolanin. I am a Drupal newbie but isn't it possible to canoncalize (ie, rewrite) all incoming Drupal URL's to remove the versioning from the URL and thus eliminate the 404
I agree with @Owen that query strings should be avoided, Steve Souders of Google and of the main drivers for performance on the front end makes a good case against them for it
http://www.stevesouders.com/blog/2008/08/23/revving-filenames-dont-use-q...
Some other blog posts on doing static asset versioning
http://muffinresearch.co.uk/archives/2008/04/08/automatic-asset-versioni...
http://www.ejeliot.com/blog/125
Comment #76
pwolanin CreditAttribution: pwolanin commented@yusufg - not really, These files are served by the webserver. It would be a serious performance issue to serve them via PHP.
Comment #77
DamienMcKennaHow's about this option - a new variable called "skip_css_js_query_string" which lets you skip the query string addition entirely (defaults to FALSE meaning the query string is added)? A corresponding patch has been supplied for the Zen theme (#772234: Optionally disable using css_js_query_string).
Comment #78
DamienMcKennaComment #79
pwolanin CreditAttribution: pwolanin commented@DamienMcKenna - sounds like a feature request for Drupal 7
Comment #80
DamienMcKenna@pwolanin: I'm going to grok D7 to take a look, at an initial glance the code works differently.. I'll work on it.
Comment #81
pwolanin CreditAttribution: pwolanin commented@yusufg - those posts are a bit old. Varnish does not exhibit that behavior, when tested against Drupal 7:
$ curl -i http://example.com/sites/example.com/files/css/css_8b9b5e778d8fc108d57aece98a78e42b.css?l0xab7
$ curl -i http://example.com/sites/example.com/files/css/css_8b9b5e778d8fc108d57aece98a78e42b.css?l0xab7
Comment #82
pwolanin CreditAttribution: pwolanin commented@DamienMcKenna - can you open a different issue perhaps? Unless you really think this change should not be backported to Drupal 6 (which was the state of this issue)
Comment #83
EvanDonovan CreditAttribution: EvanDonovan commentedI think that opening a new issue for #77 would be good, so that this change could be backported - especially for the sake of those of us using reverse proxy caches.
Comment #84
DamienMcKennaWill do; I thought this patch might at least *help* towards the goal of this task.
Comment #85
DamienMcKennaFYI I've added #772328: Optionally disable using css_js_query_string to handle my patch - d6 for now, d7 later.
Comment #86
EvanDonovan CreditAttribution: EvanDonovan commentedThanks, DamienMcKenna!
Comment #87
Owen Barton CreditAttribution: Owen Barton commented@pwolanin - the concern with using query strings to rev files is not that they break caching in reverse proxies (used for scaling/balancing/ha etc), because as I noted in my original comment, that is not the case. Instead, the concern is that they break the normal "forward" caching proxies that are frequently used by large organisations and ISPs. This means that these users have to make a long distance request to the original site, rather than a fast and (relatively) local one.
I am not sure I understand the issue about getting 404s on CSS/JS. There was a bug in old versions of D6 that it would not clear the page cache when aggregate CSS/JS was cleaned up that could cause this behaviour - but Jacob Singh fixed that issue. Caching reverse proxies (at least when properly configured!) shouldn't have this issue either AFAICS, because they should be caching CSS/JS at least as long as regular pages, if not longer. If this is an issue, then it seems that perhaps an approach could be to use filename caching, but be less aggressive about cleaning it up - e.g. only delete files with a ctime of > 1 week.
Comment #88
mfer CreditAttribution: mfer commented@Owen Barton Squid and other proxies should be able to cache with the query string. At least according to the details at http://drupal.org/node/772328#comment-2851494.
Comment #89
Owen Barton CreditAttribution: Owen Barton commented@mfer - yes, I have already acknowledged (twice) that they can cache content with a query string - my point however, is that until very recently Squid didn't default to do this, and hence the vast majority of ISP/institutional caching squid proxies (and web gateways/filter appliances based on Squid) in the wild have this option disabled. Drupal sysadmins have no control over these (remember, we are not talking about hosted reverse proxies), so unless you are offering to contact all the caching proxy squid users in the whole wide internets and fix their Squid configuration we still have a problem here.
Comment #90
Kiphaas7 CreditAttribution: Kiphaas7 commentedHow about MD5'ing both the names and the filemtime() of files as the aggregated filename, without a querystring?
Comment #91
pwolanin CreditAttribution: pwolanin commented@Kiphass7 - this is the sort of thing we just removed from Drupal 7. The debate is whether it was the correct change to use a query string that changes instead of changing the file name.
After discussing some of the other cases with Owen today about various use cases, I think I conclude that either:
we made the right decision and should proceed to apply it to Drupal 6, or when we clear the cache we need to retain (not delete) the prior aggregated versions for a long period (e.g. 1 mo to 1 year).
Comment #92
Wim LeersRead through the entire issue.
Good change.
But why is this issue marked as 7.x while it's been committed (#61) and now a basic/essential D6 backport is being discussed? Or did I miss something?
Comment #93
pwolanin CreditAttribution: pwolanin commented@Wim - there was some concern about using the query string with forward proxies, so it got moved back to Drupal 7.
It would be nice if we could make the filenames correcspond to the hashed content of the aggregated files, but that would need more substantial re-architecting.
Comment #94
pwolanin CreditAttribution: pwolanin commentedDiscussing this with Narayan and others, we shoudl change what code is doing here due to the possible forward proxy problems highlighted by Owen.
Comment #95
pwolanin CreditAttribution: pwolanin commentedgoing to roll a stacked patch that depends on http://drupal.org/node/723802
Comment #96
pwolanin CreditAttribution: pwolanin commentedNew approach - generates the file name based on the hash of the actual file contents. As above - this is a stacked-on patch, so likely will fail to apply to a clean core.
Comment #97
pwolanin CreditAttribution: pwolanin commentedNew approach - generates the file name based on the hash of the actual file contents. As above - this is a stacked-on patch, so likely will fail to apply to a clean core.
Comment #98
AlexisWilke CreditAttribution: AlexisWilke commentedHi Pwolanin,
I like the idea of the version... and if it causes problems to some intermediate proxies, maybe consider putting it in the filename instead of the query string as in:
sites/default/files/js/drupal-1.2.3.js
and that file would be a copy of the original...
Then, when the original changes, you create a new file such as:
sites/default/files/js/drupal-1.3.2.js
That way, old pages that are still in some caches (but not the corresponding .js) can reference the old file and still work as expected. New pages will start to make use of the new .js. Some CRON tool could then delete "very" old files (3 or even 6 months old... but here you need to know that drupal-1.2.3.js is the old file...)
Hope this helps a bit 8-}
Thank you.
Alexis Wilke
Comment #99
Owen Barton CreditAttribution: Owen Barton commentedI think the hashing based on file contents might be better off at #678292: Add a development mode to allow changes to be picked up even when CSS/JS aggregation is enabled (or not), where there is an existing patch that makes the aggregate filename sensitive to file changes by including the mtime for each file in the file array. With that patch aggregation can be enabled by default without infuriating new themers. There are some performance issues with the mtime, so this patch allows busy sites can disable the mtime stat check and clear caches manually. Users can also disable aggregation for when using firebug etc. I am not clear if aggregating based on file contents would be faster or not - if so we might be able to drop the new option. Either way, that part seems pretty unrelated to cache expiry - although if I am missing something please explain!
The drupal_delete_file_if_stale() stuff looks great though, and I think we should extract that out into a separate patch so that it is easier for people to review (since that wouldn't need all the sha stuff mixed in).
Comment #100
pwolanin CreditAttribution: pwolanin commented@Owen - that issue is trying to solve a different problem. Perhaps using the hash of the content is the right approach there too, but honestly that issue looks like Drupal 8 to me.
Comment #101
sunHeavy consideration given #769226: Optimize JS/CSS aggregation for front-end performance and DX --
No matter what we do, the main issue will remain with the current approaches.
So what do others do?
Just recently, I stumbled over www.wepay.com and happened to look at its source. They are doing more or less exactly what crossed my mind with regard to this issue all over again:
The main issue here is that aggregated files may get lost, but external caches and whatnot may still reference them. But URLs are allowed to be long - very long actually. And given aforementioned conceptual aggregation bug #769226, which will heavily decrease the amount of aggregated files, and, also taking into account that file counts may be decreased further to ensure proper file weights (multiple aggregates)... this leads to the open question:
Is there room for an ImageCache-alike approach?
I.e.:
Regardless of query string, regardless of whether the aggregate does not exist yet, the referenced aggregate file would load.
Comment #102
pwolanin CreditAttribution: pwolanin commented@sun - interesting idea. But requires much more change than this, plus means that in some cases an external cache will still be invalid if Drupal is down.
Comment #103
philbar CreditAttribution: philbar commented@sun interesting suggestion.
Would it be possible to use an encryption to rename the aggregated css to something smaller:
So this...
/sites/default/files/css/jquery.js--jquery.once.js--drupal.js--contextual.js--toolbar.js?v12345
Becomes this using the CRC Hex Representation...
/sites/default/files/css/f5b1da79.js?v12345
Comment #104
AlexisWilke CreditAttribution: AlexisWilke commentedGuys,
That's what the md5() call is for. An md5() is an advanced CRC computation.
Again, putting the version in the file name is probably wiser if you want to keep older versions of it. The fact is that the query string is totally ignored by Drupal. It is there only to make the caches along the way wake up to a new version of the file...
My only worry about running an md5() on the content is that it will take longer as the content can be rather large. On the other hand, unless you are a developer, the content does not change but when you upgrade to a new version. So it should not be a major problem. And doing the md5() on the whole content is much safer than expecting the silly programmers to remember bumping the versions up on each release.
Thank you.
Alexis
Comment #105
philbar CreditAttribution: philbar commentedI'm sorry. I don't think I was clear enough about my suggestion.
I think there should be some concern about filesize optimization.
So instead of having the dozens of characters....
jquery.js--jquery.once.js--drupal.js--contextual.js--toolbar.js
Have a truncated representation. For example, I just ran the CRC encryption on the filename, not the contents.
f5b1da79.js
I think md5() will be too long unless it can be specified as less than 10 characters.
Maybe I'm splitting hairs, but I think for sites with thousands of visitors, a few kb here and there can add up.
/js/f5b1da79.js?v1234
is much better, in my opinion, than...
/sites/default/files/js/jquery.js--jquery.once.js--drupal.js--contextual.js--toolbar.js?v12345
Especially considering it's call on every page...
Comment #106
philbar CreditAttribution: philbar commentedYep, I just double checked and md5() uses 32 characters, which isn't ideal for bandwidth optimization.
http://php.net/manual/en/function.md5.php
Comment #107
pwolanin CreditAttribution: pwolanin commented@all - the patch above is trying to address a defined problem with minimal refactoring. See the linked issue from Owen if you want to delve into deeper architectural changes for Drupal 8.
Comment #108
Owen Barton CreditAttribution: Owen Barton commentedHaving looked at the patch more carefully I now see that it is hashing the file contents only when generating the aggregate, and then storing a "file array hash -> file contents hash" lookup variable that is used to determine the filename on later runs. In my initial (mis)read I thought it was loading and hashing all file contents on every run, which would have been a major performance red flag.
The other thing that was not very clear was how the change in aggregate name generation is related to the issue at hand here. Theoretically it is not - the aged expiry code would work fine without this. However, given the rate at which the "css_js_query_string" currently increments on active sites this is going to lead to a really, really large number of files in this directory. Changing to content based naming, together with #769226: Optimize JS/CSS aggregation for front-end performance and DX will reduce this to a much more sane number. Hence I think it is important that both these changes go together.
So overall, I think this is a great approach. Not only will versioning by filename, rather than querystring improve cacheability but also leads to a dramatic reduction in the length of time a JS/CSS file can be cached by end-users, as the aggregate files no longer need to be redownloaded every time "css_js_query_string" is incremented, even if the actual contents haven't changed at all.
In terms of shortening the aggregate file name - I think that is also a good idea, but please open another issue for this. My sense is probably that we can simply truncate the sha hash pretty dramatically and still avoid collisions. I don't think CRC will add anything over truncating the sha has (I think it's probably worse, for the same length hash). Does anyone care to do the math to determine the value of N characters of sha hash needed to ensure the possibility of a collision on all the aggregate files likely to be generated in the next 100 years is very tiny?
In terms of this patch:
- If submitting a stacked patch, please add an interdiff also, so we can see the actual changes in question. Otherwise it is really tricky to review!
- I think this really needs some comments - it took me a quite a while to figure out what was going on.
- I think a month is certainly a sensible default for expiry - I can't imagine even the most aggressive caches getting more than a month out of sync. I wonder if we should wrap this in a variable_get however, I can imagine some occasions where sites might want to shorten this (e.g. for intranet sites, where there are no intermediate caches), or even lengthen it (e.g. for very large but static sites where pages are mostly served from CDN). The code is only called on cache clear, so I don't think there is any performance consideration.
Comment #109
kkaefer CreditAttribution: kkaefer commentedI found
to be a good compromise. This results in filenames like "6tfzsat59d.css".
Comment #110
pwolanin CreditAttribution: pwolanin commented@Owen - I think this would be an improvement regardless of whether we default aggregation to false. The goal of my patch is that the file name only changes when the contents of the aggregated files change. That's exactly the behavior we want.
Comment #111
Owen Barton CreditAttribution: Owen Barton commented@pwolanin - I meant that the aggregate file name based on file contents and the dated expiry (in other words, this patch) need to go in together. As I said, the dated expiry technically solves the issue in question on it's own, but it is not practical (too many files).
Comment #112
AlexisWilke CreditAttribution: AlexisWilke commented@Owen,
One detail in regard to when files are being aggregated/changed. It is not only when the cache is being cleared. It can happen any time a file is added and/or removed. Especially, some modules will generate more or less .js/.css add depending on the page you're on. Each on of these pages will generate a different aggregated file.
The existing code (without the patch) is 100% the same as what @sun explains in #101 without the versioning. What I mentioned earlier was that by adding the version in the filename, we'd get a perfect naming convention. However, since we need to load all the data to aggregate it, there is no reason to attempt using a version that a programmer may forget to update each time he changes his JavaScript code.
Finally, truncating the sha1() or md5() is fairly useless unless you really have million of users. In that case, you may also want to truncate all the tabs in your themes and all the new lines as you're at it. And any space at the end of the line before the new-line, or at least replace all the \r\n by \n only... You'd probably save several MB with million of hits every day that way... We should have Drupal do that work for us! Add an HTML optimizer!
Thank you.
Alexis
Comment #113
Owen Barton CreditAttribution: Owen Barton commentedYes - this is easy to fix though: #769226: Optimize JS/CSS aggregation for front-end performance and DX
That proposal is great for non-aggregated files (you can use rewrites to strip version numbers and serve unversioned filenames), however it is off-topic as this issue is about aggregated files. It is not easy to generate a simple version numbering scheme for aggregate files, which is why we use a hash.
Well, Drupal does have many millions of (end) users, so I think that proves the point really. Still, I think that is a different issue. Optimizing HTML is actually not unreasonable, although cached pages are gzipped, so it is hard to make an impact without actually removing divs.
Comment #114
pwolanin CreditAttribution: pwolanin commentedre-roll for sha-256 patch plus adding an opaque variable as Owen suggests.
Comment #115
Jerome F CreditAttribution: Jerome F commentedSubscribe.
off topic comment - removed -
Comment #116
AlexisWilke CreditAttribution: AlexisWilke commentedJerome,
There are 3 sites on which I had problems and at some point I finally noticed that the problem was that Drupal could not create the css folder. And instead of not trying to aggregate, it failed the write and put the invalid link in the page header... and it does not report the error either. (At least in D6)
Thank you.
Alexis Wilke
Comment #117
Jerome F CreditAttribution: Jerome F commentedoff topic comment - removed -
performance issue caused by panels + wsod module
http://drupal.org/node/789534
Comment #118
Owen Barton CreditAttribution: Owen Barton commented@All - please open a separate support issue if you are having problems with aggregation in Drupal 6. This patch won't fix your problem, and your comments add noise to the issue and prevent it moving forwards.
@pwolanin - I think the newest patch still needs some kind of comment explaining the high level approach to managing aggregate file names, but other than that it looks good to me.
Comment #119
pwolanin CreditAttribution: pwolanin commentednow with added comments
Comment #120
Owen Barton CreditAttribution: Owen Barton commentedThis reads like the full file contents is hashed on every pageload, which would obviously be a bad idea - I think it would also be worth mentioning something along the lines of:
"Once generated, the aggregate filename is retrieved on subsequent page loads via a lookup table that uses the hash of the file array as the key."
Other than that I think this looks great.
Comment #121
pwolanin CreditAttribution: pwolanin commentedrevised code comments.
Comment #122
jhodgdonFunction header docblocks should start with verbs like "Aggregates" rather than "Aggregate".
I also think the writing could be slightly better if long sentences were broken up, maybe just adding commas. For instance:
Comment #123
pwolanin CreditAttribution: pwolanin commentedrevised doxygen with further input from jhodgdon and Owen Barton in IRC.
Comment #124
Owen Barton CreditAttribution: Owen Barton commentedI think this is ready to go
Comment #125
jhodgdonThis doesn't quite conform to doc standards:
a) There should be a blank line between @param and @return sections
b)
Delete -> Deletes
Same lower down in the JS section.
c)
I think this should say "Aggregates JavaScript files into a cache file in the files directory." maybe?
Comment #126
pwolanin CreditAttribution: pwolanin commentedOk, I think this fixes all the doxygen issues.
Comment #127
jhodgdonOK by me!
Comment #128
Dries CreditAttribution: Dries commentedI had to read that sentence 3 times. It is rather dense/brief -- could we make it a bit easier to read?
Otherwise this patch is RTBC. The documentation improvements are not a requirement, but it would be nice to have.
Comment #129
pwolanin CreditAttribution: pwolanin commented@Dries - that version is the third revision attempting to make it more comprehensible with input from Jennifer and Owen ;-)
However, I'll if I can take another pass at it, or maybe find someone less familiar with the code to make suggestions.
Comment #130
pwolanin CreditAttribution: pwolanin commentedMade those code comments a little longer and more explicit to ideally enable comprehension on the first read.
Comment #131
Dries CreditAttribution: Dries commentedThanks Peter et al. Committed the patch in #130 to CVS HEAD.
Comment #132
pwolanin CreditAttribution: pwolanin commentedshould we try to backport this to Drupal 6?
Comment #133
RobLoachMaybe it could go in JavaScript Aggregator for 6?
Comment #134
pwolanin CreditAttribution: pwolanin commentedTo summarize:
the current D6 code changes the file names of the CSS and JS cache file on every cache clear and also deletes the old files.
A full backport would potentially add a new function and change a couple function signatures in Drupal 6.
A partial backport of just the "delete file if stale" function would probably solve #2, but not #1.
Comment #135
AlexisWilke CreditAttribution: AlexisWilke commentedpwolanin,
I have had problems with #2 where someone will go to my site and not see any CSS at all which makes the page rather... ugly. So fixing that already would be a plus.
#1 is a nice to have but oh well...
I do not know whether I had problems with #3, I would assume that what looks like #2 could also be related to #3.
On my end, I'll continue to use D6 for a while since I have many customers on it and D7 is not ready yet. (and most of the 200+ modules I use are not D7 compatible.)
Thank you.
Alexis Wilke
Comment #136
EvanDonovan CreditAttribution: EvanDonovan commentedI believe a 6.x backport would be very useful, as I had stated above (in the 70's), back when we had an earlier version of the patch.
Comment #137
pwolanin CreditAttribution: pwolanin commentedhere's a first pass at the partial backport.
I scanned DRUPAL-6--1 contrib and don't see any modules or themes using the functions that would be affected by a full backport, but that's not a 100% guarantee
Comment #138
AlexisWilke CreditAttribution: AlexisWilke commentedpwolanin,
I applied this patch, I'll let you know if I see any side effect. As is, it looks good to me.
One thing though, the variable drupal_stale_file_threshold will not get some UI this way. 8-)
Thank you.
Alexis
Comment #139
pwolanin CreditAttribution: pwolanin commented@AlexisWilke - it's not intended to have an UI every variable - some tweaks are for experts.
Comment #140
pcambrasuscribe, does anybody know if this is implemented/going to be implemented in pressflow?
Comment #141
pwolanin CreditAttribution: pwolanin commented@pcambra - well, if we get it fixed here in core, it will get merged into pressflow I assume.
Comment #142
pwolanin CreditAttribution: pwolanin commentedoops - the 7.x patch missed removing the query string. rats :-(
Comment #143
pwolanin CreditAttribution: pwolanin commentedComment #144
Owen Barton CreditAttribution: Owen Barton commentedGiven that was the whole point of revisiting this issue, not sure how we missed this! Looks good though.
Comment #145
AlexisWilke CreditAttribution: AlexisWilke commentedNote that today I checked my logs closely (to check on a spammer) and noticed that each page hit from that user generated a reload of the .js and .css files (the .css was less obvious because it is compressed but it was the same thing.) The query string was changing between each page cache and thus the user browser had to reload each distinct "version." I thought that was an interesting side effect. I'm using boost for long term caching and thus I'm 99% sure the problem also comes from there. Ah! And I've been working on simplemenu that resets the js/css cache every time you hit Save in the settings (because I have a newer version not yet in the CVS that has a dynamic CSS file that needs to work next time you visit a page...)
I would assume that after a few weeks, that problem would go away.
Anyway, all that to say that even when not aggregating it would still be a lot better to send a file with an md5 of the content as the filename, instead of filename?<letter/digit>. In such situation, it would save a lot of traffic.
Thank you.
Alexis Wilke
Comment #146
pwolanin CreditAttribution: pwolanin commented@Alexis Wilke - are you referring to Drupal 6 or Drupal 7? The change already made for Drupal 7 is what you suggest.
Comment #147
AlexisWilke CreditAttribution: AlexisWilke commentedThis is my current D6. 8-)
Yet, I thought that the D7 was doing that for aggregated files, not for the others... But maybe I did not read the patch properly?
Will the non-aggregated files look like this now:
md5(content of blah.js).js
or like before, more like that:
blah.js?A
This is a problem with non-aggregated files and I guess this thread is not exactly in link with those... but the idea pointed out here is a powerful optimization!
Now imagine using this name:
It would not need to change until the content of blah.js (the source) changes. But if we just send out the files as is when not aggregated, we'll still get:
blah.js?A
and later
blah.js?B
etc.
And again, all of those are non-aggregated files.
Comment #148
pwolanin CreditAttribution: pwolanin commented@AlexisWilke - please check the D7 patches in detail. D7 does now base the file name on the hash of the contents
hashing the contents on every page load is a significant performance hit, so not something we will do.
(also, resetting status which I did not mean to change)
Comment #149
webchickThat extra query string is very useful as a means to do server-side clearing of clients' caches in the case of updated CSS.
Can someone point me to a good summary in this issue of why removing this feature is desirable?
Comment #150
webchickOk, spoke with pwolanin in #drupal-contribute.
He summarized this issue as it replacing the existing code, which appends a querystring in the case of non-aggregated CSS files, and a hashed name in the case of aggregated ones, and re-generates both in case of a cache clear... with a smarter algorithm that instead only re-generates in case of actual changes. So the behaviour I want to preserve is there, just smarter.
So there's no point in keeping the old behaviour around, therefore this patch is just a small clean-up.
Committed to HEAD. Thanks!
Back to 6.x and needs review for #137.
Comment #151
AlexisWilke CreditAttribution: AlexisWilke commented@pwolanin - nevermind 8-) Actually you have a version string that comes from the JavaScript being sent to you. In D6, you are using the same query string addition which changes all the time for ALL the scripts.
Comment #152
catchLooks like a cross post.
Comment #153
sunAFAIK, we do not have a single system variable that is prefixed with "drupal_".
Since these aggregates are globally dependent on the corresponding "preprocess_*" variables, they should be in the same namespace, i.e.:
preprocess_js_files
preprocess_css_files
preprocess_stale_treshold
79 critical left. Go review some!
Comment #154
pwolanin CreditAttribution: pwolanin commentedhmm, didn't even notice this comment until now - seems like a valid critique.
Comment #155
locomo CreditAttribution: locomo commentedsubscribe
Comment #156
bleen CreditAttribution: bleen commentedI dont disagree with Sun's point in #153, but its worth pointing out that these variables exist too:
drupal_http_request_fails
drupal_test_email_collector
drupal_private_key
drupal_stale_file_threshold
This patch makes these changes:
drupal_js_cache_files -> preprocess_js_cache_files
drupal_js_version_query_string -> preprocess_js_version_query_string
drupal_css_cache_files -> preprocess_css_cache_files
Comment #157
Owen Barton CreditAttribution: Owen Barton commentedI don't think this is an accurate name - the query string used only for non-preprocessed (i.e. individually served) JavaScript files.
I think "js_version_query_string" is both accurate and sufficiently detailed.
Comment #158
pwolanin CreditAttribution: pwolanin commentedHere's the variable name as Owen suggests plus reducing the default saved time to 3 days from 30, since that turns out to be silly in practice.
Comment #159
Owen Barton CreditAttribution: Owen Barton commentedNow that #769226: Optimize JS/CSS aggregation for front-end performance and DX is committed (hooray!) the number of aggregated files generated should be much more sane and minimal (really just a handful), so I think a higher number of days might still be reasonable. Even if a site has weekly releases that include CSS/JS changes I think you shouldn't end up with an overwhelming number of files.
Other than that I think this is RTBC.
Comment #160
pwolanin CreditAttribution: pwolanin commentedWell, the # of days should correspond to the maximum time any forward or reverse proxy might reasonably hold the page. for that reason 3 seems reasonble (I could be persuaded to 5 or whatever, just not multiple weeks).
In development where one clears the cache often, the number of files can get quite large.
Comment #161
bleen CreditAttribution: bleen commentedmy 2¢'s = I think 3 days is plenty
Comment #162
mikeytown2 CreditAttribution: mikeytown2 commentedI would opt for 2 weeks, it's what we use in other places; like htaccess.
http://drupalcode.org/viewvc/drupal/drupal/.htaccess?view=markup
Comment #163
bryancasler CreditAttribution: bryancasler commentedPatch in #67 does not work for me.
Comment #164
bryancasler CreditAttribution: bryancasler commented#137 I am trying this patch but I am confused if it is intended to solve the problem I came here for.
On my site I aggregate JS and CSS files. I'm also using boost.
The problem I am trying to fix is that updates to my sites CSS or JS breaks all the pages cached by boost because the JS and CSS file names change. Is the patch in #137 intended to fix this? and if so how long until the CSS and JS are re-aggregated after changes are made?
Example output
href="/sites/default/files/css/css_d9fe3bfc3489744012374479bc8e4146.tidy.css"
src="/sites/default/files/js/js_e9acbc38a52b8c288bb69ecd05a148f8.jsmin.js"
Comment #165
AlexisWilke CreditAttribution: AlexisWilke commentedanimelion,
No, that's another problem I ran into. Boost does not take the default Performance flags in account.
Under "Boost cacheability settings" make sure that the .css and .js files are being cached.
I reported that problem to the Boost people and the answer was that they did not have to do anything about it...
Thank you.
Alexis
Comment #166
mikeytown2 CreditAttribution: mikeytown2 commentedTake a look at the "Ignore cache flushing: " setting under the "Boost cache expiration/flush settings" section. In short if you have css/js caching enabled you can ignore hook_flush_caches and not have to wipe out the html cache every time cron runs. Blog post explaining the problem http://www.metaltoad.com/blog/how-drupals-cron-killing-you-your-sleep-si...
Comment #167
bryancasler CreditAttribution: bryancasler commentedThanks for everyones help, I've made site adjustments accordingly. I still have one question though. If I did make a change to the CSS is there a way I can view those changes on the homepage without flushing the site's css and damning all the cached pages?
Comment #168
AlexisWilke CreditAttribution: AlexisWilke commentedanimelion,
Yes, I do that all the time. Load your home page and look at the current .css filename, then delete the corresponding files (in boost perm cache and files/css/...) Finally, reload your home page.
It's a bit of manual work, but it has been working for me and better than deleting everything or turning off the caching while testing...
Thank you.
Alexis Wilke
Comment #169
bryancasler CreditAttribution: bryancasler commentedBrilliant! Thanks for the work around Alex. I will give it a shot later tonight.
Comment #170
EvanDonovan CreditAttribution: EvanDonovan commentedIs this already in Pressflow 6.19? Also, what's necessary on 7.x before this can be backported to standard 6.x? This will help many users greatly, both with regard to Boost and Varnish caching.
The variable name and time changes seem pretty simple from looking at the code. I didn't test though.
Comment #171
EvanDonovan CreditAttribution: EvanDonovan commented#158: 721400-variable-names-158.patch queued for re-testing.
Comment #172
pwolanin CreditAttribution: pwolanin commentedIn #159 Owen says this is RTBC except for number of days, but note that this number of day is only relevant to how long a browser or proxy holds a cached page without refreshing it from Drupal. A typical max-age header is for 5 or 10 minutes, so I think 3 days is more than sufficient.
Comment #173
pwolanin CreditAttribution: pwolanin commented#158: 721400-variable-names-158.patch queued for re-testing.
Comment #174
hass CreditAttribution: hass commentedNow core does not delete my JS caches anymore. Is this really intentionally?
drupal_clear_js_cache()
callsdrupal_delete_file_if_stale
and this function only deletes files older than 3 days (currently 30 days). I think I understand your idea about keeping the files for some days, but if I do not press the "Cache clear" button in performance settings - the files are never ever deleted...At least DX is broken as I still expect a
drupal_clear_js_cache()
will purge my outdated JS files from disk...Comment #175
Owen Barton CreditAttribution: Owen Barton commentedThe marker to the "current" set of files is outdated, so the old files are not referenced in generated pages any more. The old files are still available in case proxies/browsers upstream manage to cache old HTML requests, but not the CSS/JS files that they reference (which currently leads to broken pages).
Comment #176
hass CreditAttribution: hass commentedYes, but now with the latest code (and this patch, too) my disk "fills up" with stale files and I may have 1 year old JS files on my disk if I do not press the clear cache button for one year...
Comment #177
AlexisWilke CreditAttribution: AlexisWilke commented@hass,
I have that patch installed and my /js/ folder has 30-day old files, no more. The clearing function is being called in other circumstances (when a new file is created.) What could happen under MS-Windows is that the file creation date is not correctly returned... I thought I read something about that somewhere.
Now what does happen is a large number of files because of some conditional CSS load from different modules. So I get tons of CSS files. My main site has 1215 CSS files, but again, the oldest is 31 days, no more. With 3 days, it will reduce that count quite a bit which is great (probably ~40 files.)
Thank you.
Alexis
Comment #178
hass CreditAttribution: hass commentedI was not able to find a place where the clearing is called on normal production machines. It's called id you run an update hook , but there is no cron or so on... It looks like luck if your disk get's cleared...
Comment #179
AlexisWilke CreditAttribution: AlexisWilke commentedI guess you are correct. I suppose I don't see those lists grow any more than that because I work so much on all my websites. I could be called from the function that creates new JSes and we'd want to add a variable so the check only happens once a day to avoid wasting to much time.
We'd need something similar for CSS files.
Comment #180
hass CreditAttribution: hass commentedMaybe... I cannot remember exactly where and when drupal_build_js_cache() is called, but I guess only if a JS file changes... so - no cleanup :-)
Comment #181
EvanDonovan CreditAttribution: EvanDonovan commented@hass: So you're saying the old files will never get deleted if the site runs in production (i.e., no module changes, etc., so no JS changes), since they stop being referenced, but aren't actually deleted?
Unfortunately, and I say this as a user of the Varnish reverse proxy, this doesn't sound good. Possibly would we need another setting added, something like "file lifetime", which could be set on the performance page also, and then people could set it to be longer than things would persist in the cache of their reverse proxy? The default for this could be something like 30 days, which should be plenty of time for reverse proxies to no longer be referencing the old files.
(Alternatively, if that was bad UX from the perspective of having too many settings, it could be a variable that was set, but which wasn't visible in the UI. Then users of reverse proxies would have to change it in settings.php if they were unhappy with the default.)
Comment #182
hass CreditAttribution: hass commentedCode wise - currently - yes. Only search core code for drupal_clear_js_cache() calls and you will not find any calls to this function - except you interactivly press the "clear cache button" or you "submit" the modules page form. This are all things that never happens in production...
Comment #183
EvanDonovan CreditAttribution: EvanDonovan commented@hass: Currently on my site, I actually press this in production once in a while, but that's because we make changes more often than changes are probably made to the average production website. So do you think something like my proposal in #181 is necessary?
I am hesitant to set this issue back to "needs work" though, because I really would like to see a backport to D6, and that can't happen until this followup is committed to D7.
Comment #184
catchEven if they never get deleted without a cache clear, and most sites change the contents of these files at some point, then you're still only going to have a files a maximum of 30 days old at the time the last cache clear ran in that directory.
Comment #185
AlexisWilke CreditAttribution: AlexisWilke commented@catch, no, that's the point. Like EvanDonovan, I'm changing things on my website all the time and thus it's kept in check, but if you are to keep going without updating anything (i.e. modules or run a performance + clear cache) then the number of files will build up.
Doing like in #179 would clear old files whenever you create a new one. That would maintain the number of files, whether it's 3 or 30 days (number of days can be changed in the settings.php file.) It won't matter if the function isn't called for 60 days if no new file is created.
I propose to add this, to be clear:
Comment #186
EvanDonovan CreditAttribution: EvanDonovan commented@AlexisWilke: Thanks for the clarification. Your proposed code makes sense to me. We'll have to see what the original patch authors (pwolanin, et al.) think...
Comment #187
EvanDonovan CreditAttribution: EvanDonovan commentedBump - I need this on a D6 site. Is there still a chance of this getting in for D7?
Comment #188
pwolanin CreditAttribution: pwolanin commented#158: 721400-variable-names-158.patch queued for re-testing.
Comment #189
tom_o_t CreditAttribution: tom_o_t commentedNo changes made, bot is stuck, re-uploading patch for retesting.
Comment #190
bleen CreditAttribution: bleen commentedhere testbot
Comment #191
bleen CreditAttribution: bleen commentedi think it needs a .patch extension ... same patch from #189
Comment #192
hass CreditAttribution: hass commentedPatch in #605318: Add some garbage collection to the update manager may be of interrest here, too.
Comment #193
momper CreditAttribution: momper commentedsubscribe
Comment #194
drupalninja99 CreditAttribution: drupalninja99 commenteddo we know if this patch works (#67 for drupal 6)? i am so fed up with aggregated files, eventually something happens that causes the files to get renamed and varnish or some other cache is left referring to aggregated files that dont exist bc the names change.
I would use this patch on high performance sites if i knew it worked.
Comment #195
drupalninja99 CreditAttribution: drupalninja99 commentedI created a patch for pressflow but I dont know what the paths need to be. It always stores my local file paths in the patch so I tried to remove those. I think for my acquia pressflow sites, if this works I'm going to use it every time just bc sooner or later aggregated files are going to be out of sync with some cache, be it varnish or page cache and this gives it a fighting chance.
Comment #196
jcisio CreditAttribution: jcisio commentedsubscribe
Comment #197
mikeytown2 CreditAttribution: mikeytown2 commentedCame up with a module. It's my take on how aggregation should work. It is a D6 module.
http://drupal.org/project/advagg
Still needs some polishing but it is fully functional and feedback is greatly appreciated.
Comment #198
EvanDonovan CreditAttribution: EvanDonovan commented@mikeytown2: That is great news. Possibly this will allow us to fix a long-standing cache issue on our (Varnish-using) 6.x site.
Comment #199
ChrisLaFrancis CreditAttribution: ChrisLaFrancis commentedSubscribing.
Comment #200
drupalninja99 CreditAttribution: drupalninja99 commentedya i like mikeytowns idea once that is stable. in the meant time one solution for your theme is to take your theme files out of the .info file and embed them in the page.tpl or have some module do that automatically (the how is the trick there havent figured that out). If you only say half a dozen stylesheets its not that big of a deal, its more of a drawback if you have a bunch in your theme.
i did this for a site and i think i will keep doing it bc the alternative is just unacceptable for a high performance site. there are so many actions that inadvertently delete those aggregated files so if you move your theme css out you mitigate the risk and you dont have to completely abandon aggregation altogether.
Comment #202
mikeytown2 CreditAttribution: mikeytown2 commented@shomam
Working on something much better in D6 at the moment. It's at RC3. Give it a shot and let me know how it works.
http://drupal.org/project/advagg
Comment #203
EvanDonovan CreditAttribution: EvanDonovan commentedIf there is still work to be done in 7.x, shouldn't this patch be bumped up to 8.x and then backported to 7.x once it gets in there?
Comment #204
dstolSub
Comment #205
Agileware CreditAttribution: Agileware commentedSubscribing
Comment #206
pwolanin CreditAttribution: pwolanin commentedneeds a backport to D6 so we retain the aggregated CSS/JS files for some period rather than immediately deleting them. My patch at #67 is really no good since it's an approach we later discarded.
Comment #207
hejazee CreditAttribution: hejazee commented#72: aggregate_reverse_proxy_0.patch queued for re-testing.
Comment #208
JohnAlbin#191: 721400-variable-names-158-retest.patch queued for re-testing.
I can already tell 191 needs a re-roll so it applies to the files in the core/ directory.
Comment #210
dstolHere's a quick re-roll based on #191 for 8.x.
Comment #211
mcjim CreditAttribution: mcjim commentedRe-roll to match HEAD.
Comment #213
mcjim CreditAttribution: mcjim commentedUpdated core/modules/system/lib/Drupal/system/Tests/Common/JavaScriptTest.php with new variable names.
Comment #214
nod_Not sure how that related to the first post. Haven't read everything but it probably needs a new summary.
Comment #215
mcjim CreditAttribution: mcjim commentedUpdated summary.
The original issue has already been sorted, and the current patch merely changes variable names for consistency with core and sets the default stale file threshold to three days.
Comment #216
bleen CreditAttribution: bleen commentedThe patch in #213 looks good
Comment #217
aspilicious CreditAttribution: aspilicious commentedWhy not using the config system now?
Comment #218
webchickIf we're renaming variables, then we need an update path to do this, or at the very least to drop the old variables so they don't sit around crufting up the database. I don't understand why we're renaming variables, though, nor why variable renaming is holding up a 6.x bug fix. I would recommend moving that out into its own issue as a follow-up.
Comment #219
sunThanks @webchick for holding this off :)
These variables will be "renamed" for D8 anyway. In fact, the concrete variables we're talking about here are not even variables, but "state data" instead, for which we'll introduce a completely separate state storage; see #1175054: Add a storage (API) for persistent non-configuration state
I'm not sure whether it is relevant for this issue in any way, but the only functional change in this patch seems to be this:
Somehow the latest patch in here heavily diverged from the original patch that was RTBC in #143...
The above quoted change only affects the GC routine, so it's not really clear what kind of Performance bug is being fixed here.
Comment #220
Wim LeersRead through this entire issue.
#43:
Done now: #352951: Make JS & CSS Preprocessing Pluggable.
Backport to D6: that hardly seems relevant anymore, after years of zero activity on the backport, and D8 around the corner. Removing the tag.
Current patch in #210/#213: has been silent for many months, is a tiny change, and it's repurposing this issue for the fifth or so time. It's time this issue got closed.
Comment #221
webchickRestoring title.
Comment #221.0
webchickUpdated issue summary.
Comment #222
candelas CreditAttribution: candelas as a volunteer commentedAny news? I keep with the problem. Thanks for your knowledge and time :)