Since we keep missing each other on irc I thought an issue might work instead :)

These projects are quite different in terms of details, but at a broad level we're both keen to get rid of file stats from rendering css/js aggregates and in general improving aggregation from contrib.

My plan with agrcache is to get as much of the work back into core for Drupal 8 and hopefully Drupal 7, if that happens it might even get retired, if not it'll need to be maintained until that happens.

Overall I'd like css/js aggregation to look like this:

1. Use the agrcache method of having page callbacks to generate css/js aggregates. This saves up to one second on requests that actually have to build the aggregates, and even though the css/js is served from PHP when requested, there's still a measurable ~25% improvement in chrome developer toolbar net tab in terms of the full page render. And then removing the file stats on every request.

2. Make it as easy as possible for modules like core library (and there's some Drupal 5/6 modules that had different approaches like configurable bundles) to change the aggregation mechanism. this might have to be case by case via contrib, but it would be interesting to make this more pluggable in Drupal 8. I don't have any plans to introduce this kind of thing to agrcache at the moment.

3. Have a central system for things like js minification, but also other tricks like encoding images directly in css. Somewhere on an issue (which I can't find now), it was discussed having a system similar to the text format / filter approach - so:

- have a central module that provides method for others to processing the js/aggregates when being built.
- have an API/UI to define how the different processors get chained

This would allow us to have a separate modules providing jsmin, or base64 encoding of images, and then people can install what they want, but the actual altering and processing workflow could be handled by one module centrally.

When starting agrcache I didn't have any intentions regarding #3, but that's something I'd consider working on.

Just ideas for now but wanted to get the discussion started.

Comments

pounard’s picture

In my opinion, aggregation should remain the more static and simple possible. When I look the actual core, the full mecanism is crap, way too dynamic.

My actual effort if purely optimization and does not provide any kind of API, but you are going this way and it seems good.

Totally agree about CSS/JS aggregation/minification pluggable backend and more API oriented. It should also be easily configurable via settings.php for future sysadmins.

The good thing in my actual implementation is the I/O reduction and the aggressive group merge for CSS and JS aggregation merge. I also provide an IU that can let the site integrator manually choose exactly which file should and should not be aggregated (and I think this is a thing that core should allow).

One last thing, in a future good API, core should force modules to define their CSS and JS files within a hook (such as the library handling) which would allow site admin to do advanced CSS/JS profiles on a per page/per theme/per site basis.

EDIT: If the core could control the full CSS and JS lib list, it wouldn't ever ever again have to do any I/O at runtime.
Re-EDIT: If modules are able to statically declare all their files, an inclusion policy could be configured by integrators, selecting which file(s) should be *always* aggregated wether or not they are being used. Then all others (dynamic includes) should only be included as-is without any aggregation/minification (this is a runtime job but really costs a lot in term of performances).

File existence should never be checked at runtime, modules developers should be aware that a non existing file or a wrong given path IS A BUG and the core graceful downgrade IS NOT A FEATURE but is an ugly failsafe mecanism due to non stricly developped modules.

catch’s picture

So at the moment there's three things agrcache does:

1. Registers menu callbacks to serve css/js if the file doesn't exist (same as imagecache), this means we never have to do a runtime file_exists() - that only happens in the files/css/% and files/js/% callbacks, and they only get hit on cache misses.
2. Slightly simplifies the css aggregation to avoid having to process them before building the hash, saved quite a bit of time in testing.
3. Instead of calling variable_set() for every aggregate on the page like core, calls it once for all css aggregates and once for js aggregates. I'm pretty sure some of the problems that sites have with css/js aggregation stampedes are actually variable_set()/variable_initialize() issues.

If I read the description of core_library correctly, you're able to skip the i/o, but only once the admin changes a setting. I think core_library could use the agrcache approach for generating the bundles while it's learning, so the i/o would always be saved - then you've still got the different aggregation algorithm in there.

settings.php definitely, or possibly exportables support if we need settings and weight for each processor and settings.php started getting unweildy.

Also while I remember, I'm collecting core i/o issues at http://drupal.org/project/issues/search/drupal?text=&assigned=&submitted...

Short list at the moment but only started that tag a short while ago.

catch’s picture

Also:

File existence should never be checked at runtime, modules developers should be aware that a non existing file or a wrong given path IS A BUG and the core graceful downgrade IS NOT A FEATURE but is an ugly failsafe mecanism due to non stricly developped modules.

Fully agree with this. Core is not forgiving about putting crap into function arguments, but it's very forgiving about stuff like this. Not enough people profiling or doing straces :(

pounard’s picture

We totally agree then :)

catch’s picture

Title: Co-ordinating with agrcache » CSS and Javascript aggregation improvement - meta issue
Project: Core Library » Drupal core
Version: 7.x-1.0-beta6 » 8.x-dev
Component: Code » base system

Moving this to core, since we're more or less discussing core patches here, even if they're done in agrcache first or backported there eventually.

pounard’s picture

As I told quickly on IRC this issue open another issue which is the CDN integration. As you use the same mecanism (menu callback) to get the static CSS and JS files, this is the issue with the image handling mecanism and this mecanism could even be generalized (it would gracefully go in the file stream handling way of doing things).

EDIT: CSS and JS could declare schemes such as css:// and js:// and then benefit from the stream wrapper interfaces and still be using their custom implemention, therefore reducing greatly the full codebase.

The menu callback for accessing file should be generalized to all files, files would then be statically served if exists, or would call a stream wrapper interface method (may be such as generateFile() or such), therefore call the scheme specific implementation behind.

mfer’s picture

When it comes to #1, how are you taking the hash and deriving the files that should be included in the response? The idea of doing the JS/CSS in this form is something a number of people have been talking about for some time. It's nice to see a place to start discussing that idea.

Pluggable JS was started over on #352951: Make JS & CSS Preprocessing Pluggable. It stopped waiting for a common plugin mechanism that never ended up materializing in D7. The same concept could apply to CSS.

catch’s picture

@mfer:

I'll answer at #1014086: Stampedes and cold cache performance issues with css/js aggregation which is the core issue for #1.

Thanks for cross-linking the pluggable issue, it was that which got me thinking about #3 more.

mfer’s picture

I was thinking a little further on #1. If we take a look at what espn.go.com does we see something like:

<link rel="stylesheet" charset="utf-8" media="screen" href="http://a.espncdn.com/combiner/c?v=201010121210&css=global_reset.r1.css,base.r228.css,modules.r424.css,global_header.r38.css,modules/global_nav.r47.css,/espn/espn/styles/frontpage_scoreboard_10,modules/insider_enhanced.200910131831.css,sn_icon_sprite.200907150955.css,mem2010/mem.r5.css,mem2010/mem_espn360.r2.css,universal_overlay/universal_overlay.css,universal_overlay/media_overlay.css,universal_overlay/video_overlay.css,universal_overlay/photo_overlay.css,universal_overlay/dyk_overlay.css,warvertical12a.css,twin.css" /> 

...

<script src="http://a.espncdn.com/combiner/c?v=201012011221&js=jquery-1.4.2.1.js,plugins/json2.r3.js,plugins/teacrypt.js,plugins/jquery.metadata.js,plugins/jquery.bgiframe.js,plugins/jquery.easing.1.3.js,plugins/jquery.hoverIntent.js,plugins/jquery.jcarousel.js,plugins/jquery.tinysort.r3.js,plugins/jquery.vticker.1.3.1.js,plugins/jquery.pubsub.r5.js,ui/1.8.2/jquery.ui.core.js,ui/1.8.2/jquery.ui.widget.js,ui/1.8.2/jquery.ui.tabs.js,plugins/ba-debug-0.4.js,espn.l10n.r8.js,swfobject/2.2/swfobject.js,flashObjWrapper.r7.js,plugins/jquery.colorbox.1.3.14.js,plugins/jquery.ba-postmessage.js,espn.core.duo.r50.js,espn.mem.r15.js,stub.search.r3.js,espn.nav.mega.r24.js,espn.storage.r6.js,espn.p13n.r9.js,espn.video.r33a.js,registration/staticLogin.r10-14.js,espn.universal.overlay.r1.1.js,espn.insider.r5.js,espn.espn360.stub.r9.js,espn.myHeadlines.stub.r12.js,espn.myfaves.stub.r3.js,espn.scoreboard.r4.js,%2Fforesee_v3%2Fforesee-alive.js&debug=false"></script> 

We could do something like:

script src="http://example.com/files/js/js_hash12342341234234?files=misc/jquery.js,misc/jquery.once.js,misc/drupal.js"></script> 

The hash would be built like it is now and the combiner could know the files to use. This could be easily cached by the browser and we could test to make sure the files generate the right hash for security. I even like that it documents what files are included in the hash as it can be a pain to discover when you do need to know.

pounard’s picture

I'm against showing the real file names in the URL. The combiner shouldn't get its parameters from a GET request it's unsafe. It means anyone could DDoS your site by doing this request and putting any parameter. It could even lead to security issues where people could ask for some other arbitrary files to be aggregated in the final files.
It should remain a generated hash that exists in a variable somewhere server side.

mfer’s picture

@pounard You can do it in a safe and secure manner.

catch’s picture

Wim Leers’s picture

I agree that avoiding file stats at all costs is important.

Without being involved in the latest initiatives regarding front-end performance, I apologize in advance for not knowing about current patterns being applied. But, I do have the following question:

Far Future Expires headers are important to maximize client-side caching of files. Far Future Expires headers require unique file URLs however. How to map a file URL to a unique file URL in an efficient manner? (And: file URLs should change whenever their contents change.)

I know this does not directly affect core (yet), but it's an important best-practice pattern to be agreed upon and then documented.

CDN module

I'm currently working on a patch for the CDN module that adds this: #974350: Far Future setting for Origin Pull mode. However, it focuses only on adding Far Future Expires functionality, it does not yet avoid file stats. Essentially, it does two things:
1) Detect the last modification time of a file and use this to generate a unique file URL:

+    // Alter the file path when using Origin Pull mode and using that mode's
+    // Far Future setting.
+    if ($mode == CDN_MODE_BASIC && $farfuture) {
+      $path_before_farfuture = $path;
+      $path = 'cdn/farfuture/' . filemtime($path) . '/' . $path;
+    }

2) Then there is a menu callback defined at cdn/farfuture (meaning that any additional arguments will be passed to the menu callback function). This function then exhibits all the most optimal behavior: optimized caching headers for file URLs, gzip support, range request support (see cdn_basic_farfuture_download() in http://drupal.org/files/issues/974350-12.patch) and correct replies for If-Modified-Since requests.
I know, it's silly to implement all of this in PHP instead of at the web server level. But that is the only realistic solution to guarantee this to be working: cheaper hosting accounts don't allow for all required functionality to be defined through .htaccess files anyway. In the future, I can still provide a .htaccess file that expert users can enable (yes, the goal is to make the CDN module usable for everybody; and yes, CDN's are affordable already).
The file_exists() calls in this function don't matter much, since they'll only be called very rarely, and only by the CDN.

Summary

So, in summary:
1) I have a working solution for unique file URLs, to allow CDNs to be used with Far Future Expires headers
2) this solution does not work without CDNs for large sites: the load would grow too fast due to the file_exists() calls
3) for very large sites, it still doesn't work quite well because of the filemtime() call

It is point 3 in the summary that is problematic and for which I'd like to see a best-practice pattern emerge.

Thoughts?

mikeytown2’s picture

if your not opposed to 2 database tables & one cache table you could use what I got in here.
http://drupal.org/project/advagg
does imagecache generation.
uses zero stat calls when serving a page; even with imagecache style generation off.
hooks for implementing things like jquery cdn, csstidy, jsminplus.
gzip (but this is in D7 now).
uses locking to prevent multiple threads from doing the same thing.

Looking advagg's issue queue, there are still a lot of issues to workout, but the core part of this is working correctly.

pounard’s picture

@#14 Thanks for sharing.

catch’s picture

Issue tags: +Performance

As well as the pluggable js issue, there is #494498: Add CSS parsing capability to Drupal and #352951: Make JS & CSS Preprocessing Pluggable around CSS aggregation.

/here's how I'd break down the issues:

- removing file_exists() checks during page rendering and speeding up aggregate cache misses - this makes sense on any site and there aren't many trade-offs to worry about. #1014086: Stampedes and cold cache performance issues with css/js aggregation. Doing it as a stream wrapper sounds good as well for D8.

- minification: our js aggregation is literally .= with nothing else. Our CSS preprocessing has lots of issues. JS preprocessing afaik only comes down to the various different kinds of minification (which since there's no true winner at the moment we should make pluggable). CSS preprocessing is a bit more tricky - there's de-duplication, inline encoding of images, a few different things that are done that aren't mutually exclusive or necessarily desirable on all sites - for that something more like text formats and filters might be better (at least - allow more than one operation to run on a file, and allow these to be run sequentially rather than just swapped out - more like check_markup()/text formats/filters although it should be more simple than that).

- grouping logic. D7 introduced radically different logic to D6, you can swap it out (bundlecache and core_library both change the logic) but there's a lot of boilerplate code required to do this, these decisions can be quite site-specific so again it'd be good to have a system where those implementations can be swapped in or out.

- expiration of aggregates. D7 core is better than D6 for this, I need to look at the code here. IMO this is like the actual aggregate generation mechanism - we ought to be able to come up with something that works for every site without major side-effects and site admins shouldn't have to worry about it.

- script loaders. LABjs is the most popular, but I've seen at least two more mentioned as well. One of these may win out, but until it does this is another thing it'd be good to be able to support changing around as transparently as possible.

- gzipping. Apart from what are relatively minor issues at #1040534: Rewrite rules for gzipped CSS and JavaScript aggregates cause lots of lstats for files that will never exist this is fine I think.

So file generation, expiration, and gzipping, I think we mainly need settle on one good way to do this in core, then where possible backport it to previous versions.

Grouping logic and minification I don't think core will ever be able to find something that works for every site, so these should be pluggable - but IMO these are two different tasks. I should be able to swap minification logic out separately from the grouping logic, and contrib modules should be able to tackle different approaches to these without having to re-implement the other one. Would be good to support this properly in D8. For D7 it'd be good to have a module that just allows other modules to take over these things more easily without necessarily providing its own implementation, or possibly backport new hooks if we could get away with that.

RobLoach’s picture

Hitting the good ol' non-existent subscribe button.

pounard’s picture

@#16 Agree about a lot of things, especially with having a more flexible grouping logic.

Wim Leers’s picture

Issue tags: +WPO

I agree with everything outlined in #16. The only problem is: time. I won't have time until this summer at least (i.e. after this semester, after my master thesis). If we get this right, Drupal will lead in the WPO realm.

Also see http://drupal.org/project/wpo.

cweagans’s picture

Subscribe.

Jeremy’s picture

Subscribing.

nod_’s picture

From the summary 2 and 3 is what #1751602: Use Assetic to package JS and CSS files is aimed at solving.

catch’s picture

Assetic doesn't help #2, that's really about changing the bundling logic.

nod_’s picture

Assetic can help with bundling, http://symfony.com/doc/2.0/cookbook/assetic/asset_management.html#combin... and since Assetic is pluggable I'm pretty sure we can work something out. For example I just ran into this: https://github.com/smoya/AssetManagementBundle, I wouldn't rule it out just yet.

catch’s picture

The Assetic code isn't really bundling, it's just concatenation once you already have a bundle isn't it?

The second link does look more like what we're dealing with.

pounard’s picture

If you preprocess stuff, you need to dump the files somewhere, which partially fixes #2 if you make the preprocess step mandatory. Concatenation is enough preprocessing for that.

EDIT: Yes the second link adds logic we would need on our side in order to achieve it fully, I tested the equivalent for ZF2 a few monthes ago.

Wim Leers’s picture

#352951: Make JS & CSS Preprocessing Pluggable landed today.


Having reread this issue, #16 captures the gist of what needs to happen. Thanks to #352951, we can now:

  • plug in JS minification (contrib can do this, core can add it without changing APIs): override asset.js.optimizer
  • plug in different grouping logic: override asset.(css|js).collection_grouper
  • plug in different aggregate expiration logic: override asset.(css|js).collection_optimizer
  • plug in a script loader : override asset.js.collection_renderer

So, it is now a matter of deciding what we still want to do for Drupal 8 core. I think removing file_exists() calls is definitely a good idea. It can be achieved by fixing our current CssCollectionOptimizer and JsCollectionOptimizer: it won't break D8 APIs, so we can still do it. Shall we get this done over at #1014086: Stampedes and cold cache performance issues with css/js aggregation?

What else should we do for Drupal 8?

P.S.: inlining of background images in CSS files is now possible using hook_file_url_alter() thanks to #1961340: CSS aggregation breaks file URL rewriting because it abuses it.

Wim Leers’s picture

Title: CSS and Javascript aggregation improvement - meta issue » [meta] CSS and Javascript aggregation improvement

.

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.6 was released on February 1, 2017 and is the final full bugfix release for the Drupal 8.2.x series. Drupal 8.2.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.3.0 on April 5, 2017. (Drupal 8.3.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.3.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.6 was released on August 2, 2017 and is the final full bugfix release for the Drupal 8.3.x series. Drupal 8.3.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.4.0 on October 4, 2017. (Drupal 8.4.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.4.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.4 was released on January 3, 2018 and is the final full bugfix release for the Drupal 8.4.x series. Drupal 8.4.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.5.0 on March 7, 2018. (Drupal 8.5.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.5.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.6 was released on August 1, 2018 and is the final bugfix release for the Drupal 8.5.x series. Drupal 8.5.x will not receive any further development aside from security fixes. Sites should prepare to update to 8.6.0 on September 5, 2018. (Drupal 8.6.0-rc1 is available for testing.)

Bug reports should be targeted against the 8.6.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.6.x-dev » 8.8.x-dev

Drupal 8.6.x will not receive any further development aside from security fixes. Bug reports should be targeted against the 8.8.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.9.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.7 was released on June 3, 2020 and is the final full bugfix release for the Drupal 8.8.x series. Drupal 8.8.x will not receive any further development aside from security fixes. Sites should prepare to update to Drupal 8.9.0 or Drupal 9.0.0 for ongoing support.

Bug reports should be targeted against the 8.9.x-dev branch from now on, and new development or disruptive changes should be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.9.x-dev » 9.2.x-dev

Drupal 8 is end-of-life as of November 17, 2021. There will not be further changes made to Drupal 8. Bugfixes are now made to the 9.3.x and higher branches only. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.15 was released on June 1st, 2022 and is the final full bugfix release for the Drupal 9.3.x series. Drupal 9.3.x will not receive any further development aside from security fixes. Drupal 9 bug reports should be targeted for the 9.4.x-dev branch from now on, and new development or disruptive changes should be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Schoenef’s picture

This one can be closed, no?

catch’s picture

Status: Active » Closed (outdated)

Yes there are still issues to work on, but this one is very outdated.