Cache Control is a module for integrating your site with the Varnish HTTP accelerator in a way that allows caching page loads not only for anonymous users but also for authenticated users.

To our knowledge, there are no similar modules publicly available.

Link to sandbox page: http://drupal.org/sandbox/JanneSalo/1138426

Comments

firebird’s picture

An example module showing how cache_control works is now included in the module package.

greggles’s picture

Can you discuss how this compares to solutions based on edge side includes, like http://drupal.org/project/esi?

ESI seems like a more standard way of handling this idea rather than cache ids and javascript requests.

rojarvin’s picture

The Cache Control module minimizes the number of requests to Drupal. Unnecessary bootstrapping of Drupal is bad for performance. When using ESI for user-specific blocks, a separate request is made for each block that is not already cached. Cache Control makes at most two requests, even if there are more than two blocks displaying user-specific data.

A page with ESI blocks needs to be fully compiled before it is shown to the user. With Cache Control, most of the page is available as soon as the reverse proxy returns the anonymous version, and user-specific content is added once it is ready.

Also, Cache Control works with any reverse proxy, so it's not dependent on e.g. ESI support.

The sandbox page has been updated to include a short comparison to the ESI-module.

Niklas Fiekas’s picture

Priority: Normal » Critical

According to the priority guidelines.

greggles’s picture

Thanks for the description, rojarvin, and for posting a description of the differences to the project page.

This module is somewhat difficult to review because it is only useful to folks who use Varnish AND who need a solution for customized content for some users. It might be a good idea to post to the high performance group asking for a review - http://groups.drupal.org/high-performance

kvirta’s picture

greggles, there has already been a post in the high performance group about this module. It seems it didn't help in getting this module approved.

http://groups.drupal.org/node/150039

This module is in production use now on multiple Drupal sites (and more are coming) and we have people asking when it's going to go through. Any ideas how to push it forward will be highly appreciated.

greggles’s picture

I tried to post in irc to get more reviewers, we'll see if that bears any fruit.

The project page is now quite long, which will make it hard to find the download links. I suggest moving the "API" section into an API.txt inside the module (if it isn't already there). Same idea for "How do I use it", which can move to INSTALL.txt.

kvirta’s picture

Thanks for the effort, greggles. I moved some of the text on the project page to README.txt and API.txt already has the same stuff as the API section, but with more details, so I removed that too. Now the project page should be a bit more clear and a bit shorter too.

bibo’s picture

I haven't done any official module reviews yet, but I'll try to give a limited review and my opinion on cache_control so far.

About my expertise on this matter:
I spend a significant part of my working hours trying to improve the scalability, performance and quality of (small to) large Drupal sites, and I know Varnish (and VCL) relatively well. During the last year I've been on and off investigating how to best implement Varnish + authenticated users for really high-volume sites. I've also been somewhat active in the high performance group, though usually asking questions.

Comparing to current contrib
I really had hoped ESI would by now be the de-facto standard to implement it with Drupal+Varnish, but the related modules seem to be stalled or even abandoned:

* http://drupal.org/project/esi (Last release/dev: 2010-Aug-25)
-> Active on 21 sites, 16 issues.
* http://drupal.org/project/esi_api (Last release/dev: 2011-Mar-14)
-> Active on 3 sites, 2 issues.

I've mainly tested the D6 esi module and failed to find a well-performing setup (without custom code), mainly because of the multiple-bootstrap issue (one bootstrap for each block). As already mentioned, cache_control circumvents it by other means, which seems to work (in most cases).

To me it seems cache_control is a lot closer to authcache and ajaxify_regions than to the esi modules. Neither of those has seen progress recently or has a D7 version, unlike cache_control. Authcache also uses memcache as the page cache instead of Varnish, and its caching is only role-based (requiring a lot of customization). cache_control seems quite a bit more advanced. These modules should probably be listed in the comparison part of the module page.

Improvement suggestions
* Varnish 3 was released recently with many improvements but the vcl syntax has also changed dramatically. cache_control ships with a default vcl for Varnish 2.x.
=> I would like to see a Varnish 3.0 alternative/default VCL file in addition to the current cache_control.vcl
* The integration with and reuse of current contrib modules (such as varnish, purge, expire, cache_actions, authcache etc.) is not as good as it could be.
=> Some of the modules are on the TODO lists. Let's hope there will be more reuse between modules when this is added to contrib.
* Put all *.tpl files in a /cache_control/templates folder and submodules in /cache_control/modules to keep the main folder simple and organized.
* "Working with any proxy" is not the best reason to leave ESI support out. ESI is supported by others, while Varnish is currently the only real choice to be used with Drupal.
=> Then again, the module is currently performing better than the ESI-modules I know of, which is what matters.
=> I still believe in ESI, but maybe this module should not be "forced" to implement ESI-support, since that route is already taken by others and this route works for now.
* Boost has been around for a long time. Even though it's meant for anonymous users, it has A LOT of configuration options for cache expiration / clearing.
=> Some of the ideas could/should probably be considered for this module as well. Maybe in the future?
* Do I understand the function _cache_control_is_cached_pageload($set = NULL) correctly: once the current page has been set as cacheable, there is no way to decide not to cache it after all later during the page load?
=> I would definitely want to see a custom hook that allows other modules to overrule cache_control and decide NOT to cache the current page. I noticed this:
<? module_invoke_all('cache_control_cached_pageload'); ?>, but that hook happens too late.
=> Is there already a global no-cache mechanism that I missed?

Code stuff
* cache_control_examples_page_controls.tpl.php contains some non-standard handling for strings and links:
echo '<a href="'.$base_url.'/node/'.$cc_node->nid.'/edit">Edit page</a>';
Should be more like:
echo l(t('Edit page'), 'node/' . $cc_node->nid . '/edit');
=> In other places strings and links seem to be handled just fine. Is this template maybe rendered somewhere outside the bootstrap, with no l() or t() available?

Open questions
* How does it handle multiple blocks that change often (like per page & per user & role)?
=> Is it possible to make sure cache_control generates the least possible duplicates while still keeping the requests per page to the minimum?
* Could blocks be bundled freely, so that the admin can decide the best logic (for least requests)?
* Do you have some performance statistics to show the potential speed gains for authenticated users when using cache_control?

Security
I noticed a small and rather theoretical risk that could lead to private information in blocks leaking to other users.
Private block content could (theoretically) be read by other users if they happened to know the hashed URL of a user-specific block request.
That information is hard to get, though not as hard as getting a session id from a cookie. There's a reason why session ids nowadays always stay in cookies and not in URLs.

There are ways to make sure that Varnish can distinguish the content (url), but visitors cannot. The session id (or a hash of it or something similar) could be added to the internal Varnish hash.

Here is a sample VCL 3.x snippet that adds the hash for private blocks (if the path and login status match). I haven't actually tested this with Varnish 3.x, I'm just suggesting to look into this approach:

sub vcl_hash {
    # If the user is logged in, some pages may be cached per user session,
    # which creates several cache entries for the same URL.
    if (req.url ~ "/cache_control/block/private/" && req.http.Cookie ~ "^.*?SESS(.{32})=([^;]*);*.*$") {
        hash_data(regsub(req.http.Cookie, "^.*?SESS(.{32})=([^;]*);*.*$", "\1\2"));
    }
}

The result would be that a user visiting http://example.com/cache_control/block/private/af13af312a31a3f3af3a213af12 would not (and should not) see the same (private) content that another authenticated (or anonymous) user would.

I don't see this "flaw" as critical, and it's also possible this is already handled somehow and I just didn't notice.

Conclusion
I'm not marking this as RTBC yet, since my review probably doesn't cover 100% of the module review tasks. Also, I might be a bit biased.

I haven't gone through the full current codebase but peeked into every file, and what I looked into was clean and well-documented code. AFAIK coding standards, Drupal API and commenting recommendations are rarely followed this well. Even the install/uninstall hooks delete old variables. Good job!

There are unfinished features (such as statistics support, which is best handled by google_analytics anyway), but then again, is there ever a software project that is "finished"?

From a user's point of view, there is real demand for this module, and as I see it, the module is ready to be released.

In other words, I myself would like to see this module among the other d.o modules as fast as possible. I already have some experience witnessing it in action on a few large sites. It will probably also be part of our next D7 project.

Janne Salo’s picture

Thanks for your comments, bibo. Here are some quick comments of my own:

-The module directory structure will be cleaned up and the minor coding standard issues fixed ASAP.
-As of today, we're planning the VCL file for Varnish 3 and will make it available as soon as it's done.
-ESI support is currently not on the TODO list, but it's definitely worth consideration. Maybe we'll put it on the roadmap once we have a stable version out there. Also, a review on current state of the art of the caching modules out there (in terms of what they actually do, what could be integrated etc.) would be very useful.
-Your note on _cache_control_is_cached_pageload() is correct; when cache_control decides the page load will be cached, it stays that way. A custom hook to allow modules to prevent this is a good idea. I will add it. The hook_cache_control_cached_pageload() is meant for other modules to deal with the fact that the current page load is being cached.
-The automated block support is still experimental (and actually incomplete), so I'll have to come back with details on that later (finalizing the feature is the top item on my TODO list). Anyhow, it should be able to handle multiple and ever-changing blocks pretty well: all blocks are generated in the same fashion as any other component on a cached page, in a single get_components call that will produce any user-specific content.
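
To illustrate the single-call idea above, here is a minimal sketch in plain PHP. It is not Cache Control's actual code: the function name example_get_components(), the $registry of callbacks and the JSON response shape are all assumptions made for illustration. It only shows how any number of user-specific components can be built during one request, so that Drupal is bootstrapped once rather than once per block.

```php
<?php
/**
 * Hypothetical sketch: build every requested user-specific component
 * in one request and return them together as JSON.
 *
 * @param array $requested Component ids asked for by the cached page.
 * @param array $registry  Map of component id => callback (assumed shape).
 * @param int   $uid       Id of the current user.
 */
function example_get_components(array $requested, array $registry, $uid) {
  $output = array();
  foreach ($requested as $id) {
    // Skip ids that no callback is registered for.
    if (isset($registry[$id]) && is_callable($registry[$id])) {
      $output[$id] = call_user_func($registry[$id], $uid);
    }
  }
  return json_encode($output);
}

// Usage: two components, one call.
$registry = array(
  'greeting' => function ($uid) { return "Hello, user $uid"; },
  'cart' => function ($uid) { return '3 items'; },
);
echo example_get_components(array('greeting', 'cart'), $registry, 42);
// prints {"greeting":"Hello, user 42","cart":"3 items"}
```

The cached page would then only need to place each returned fragment into its placeholder, however many components there are.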

Currently, we don't have any measurement data to show on how the module performs with authenticated users, but that's definitely something we'll want to display on our front page.

Again, thanks for the comments, we'll soon come back with fixes and more detailed answers to your questions.

Janne Salo’s picture

The current HEAD of the 7.x-1.x-dev branch now has fixes for many of the issues:

-Reorganized the module directory structure and improved coding standards compliance as advised by bibo.
-Added hook_cache_control_override_caching(), which allows modules to prevent page loads from being cached simply by returning false from their implementation.
-Fixed some bugs in the experimental automated block support, works a lot smoother now (some graphical glitches still remain).
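
As a rough sketch of how another module might use the new override hook: the module name "mymodule" and the "checkout" path prefix are hypothetical, and what the hook should return when a module has no objection is an assumption here (the only behaviour stated above is that returning false prevents caching).

```php
<?php
/**
 * Implements hook_cache_control_override_caching().
 *
 * "mymodule" and the "checkout" path prefix are hypothetical examples.
 */
function mymodule_cache_control_override_caching() {
  // Drupal 7 keeps the current internal path in $_GET['q'].
  $path = isset($_GET['q']) ? $_GET['q'] : '';

  // Returning FALSE asks Cache Control not to cache this page load.
  if (strpos($path, 'checkout') === 0) {
    return FALSE;
  }
  // Assumption: returning nothing means this module has no objection.
}
```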

Please note that the 6.x branch is maintained a bit less actively than the 7.x one, so not all fixes are available there yet, but the branch is far from dead, so the fixes will be there eventually.

bfr’s picture

Priority: Critical » Normal
Status: Needs review » Needs work

As a co-worker of bibo, I have heard good things about this module, and really hope it gets promoted soon.

By the way, in D7 renderable arrays are now preferred over l(), but it's a minor thing. However, I noticed a few non-code-related things that you need to fix before this can be set to RTBC.

1. Remove the version info from the .info file(s) (version = "7.x-1.x-dev"). It is added by drupal.org.
2. Line length cannot exceed 80 characters in text files (README.txt, for example).
3. The master branch is not used on drupal.org; it should be empty.
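
On the renderable-array remark above, a minimal sketch of how an "Edit page" link could be expressed as a Drupal 7 render array instead of an l() call. The helper name example_edit_link_build() is hypothetical, and the link text is passed in already translated (for example with t('Edit page')) so that the snippet has no Drupal dependencies of its own. In a real module the array would be returned as part of a page build or passed to drupal_render().

```php
<?php
/**
 * Hypothetical helper: describe an edit link as a Drupal 7 render array.
 *
 * @param int    $nid  Node id to link to.
 * @param string $text Link text, already translated by the caller.
 */
function example_edit_link_build($nid, $text) {
  return array(
    // The 'link' element type takes its text and target from
    // #title and #href.
    '#type' => 'link',
    '#title' => $text,
    '#href' => 'node/' . (int) $nid . '/edit',
  );
}

// Usage: the array stays alterable until it is actually rendered.
$build = example_edit_link_build(12, 'Edit page');
```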

I'm not going to dig into the code or features, as they seem to have been reviewed by enough people. As long as no one spots security issues with this, I would say it's good to go.

bfr’s picture

Status: Needs work » Reviewed & tested by the community

Actually, since Greggles seems to be on fire right now getting projects out of the approval queue, let's just set this to RTBC.

The developer shows a deep understanding of the inner workings and the API of Drupal and follows best practices and coding standards. For a module this advanced, it is not reasonable to expect a completely bug-free release at this stage, nor is that the purpose of this queue; the purpose is rather to see that the developer is up to the task of handling the process of quality module development, which is pretty clear in this case. The module is well coded and works, does not duplicate any functionality, and is in fact a very valuable contribution.

The things I mentioned in my previous post are minor and can be fixed quickly before promoting the project from the sandbox, so my opinion is that this is good to go.

greggles’s picture

Status: Reviewed & tested by the community » Fixed

Sure, makes sense to me :) The issues bfr raised should still be worked on, but don't need to block this becoming a full project.

Thanks for your contribution, Janne Salo! Welcome to the community of project contributors on drupal.org.

I've granted you the git vetted user role which will let you promote this to a full project and also create new projects as either sandbox or "full" projects depending on which you feel is best.

Thanks, also, for your patience with the review process. Anyone is welcome to participate in the review process. Please consider reviewing other projects that are pending review. I encourage you to learn more about that process and join the group of reviewers.

Janne Salo’s picture

I made the improvements suggested by bfr and released the project into the wild. A dev release has been created for both D6 and D7. A stable 1.0 release for both is a top priority.

Thanks to everyone involved.

http://drupal.org/project/cache_control

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.