Expiration Grid - road map for this module

Project:Cache Expiration
Version:6.x-1.x-dev
Component:Code
Category:feature request
Priority:normal
Assigned:Unassigned
Status:active

Issue Summary

Create a grid for ways that a node can be expired, and what actions expire what pages.
These "Actions" can currently flush a node (they call boost_expire_node):
* Voting API
* Comments
* Nodes

Once that node has been signaled to be flushed, it can flush:
* Self
* Front page if promoted
* Tagged taxonomy term pages
* Other items contained in the menu it belongs to (nodes, views, etc.)
* CCK node reference fields
* Views containing this node

There is also the option of killing the file or merely marking it as expired in the database.

Example:
New Comments - Kills: Node, and views. Not: front page (if promoted), CCK references, taxonomy terms
Edited Node - Kills: Node, front page, CCK references, views. Expires: taxonomy terms
New Vote - Expires: Node. Kills: Views. Not: Front page, CCK references, taxonomy terms
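
The example above could be written down as a lookup table. This is only a sketch of one possible layout: the array structure, the action and target keys, and the helper `expire_grid_lookup()` are hypothetical and are not part of Boost or Cache Expiration.

```php
<?php
// Hypothetical grid: action => target => 'kill' | 'expire' | 'none'.
// 'kill' removes the cached file; 'expire' only marks it as expired in the
// database; 'none' leaves the target alone.
$expire_grid = array(
  'comment_insert' => array(
    'node' => 'kill',
    'views' => 'kill',
    'front_page' => 'none',
    'cck_references' => 'none',
    'taxonomy_terms' => 'none',
  ),
  'node_update' => array(
    'node' => 'kill',
    'front_page' => 'kill',
    'cck_references' => 'kill',
    'views' => 'kill',
    'taxonomy_terms' => 'expire',
  ),
  'vote_insert' => array(
    'node' => 'expire',
    'views' => 'kill',
    'front_page' => 'none',
    'cck_references' => 'none',
    'taxonomy_terms' => 'none',
  ),
);

/**
 * Returns the targets that an action affects with the given method.
 */
function expire_grid_lookup(array $grid, $action, $method) {
  if (!isset($grid[$action])) {
    return array();
  }
  // array_keys() with a search value returns only the matching keys.
  return array_keys($grid[$action], $method, TRUE);
}
```

A caller would ask, for instance, `expire_grid_lookup($expire_grid, 'comment_insert', 'kill')` to learn which targets a new comment should kill.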

Open to ideas, patches, etc. This will take some time to do.

Comments

#1

On a fairly large site that I run on Boost + NGINX, I need more refined policies for the maximum cache lifetime of certain content types or specific pages. I know this can be done with the Boost blocks (choosing a scope and then selecting the specific max cache lifetime), but I would rather have a central place to edit these settings. If that sounds like something others may be interested in, I can clean up what I am working on and contribute it back.

#2

Have you thought about doing it with VBO? I'm always interested in stuff like that.

#3

Hmmm, not sure that could be done in VBO, unless we were to offer batch settings for specific nodes. What I am more interested in, in the first place, is content-type-specific settings; I think that is what most people would use. What exactly are the types of "scopes" you have defined for Boost? Content type, NID and what else?

#4

So the scopes are:

Node
* Node type, like 'page'
* NID

View
* View name
* View display, like 'page', 'page_1' or 'default'

Taxonomy
* Vocabulary
* TID

Code that does this

<?php
/**
 * Gets page_callback & page_arguments from the menu_router table.
 *
 * Allows for any content type to have its own cache expiration.
 * TODO Better support of panels.
 */
function _boost_get_menu_router() {
  $router_item = menu_get_item();

  // Handle nodes.
  if (arg(0) == 'node' && is_numeric(arg(1))) {
    $node = node_load(arg(1));
    $router_item['page_callback'] = 'node';
    $router_item['page_type'] = $node->type;
    $router_item['page_id'] = arg(1);
    return $router_item;
  }

  // Handle taxonomy.
  if (arg(0) == 'taxonomy' && is_numeric(arg(2))) {
    $term = taxonomy_get_term(arg(2));
    $vocab = taxonomy_vocabulary_load($term->vid);
    $router_item['page_callback'] = 'taxonomy';
    $router_item['page_type'] = $vocab->name;
    $router_item['page_id'] = arg(2);
    return $router_item;
  }

  // Handle users.
  if (arg(0) == 'user' && is_numeric(arg(1))) {
    $router_item['page_callback'] = 'user';
    $router_item['page_type'] = implode(', ', user_load(array('uid' => arg(1)))->roles);
    $router_item['page_id'] = arg(1);
    return $router_item;
  }

  // Handle views.
  if ($router_item['page_callback'] == 'views_page') {
    $router_item['page_callback'] = 'view';
    $router_item['page_type'] = array_shift($router_item['page_arguments']);
    $router_item['page_id'] = array_shift($router_item['page_arguments']);
    // See http://drupal.org/node/651798 for the reason why this if is needed.
    if (is_array($router_item['page_id'])) {
      $router_item['page_id'] = array_shift($router_item['page_id']);
    }
    return $router_item;
  }

  // Try to handle everything else: use the first string argument as the type.
  if (is_array($router_item['page_arguments'])) {
    foreach ($router_item['page_arguments'] as $string) {
      if (is_string($string)) {
        $router_item['page_type'] = $string;
        break;
      }
    }
  }

  // Set empty if page_arguments is an empty object.
  if (!isset($router_item['page_type']) && empty($router_item['page_arguments'])) {
    $router_item['page_type'] = '';
  }

  // Set to first object in array if page_arguments is still an array and
  // cast it as a string.
  if (!isset($router_item['page_type']) && is_array($router_item['page_arguments'])) {
    if (is_object($router_item['page_arguments'][0])) {
      $router_item['page_type'] = (string) get_class($router_item['page_arguments'][0]);
    }
    else {
      $router_item['page_type'] = (string) $router_item['page_arguments'][0];
    }
  }

  // Handle panels.
  if (strstr($router_item['page_callback'], 'page_execute')) {
    // Initialize so $pid is defined even when neither table exists.
    $pid = FALSE;
    if (db_table_exists('delegator_pages')) {
      $pid = db_fetch_array(db_query_range("SELECT pid FROM {delegator_pages} WHERE name = '%s'", $router_item['page_type'], 0, 1));
    }
    elseif (db_table_exists('page_manager_pages')) {
      $pid = db_fetch_array(db_query_range("SELECT pid FROM {page_manager_pages} WHERE name = '%s'", $router_item['page_type'], 0, 1));
    }
    $router_item['page_id'] = $pid ? $pid['pid'] : 0;
  }

  return $router_item;
}
?>
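
The page_callback / page_type / page_id triple that this function returns could also drive a central lifetime table of the kind asked for in #1. The following is a minimal sketch under assumptions: the `callback:type:id` key format with `*` as a wildcard, the function name `example_cache_lifetime()`, and all lifetime values are made up for illustration and do not exist in Boost.

```php
<?php
/**
 * Picks the most specific lifetime (in seconds) for a scope triple.
 *
 * Lookup order: exact page_id, then page_type, then page_callback, then the
 * default.
 */
function example_cache_lifetime(array $settings, array $router_item, $default = 3600) {
  $callback = isset($router_item['page_callback']) ? $router_item['page_callback'] : '';
  $type = isset($router_item['page_type']) ? $router_item['page_type'] : '';
  $id = isset($router_item['page_id']) ? $router_item['page_id'] : '';

  // Most specific key first.
  $candidates = array(
    "$callback:$type:$id",
    "$callback:$type:*",
    "$callback:*:*",
  );
  foreach ($candidates as $key) {
    if (isset($settings[$key])) {
      return $settings[$key];
    }
  }
  return $default;
}

// Hypothetical settings; keys are "callback:type:id" with * as a wildcard.
$settings = array(
  'node:*:*' => 86400,
  'node:page:*' => 604800,
  'node:page:42' => 300,
  'view:frontpage:*' => 600,
);
```

With these settings, node 42 of type 'page' gets its own short lifetime, other 'page' nodes fall back to the per-type value, and anything without a match gets the default.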

#5

Awesome, I am getting to it later on today and will let you know as soon as I have a usable patch.

#6

Like the idea of copying pathauto for node presets & not caching certain node types. There is a lot of potential here.

#7

Title:Expiration Grid» Expiration Grid - road map for this module
Project:Boost» Cache Expiration
Version:6.x-1.x-dev» <none>
Component:User interface» Code

What needs to happen:
Scan every view looking for paths. Each view that contains a path is treated like a node type. Detect other entries in the boost_cache table and generate configuration options for them as well.

Each cache type can have options for (some will be specific to the content container; node, view, etc...)
* min & max cache lifetime
* is-cacheable setting
* pager/url-query setting <- auto detect and have smart defaults (nodes, no pager; views with exposed filters will allow all; etc...)
* promoted can flush front page; pager support.
* node reference flush: forwards, backwards, both
* taxonomy control; certain vocabularies can have different flushing options
* menu tree options
* views handling; figure out the paging issue. flush first 2 pages asap (configurable), expire rest over a period of time.
* expire or flush (expire marks it for a crawler; flush kills from the cache instantly)
* support for coded advanced configuration (hooks!)
* make this "exportable"
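
As an illustration of the views item in the list (flush the first 2 pages right away, expire the rest over a period of time), here is a rough sketch. The function name, its parameters, and the choice to space the expirations evenly across a time window are all assumptions; the list does not say how the delayed expirations should be scheduled.

```php
<?php
/**
 * Plans cache invalidation for a paged view.
 *
 * The first $flush_pages pages are flushed at once; the remaining pages get
 * expiration timestamps spread evenly over $window seconds. Page numbers are
 * zero-based here, like Drupal's ?page= query argument.
 */
function example_plan_view_pages($total_pages, $now, $flush_pages = 2, $window = 3600) {
  $plan = array('flush' => array(), 'expire' => array());
  $remaining = max(0, $total_pages - $flush_pages);
  for ($page = 0; $page < $total_pages; $page++) {
    if ($page < $flush_pages) {
      $plan['flush'][] = $page;
    }
    else {
      // Position of this page among the delayed ones, starting at 1.
      $position = $page - $flush_pages + 1;
      $plan['expire'][$page] = $now + (int) floor($window * $position / $remaining);
    }
  }
  return $plan;
}
```

The returned plan keeps the two kinds of work apart, so the 'flush' list can be handled immediately while the 'expire' timestamps are stored for a crawler to pick up later.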

Different actions can trigger different expiration/flush settings. Example: New comments should only flush the view page that the node lives on, not the entire view with all its pagers. Or if the theme doesn't indicate the comment count on the view then have the option to not flush the view on comments. Or if the view is directly related to comments then the full view should be flushed in a graceful manner.

The configuration file (the "exportable") will come before the GUI, because the GUI will be quite complicated & it will take some time to make this graceful. Once the GUI is in place, make the manual configuration part as minimal as possible by being smart with detection and defaults.
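
A configuration file of that kind might look roughly like the sketch below, where smart defaults are merged in so that the manual part only lists overrides. Every key, value, and function name here is hypothetical and only mirrors the options listed earlier in this comment; it is not an existing Cache Expiration format.

```php
<?php
/**
 * Default options applied to every cache type (hypothetical keys and values).
 */
function example_expire_defaults() {
  return array(
    'cacheable' => TRUE,
    'min_lifetime' => 0,
    'max_lifetime' => 3600,
    'flush_front_if_promoted' => TRUE,
    // One of 'forwards', 'backwards' or 'both'.
    'node_reference' => 'both',
    // 'expire' marks the page for a crawler; 'flush' kills it instantly.
    'method' => 'expire',
  );
}

/**
 * Fills in defaults so an exported config only needs to list overrides.
 */
function example_expire_config(array $exported) {
  $config = array();
  foreach ($exported as $cache_type => $overrides) {
    // Array union keeps the override where both sides define a key.
    $config[$cache_type] = $overrides + example_expire_defaults();
  }
  return $config;
}

// An "exportable": plain PHP that can live in code and version control.
$exported = array(
  'node:story' => array('max_lifetime' => 86400, 'method' => 'flush'),
  'view:frontpage' => array('node_reference' => 'forwards'),
);
$config = example_expire_config($exported);
```

Keeping the exported part down to overrides is one way to make the manual configuration as small as possible once detection and defaults improve.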

Make all of this support domain access and be multisite friendly. This is the road map; in short I'll be taking out the smarts from boost and putting it in this module. Boost does most of this right now in some sort of fashion; making it happen based on set rules is key to success.

#8

interesting. subscribe.

#9

Version:<none>» 6.x-1.x-dev

This all becomes mind-bogglingly complex after a while, and then someone will go, "Hey, I use OG, can you make this N-dimensional too?"

yhahn has been looking into this stuff for OA, so might be worth getting him involved too.

Subscribe.

#10

Subscribing.

#11

How to deal with all the writes that happen to the boost_cache table: Don't update all fields on each cache creation "action".

How to deal with all the writes that happen to the boost_cache_relationship table: Have a dirty flag so if the parent or child entity gets updated then it knows to recreate the relationship on cache creation. Will need some smart logic for views pagers. This is a major priority. Once I get this figured out I can then bring in views to the expires module.
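
The dirty-flag idea can be sketched as follows. This is not Boost code: an in-memory array stands in for the boost_cache_relationship table, and the class name, method names, and entity ID format are invented for the example. Views pagers are not handled.

```php
<?php
/**
 * In-memory stand-in for the relationship table, keyed by page path.
 */
class ExampleRelationshipStore {
  protected $rows = array();
  // Counts simulated table writes so the saving is visible.
  public $writes = 0;

  /**
   * Marks every page that references the updated entity as dirty.
   */
  public function markDirty($entity_id) {
    foreach ($this->rows as $path => $row) {
      if (in_array($entity_id, $row['entities'], TRUE)) {
        $this->rows[$path]['dirty'] = TRUE;
      }
    }
  }

  /**
   * Called on cache creation; only writes when the row is new or dirty.
   */
  public function onCacheCreate($path, array $entities) {
    if (isset($this->rows[$path]) && !$this->rows[$path]['dirty']) {
      // Relationship is still valid, so skip the write.
      return FALSE;
    }
    $this->rows[$path] = array('entities' => $entities, 'dirty' => FALSE);
    $this->writes++;
    return TRUE;
  }
}
```

Repeated cache creations of an unchanged page then cost no relationship writes; a write only happens again after one of the related entities has been marked dirty.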

#12

Views need to store the arguments given to them as well as the page number they are on. Finding new content on views with arguments will be a challenge; taxonomy I can make work, but other types of arguments will not be as easy to handle automatically. Current progress on the views page cache logic is going on here:
http://drupal.org/node/785766#comment-3341042

#13

subscribe

#14

subscribe

#15

Subscribing.

#16

Just opened the D7 branch. See the message here: http://drupal.org/node/1151684#comment-4932604

How would you feel about releasing the current 6.x branch as a stable 1.0? I've been using it together with Purge on a production site for some time and it works like a charm.

I would also like to revamp the project page: add a descriptive, up-to-date list of features and integration options. And no more "playground" etc. This is some serious cache whipping we're doing here ;-)

#17

1.0 sounds like a plan. Go ahead and publish a release; if you don't I might get around to it by Friday.

This might be of interest to you: http://drupal.org/project/httprl I had some free time yesterday so I put the code together into a module. The possibilities that a non-blocking HTTP request brings to the table are mind-boggling. Any task that is not directly associated with generating the current page's HTML can in theory be spun off into a background task. Something to keep in mind as you develop the code.

#18

Interesting. In Purge I already execute the requests in parallel through curl_multi objects. Did you compare your approach with curl_multi? I'll investigate the background task option. It sounds like what I need to make the option to refetch the object after purging perform reasonably. I was thinking of making it work with drupal_http_request as a failsafe. I will use this as a third option.

#19

The 1.0 is there. No changes to the code itself.
I've also improved (I hope you agree) the project page a bit. Will now start on the D7 port...

#20

Not all hosts have curl & I'm not sure if you can have curl "ping" a url (non-blocking mode in httprl). I might want to add in a curl implementation to httprl as well as one using sockets (like d6) as a fallback; because some hosts have socket_select disabled.

#21

On the http request library issue:
The "ping" idea sounds cool. I will investigate; I have never seen it in any php_curl documentation, but then again, many things are not documented, as I learned the hard way (and through Google ;-)
The good thing about the purge request I send to Varnish is that it doesn't hit a backend and I only check for the error code, ignoring output. So for my use case those requests are fast and "cheap", and the error return code is very useful, but just for debugging. In the end we're bound to hit some performance bottleneck.

I saw some interesting node.js stuff at DrupalCon London. Maybe a "cache-director" daemon on top of node.js could solve our quest for clean but warm caches. I just have no idea where to start on that.

I guess in the end what we really need is this: #64866: Pluggable architecture for drupal_http_request(). I would love to get some movement in that long-standing issue.

On "porting" boost code to expire:
I've been reading through some of the 7.x code of boost but so far cannot find any of the node/comment/use api stuff I was expecting after reading through expire-6.x. Am I right to assume boost 7.x acts on completely different logic? Could you give me some pointers on where to start ripping out the expiration parts?
In the meantime I have been porting the expire 6.x hooks to 7.x. That might get the job done too, it is a good 7.x API coding exercise, and I am ready to rip it out when you come up with a better idea.

D7 port status: Configure form and "drush xu" already work. Tested with Purge and Varnish :-)

#22

Found this library for curl that has something similar to the non blocking mode of httprl: https://github.com/jmathai/php-multi-curl

7.x boost is dumb currently. I was going to put the smarts in expire. Your best bet is to translate what is in 6.x and move it forward to 7.x. I still don't have any 7.x sites so any code that is of 7.x series in any of my modules is fairly basic.

#23

subscribe