Closed (fixed)
Project:
Views content cache
Version:
6.x-1.x-dev
Component:
Code
Priority:
Normal
Category:
Feature request
Assigned:
Unassigned
Reporter:
Created:
12 Jun 2010 at 19:34 UTC
Updated:
10 Aug 2010 at 07:50 UTC
So, we should probably clear the node cache on comments? I've written the code, but turned it off. I wonder if it needs to be a config option somewhere, but I don't want to end up adding a ton of variables. This could turn into a nightmare if I'm not careful.
Comments
Comment #1
steven jones commented
So thinking about this, it probably makes sense (in the loosest definition) to provide some kind of central way for other modules to say:
Hey, I've got some data that you should store a 'best before date' for.
The module can tell me when the data is not fresh anymore, and the caching plugin can provide an easy way for users to select from a number of different things that could invalidate the data they are using to construct their view. The external modules could implement hooks that return forms for setting options on the 'types' of data they expose.
For example, the 'node' module could expose a hook that defines that users can look for changes on nodes. It would supply an option form, allowing users to restrict the types that they'd like to monitor.
Or the votingapi module could define a set of options for the different voting result tags that it supplies.
Comment #2
steven jones commented
Comment #3
steven jones commented
Seems like there are an awful lot of modules all trying to solve the same issues here. Notably, the expire module is trying to cover all the bases with a very advanced cache clearing system.
I think until that lands, we can add value by making it really easy to cover most bases when it comes to caching.
This basically means we'll keep caching on a per content type basis, and not offer any further granularity, but will allow the user to decide if they want to consider comments and things like Voting API votes as things that should clear the cache.
Comment #4
huesforalice commented
I think the module should be kept fairly simple for now. The feature of flushing the cache on node creation is something quite essential, and I was quite surprised that Views comes without it. I would definitely include flushing on comments and see that it integrates with core comments and node_comments as well. Not sure about the interface though, maybe just a checkbox "clear on comment". I'm pretty sure people will start using this and come up with useful ideas. As soon as there actually is a bunch of good ideas we can start work on a 2.0 version which maybe has its own API with hooks and submodules and so forth, but possibly that should only be done if enough people are interested.
What do you think?
Comment #5
steven jones commented
Here's my current thinking:
There are two distinct sides to the caching coin, one is keeping track of what might change the contents of the view, and another is segmenting viewers so they only see content they are supposed to. The way views handles the expiry time stuff quite handily deals with the first case, and the way that the cache keys are built up goes some way to deal with the second. For the cache keys, I think a simple module_invoke_all would help a lot here, because modules like OG are going to need to segment the data based on what groups the user is in etc.
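As a rough illustration of the cache key side, here is a minimal sketch in Python rather than Drupal PHP. The hook function `og_cache_key_part`, the user dictionary layout, and `build_cache_key` are all hypothetical names invented for this example; the list of hooks plays the role that `module_invoke_all` would play in Drupal.

```python
import hashlib

def og_cache_key_part(user):
    """Hypothetical OG hook: segment cached output by the user's groups."""
    groups = sorted(user.get("groups", []))
    return "og:" + ",".join(str(gid) for gid in groups)

def build_cache_key(view_name, user, hooks):
    """Ask every hook for a key part and hash the combined result.

    Users who get identical parts from every hook share a cache entry;
    anything that differs puts them in a separate segment.
    """
    parts = [view_name] + [hook(user) for hook in hooks]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()

alice = {"name": "alice", "groups": [2, 1]}
bob = {"name": "bob", "groups": [1, 2]}
carol = {"name": "carol", "groups": [3]}

# alice and bob are in the same groups, so they share a cache key.
same = build_cache_key("recent_posts", alice, [og_cache_key_part])
also_same = build_cache_key("recent_posts", bob, [og_cache_key_part])
different = build_cache_key("recent_posts", carol, [og_cache_key_part])
```

The point of the sketch is only that the key is built from whatever parts other modules contribute, so the caching plugin itself never needs to know about groups.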
For the cache expiry, here's what I'm thinking:
On some event, say the posting of a comment, we ask modules what keys they'd like to set for that event. The comment module (our hook implementations on its behalf) would set a 'comment' key.
Then, crucially, we offer up these keys for altering in a drupal_alter. Other modules, such as the node module or OG, can decide to extend the keys or modify them. So the node module would duplicate the single 'comment' key for the node type, giving 2 keys: comment-{node_type} plus the original global 'comment' key. OG can then come along and add its keys, depending on the group that the node was in, duplicating the existing keys but appending its own OG key. We then store these keys against the timestamp for the action in the database.
Then, we allow modules to place bits of form on our plugin form, so they can let users choose what the caching on the view should look for.
When executing a view, we just look for the active options a user chose and concatenate them to give a string that matches a key somewhere in our DB table. We retrieve the expiry time, and away we go.
Example:
Now we can do lookups to find the last time any comment changed, the last time any comment changed on a blog post, or the last time a comment changed on a blog post in the 'Second group' really quite easily.
Sounds complex, but isn't actually too bad.
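The key expansion described in this comment could be sketched roughly as follows. This is an illustration in Python, not the module's code: the function names (`node_alter`, `og_alter`, `record_event`), the exact key format, and the dictionary standing in for the DB table are all assumptions made for the example.

```python
import time

# Hypothetical stand-in for the DB table: key -> timestamp of the last event.
timestamps = {}

def node_alter(keys, context):
    """Node module: duplicate each key with the node type appended."""
    return keys + [f"{key}-{context['node_type']}" for key in keys]

def og_alter(keys, context):
    """OG: duplicate every existing key once per group the node is in."""
    extra = [f"{key}-og{gid}" for gid in context.get("groups", []) for key in keys]
    return keys + extra

def record_event(base_key, context, alters, now=None):
    """Build the key list for an event and store the timestamp for each key."""
    now = time.time() if now is None else now
    keys = [base_key]
    for alter in alters:  # plays the role of drupal_alter
        keys = alter(keys, context)
    for key in keys:
        timestamps[key] = now
    return keys

def last_changed(key):
    """Return the last event time for a key, or None if nothing was recorded."""
    return timestamps.get(key)

# A comment is posted on a blog node that belongs to group 2.
keys = record_event(
    "comment",
    {"node_type": "blog", "groups": [2]},
    [node_alter, og_alter],
    now=1000,
)
# keys == ["comment", "comment-blog", "comment-og2", "comment-blog-og2"]
```

A view configured to watch comments on blog posts would then look up `last_changed("comment-blog")`, while one watching all comments would look up `last_changed("comment")`.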
Comment #6
yhahn commented
After talking with Steven today I thought over this method a bit and came up with some more observations/thoughts. Please take it or leave it as you will : )
Suppose...
You have n cache segments, where a cache segment is node:type, comment:changed, og:nid, etc. as Steven described above.
If for a given event you choose to generate every combination of cache segment key (as indicated in the example above), then the number of keys you need to generate is given by (see Wikipedia http://en.wikipedia.org/wiki/Combination):
\sum_{k=1}^{n} \binom{n}{k} = 2^n - 1
For 2 segments (node, comment) that is 3.
For 3 segments (node, comment, og1) it is 7.
4 segments (node, comment, og1, flag) is 15.
5 segments (node, comment, og1, flag, taxonomy_term1) is 31.
etc.
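To make the growth concrete, here is a small Python sketch that enumerates every non-empty combination of segments and checks the count against 2^n - 1. The helper name `all_keys` and the use of `|` as the separator are choices made for this example only.

```python
from itertools import combinations

def all_keys(segments):
    """Every non-empty combination of segments, joined with '|'."""
    return [
        "|".join(combo)
        for size in range(1, len(segments) + 1)
        for combo in combinations(segments, size)
    ]

segments = ["node", "comment", "og1", "flag", "taxonomy_term1"]
for n in range(2, len(segments) + 1):
    keys = all_keys(segments[:n])
    # The number of keys doubles (plus one) with every added segment.
    assert len(keys) == 2 ** n - 1
    print(n, "segments ->", len(keys), "keys")
```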
In order for such a key storage to scale with node types, OG group segments, etc. it seems to be implied that each key/timestamp pair would be stored as a row in a table. If this is the case, we're talking about 3 additional INSERTs/UPDATEs on an event (node_save, comment_save, etc.) with two segments, 7 with three segments, etc.
Based on experience with aggregation tools in the past (Feeds, FeedAPI and Managing News in particular) I know that insert/update is one of the slowest operations and the cause of serious bottlenecks. Adding an additional 15 INSERT operations per node_save, for example, would be quite painful.
The short is
Possible solution
Suppose that it's possible to query/retrieve the timestamp for a key like node|comment|og1 with only one or two of the segments (e.g. just the node segment or the node|comment segment). Then it actually turns out that every key in the combination except the deepest one contains duplicative information. For example, the row
node|comment|og1 => timestamp
would tell you the right answer for all of the following questions:
og1?
node?
comment?
node|comment?
comment|og1?
This means that, supposing the retrieval method is robust enough, on a given event we need to write only 1 row, or at most a couple if multivalues like OG are involved.
Implementation possibilities
You could imagine such a retrieval method working very easily if your DB schema were something like
A View watching only blog nodes could very easily find out its last valid cache time
And a View watching a more complex combination like comment|blog,book|og would also have no problem finding out its necessary information:
Now, obviously, we don't want to hardcode our table schema to specific segment types nor do we want to change our schema dynamically to accommodate new/unexpected segment types. We can get around this by a schema like
Where we assume a maximum number of cache segments. This seems like a safe assumption to make as in the examples above, when you get past 4 or 5 segments the use cases become increasingly corner-case.
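The schema examples from this comment did not survive on this page, so the following is only a guess at the general shape, written in Python with SQLite. The table name `segment_times`, the generic slot columns `s1` to `s3`, and the assignment of node type, comment and OG group to those slots are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per event: a timestamp plus a fixed number of generic segment slots.
conn.execute(
    "CREATE TABLE segment_times ("
    "timestamp INTEGER NOT NULL, s1 TEXT, s2 TEXT, s3 TEXT)"
)

SLOTS = ("s1", "s2", "s3")

def record(timestamp, s1=None, s2=None, s3=None):
    """Write a single row for an event, filling only the slots that apply."""
    conn.execute(
        "INSERT INTO segment_times VALUES (?, ?, ?, ?)", (timestamp, s1, s2, s3)
    )

def last_changed(**criteria):
    """Latest timestamp matching the slots given; each slot takes a list of values."""
    clauses, params = [], []
    for column, values in criteria.items():
        if column not in SLOTS:
            raise ValueError(f"unknown slot: {column}")
        placeholders = ", ".join("?" for _ in values)
        clauses.append(f"{column} IN ({placeholders})")
        params.extend(values)
    where = " AND ".join(clauses) or "1"
    row = conn.execute(
        f"SELECT MAX(timestamp) FROM segment_times WHERE {where}", params
    ).fetchone()
    return row[0]

# Assumed slot meaning here: s1 = node type, s2 = comment, s3 = OG group.
record(100, s1="blog")                            # a blog node is saved
record(200, s1="blog", s2="comment", s3="og1")    # comment on a blog node in og1
record(300, s1="book", s2="comment")              # comment on a book node

blog_only = last_changed(s1=["blog"])
complex_combo = last_changed(s2=["comment"], s1=["blog", "book"], s3=["og1"])
```

Each event writes one row, and a view only constrains the slots it cares about, which is how the single deepest row can answer the shallower questions as well.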
Now whenever the cache is cleared we can lazily instantiate a mapping scanning all the enabled Views on the site that use the caching system:
And voila, we're ready to populate our cache table cheaply and retrieve the timestamps we need quickly.
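The lazily built mapping might look something like the sketch below. It is written in Python for illustration; the view dictionaries, the `MAX_SEGMENTS` limit of 5, the slot naming (`s1`, `s2`, ...) and all function names are assumptions, not the module's actual data structures.

```python
MAX_SEGMENTS = 5  # assumed upper bound on the number of segment slots

_slot_map = None  # built on first use, thrown away when the cache is cleared

def build_slot_map(views):
    """Scan the views and give each distinct segment type its own slot."""
    slots = {}
    for view in views:
        for segment_type in view["segments"]:
            if segment_type in slots:
                continue
            if len(slots) >= MAX_SEGMENTS:
                raise ValueError("more segment types than available slots")
            slots[segment_type] = f"s{len(slots) + 1}"
    return slots

def get_slot_map(views):
    """Return the cached mapping, building it only when it is missing."""
    global _slot_map
    if _slot_map is None:
        _slot_map = build_slot_map(views)
    return _slot_map

def reset_slot_map():
    """Call on cache clear so the mapping is rebuilt on the next request."""
    global _slot_map
    _slot_map = None

views = [
    {"name": "recent_blogs", "segments": ["node"]},
    {"name": "group_comments", "segments": ["comment", "node", "og"]},
]
slot_map = get_slot_map(views)
```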
Comment #7
huesforalice commented
I'm not an experienced module developer so I can't really help that much with creating the module structure etc., but what you're writing definitely makes sense to me. If there's anywhere you think I can help out, let me know.
Just a quick aside because it has crossed my mind: Another feature I was thinking about is the possibility to replace/update certain information after the views data has been retrieved from the cache. Node comment, for example, has its own views cache which adds a "new" flag to new comments based on which user is looking at the comment thread. I haven't really studied the code to find out how this is done, but it could come in quite handy. I'm currently working on a website where the client has a sort of Q&A, where the threads are listed below each other. He likes his timestamp to be an "ago" value instead of an absolute date-time string. In this case our cache wouldn't really work. Probably there'd have to be a hook which other modules can use to replace certain tokens. Something like that.
Comment #8
steven jones commented
I don't think what we're talking about here is actually complicated in terms of code, so when I start implementing, some basic testing and sanity checking would be very, very useful. Also, documentation is always a bigger task than you'd think, so help there would be very much appreciated.
This sort of functionality (replacing tokens in cached output) is actually in Views already, though I've not seen any working examples.
Comment #9
steven jones commented
@yhahn, I knew that there should have been a better way of storing all that duplicate data, and making the DB do the heavy lifting in this makes me happy. Seems like a good refinement of the process to me.
Comment #10
coreyp_1 commented
Just out of curiosity, why not just add Rules integration? That would let the user decide when to clear the cache.
For example, suppose I have a view that shows nodes based on a flag from the Flag module. I could then set up the Rule to clear the cache for that view when a flag is added/removed.
Comment #11
steven jones commented
This is now being worked on at GitHub; I will copy it over to the 2.x branch in d.o CVS.
http://github.com/yhahn/views_content_cache
Comment #12
steven jones commented
This is now in CVS.