Cache node_load
chx - January 20, 2007 - 13:54
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | node system |
| Category: | feature request |
| Priority: | normal |
| Assigned: | moshe weitzman |
| Status: | patch (code needs work) |
Description
Cache node_load, clear on node_save. I am not too interested in this patch, I just wanted to get the ball rolling.
The status would be adequate as code needs work but then people would assume I will work on it.
| Attachment | Size |
|---|---|
| node_62.patch | 1.39 KB |

#1
some discussion revealed that:
poll.module needs to be patched to do user specific stuff in hook_view not hook_load. I already discussed this with Steven.
comment.module will need to invalidate the node cache if a comment is added.
This patch is likely to give us gains for complicated node types a la cck. I am not sure how much it duplicates cck's inbuilt cache.
The update path is missing too.
#2
For an earlier attempt to get this into core and related discussion see here:
http://drupal.org/node/8506
#3
patch needed to be fixed wrt cache usage.
#4
I have done some testing with simple node types (story, forum, page) and there was no performance improvement. This was expected since there are hardly any extra queries performed when loading these node types.
#5
i would recommend looking into this in the context of moving field modules into core (i.e. CCK). there isn't much point without those, and CCK has thought about this a lot already.
#6
Would it make sense to move this caching into a *node specific* cache table?
It seems to be a site like d.o that has 100,000+ nodes could fill up this cache really fast. Having it in it's own table might be more efficient.
#7
Good morning, Ted:
+ if (!isset($nodes[$param]) && ($data = cache_get($param, 'cache_node'))) {
:)
#8
Err, yes yes! I was looking in the patch for the install file patch to create that table doh :-)
Splendid patch though.
#9
Hmm, this is a bigger boost than I imagined. I benchmarked this on NowPublic where we saw 50% even 75% less page load times but note that NP uses memcached.
#10
I upped the wrong file --again!-- and also note that as NP is a D5 , of course not this patch is tested but our variant of it. But the concept is the same.
#11
The user module changes are feature creep. First, most of them are whitespace trims and a bugfix actually, user_edit cleans the cache but user_delete does not, that's fixed now.
#12
1. The node schema has one too many empty lines now with this patch.
2. You repeat the is_object()... line twice. I would do:
+ if (!isset($nodes[$param])) {+ if ($cache = cache_get($param, 'cache_node')) {
+ $nodes[$param] = $cache->data;
+ }
+ }
+ // Either the node was already cached or we loaded it in:
+ if (isset($nodes[$param])) {
+ return is_object($nodes[$param]) ? drupal_clone($nodes[$param]) : $nodes[$param];
+ }
3. The system update does not look like being up to schema API standards to say the least.
That's it. Otherwise it would be nice to see a simple benchmark, after which I could say: well done :)
#13
sounds reasonable to me. we need to add code that flushes the cache when a module is enabled/disabled, or does the install system do that already?
#14
Reroll.
#15
Rerolled with Gabor's comments in #12 fixed.
So all this needs now is a benchmark ?
#16
yes please!
#17
PS : and moshe's comment #13 was adressed by latest chx's patch
I'm not really equipped to benchmark this myself. Anyone ?
#18
Same as yched's patch, a typo in a comment is fixed (thanks pwolanin) and -p added.
#19
A little test module for benching.
#20
quick bench using ab -n100 -c5 http://127.0.0.1/drupal6/
Server Software: Apache/1.3.33, PHP 4.4.7, MySQL 5.0.45 , Mac 10.4.9
hitting the the front page (/node) with 30 nodes of a mix of types (book, forum, page, etc).
without patch:
Requests per second: 5.66 [#/sec] (mean)
Requests per second: 5.62 [#/sec] (mean)
Requests per second: 5.62 [#/sec] (mean)
with patch:
Requests per second: 6.44 [#/sec] (mean)
Requests per second: 6.40 [#/sec] (mean)
Requests per second: 6.25 [#/sec] (mean)
node load/edit/save works fine.
#21
note - these benchmarks are for the 2nd and later times running ab. There is, of course, a pump-priming effect so the first time running it with the patch is about the same or even slower than without the patch.
#22
enabled modules:
core-req + actions, book, color, comment, dblog, forum, help, menu, taxonomy, update, devel, devel_generate, simpletest
Note, with the patch simpletest find a warning when running either the "Page node creation" or "Unauthorized page view" tests:
Unexpected PHP error [mysql_real_escape_string() expects parameter 1 to be string, array given] severity [E_WARNING] in [drupal6/includes/database.mysql.inc line 350]#23
this error seems to be generated by calling node load like:
$node = node_load(array('title' => $edit['title']));#24
Fixed. I used $cachable to decide on caching but that's not enough. I now cache only node_load($nid) which is 99% of the cases anyways. It's not really feasible to database cache node_load(array()) because it can be dependent on user, time, whatever. Thanks for the report and the benchmark.
#25
one more fix: the relevant node cache needs to be cleared in this function:
http://api.drupal.org/api/function/book_outline_form_submit/6
in fact, it should probably call cache_clear_all() as well.
#26
There is still a TODO in translation.module ?
#27
In addition to the book module fix-ups, I think some of the function calls are missing a parameter:
cache_clear_all('*', 'cache_node');should probably be:
cache_clear_all('*', 'cache_node', TRUE);#28
revised patch - adds last param to cache_clear_all() and more cache clearning in book.module and menu.inc. Removes spurious change to index.php.
Please review.
#29
- how are the menu module changes related?
- how are the poll module changes related?
- as yched said, there is still a TODO in translation module
#30
#31
Great, committed! (I think this will be loved by people with lots of "xmas tree hangings" on their nodeapi load hooks).
#32
I'm very concerned that this change will have a HEAVY impact on large, high-traffic sites. If I understand it correctly, it will be saving to the cache table *every time a node is loaded*, and clearing the node cache *every time a node is saved*, correct?
Unless I'm missing something, this will have a detrimental effect on high-traffic sites unless the 'only clear the cache of nodes that actually changed' code goes in.
#33
"every time a node is loaded" will be infrequent because of this very patch. we are replacing the node load queries with a cache_get().
we already wipe most caches when any node is saved. not sure why that would be worrisome. please elaborate.
#34
I think this was a bit of a false alarm; I just noticed that the coe in node_save() only clears the individual node. This needs to be documented VERY clearly, though, or we'll experience a world of pain as modules convert over.
Repeat: this is HUGE in terms of unpredictable, unexpected behavior. node_load() was previously fresh per page load, and now it's not. Red flashing lights!
#35
Yes, that definitely need to be documented in the update docs at least. Excuse me for not setting this to "needs work" (as usual after a core commit when the update docs need updating).
#36
My concern was more serious before I realized that only multi-node operations cleared the ENTIRE node cache. Basically, node_load() has always been a 'read-only' operation. While there may be lots of queries, at least we knew that the flow of data was only going in one direction.
This patch ensures that any fresh node_load() will go in TWO directions. It will pull in data, and write to the cache. That makes node loads even MORE expensive, but the payoff is that the next load for that node will be faster -- unless it gets cleared. This means that the frequency with which the node cache is cleared becomes CRITICAL. If it's cleared too frequently, we never get the benefits of the cached loads, and everything just gets more expensive. In addition, if I understand properly, anyone who's trying to to a master/slace setup has to deal with *loads* writing to the master DB.
None of these are obvious showstoppers, but I worry that we don't understand the impact of the change on large sites with high traffic. One contrib module that flushes this cache relatively frequently could bring a site to a crawl.
#37
I agree with eaton about the frequency of clearing the node cache. So, a related question - is it worth running extra queries to determine exactly which nodes need to be cleared (for example in book module or the translation module) rather than do a blanket clear of the node cache? It seems running an extra query (for example) and then potentially deleting a bunch of single items from the node cache it going to be pretty expensive too.
Also, issue for the menu.inc part of the patch I rolled (which chx sut out as unrelated)- indeed it relates to clearing the block and page cache, not the node cache: http://drupal.org/node/170514
#38
Followup patch, reported by Gabor. Translations fixing.
#39
Great. Modified the patch to get it documented. Now we have the translation set caches flushed only if we update a node in the set, not all cached node_loads. (I added some documentation and fixed one indentation problem I found on the way).
Setting it to needs work, so it gets documented.
#40
We discussed with chx and come up with some documentation for the node_load cache. I also ended up documenting the internal cache, so people know what the effect of both are, how are those invalidated and what to put into the load operations. Committed the attached patch.
It still needs to be documented in the update docs, but I leave this to others :) Also, anyone is welcome, if having more correct documentation.
#41
I'd still like some feedback in terms of book module and whether it makes sense to make the cache clearing there more targeted too.
#42
OK this needs more consideration. Hope next time folks will review the patch BEFORE it's in.
#43
Well, let's set this to review and see what people have on it.
#44
Motivation for rollback: obviously the huge performance benefit is not enough to get people move their dynamic code from load to view.
#45
I'm the type of person who really needs the performance benefit. Shouldn't we just put a big bold item in the update docs saying "Put dynamic content in hook_view"? Don't at all agree this should be rolled back. Which modules are currently suffering as a result of this patch? I'd suggest we help roll patches for them instead of rolling this back.
#46
Back to original -- ie. needs documentation. I was never serious about rolling back.
#47
Hrm. Say I had a poll.module of my own (not core, but this example exists merely because everyone knows what a poll is). Said poll has 45 votes on it, and is growing rapidly. Are those 45 (46, 47, 48, etc.) votes "data" about the node that should be loaded in accurately, or merely visual garbage to be added at view time? I argue that they're data that should always be accurate at node_load() time - I would hate to have to sit on a 'view' to find out how many votes have been cast when I may not actually care little for displaying the node. How does this cache handle this consideration?
#48
Morbus, first sentence in the issue: Cache node_load, clear on node_save.
#49
I've also some general concerns about parent/child relationships between different node types. I've built various internal node types that have specific relations to each other - a Card is a child (subclass, etc.) of an Expansion. As such, any time a Card is loaded, I also add to the $card_node the results of an $expansion_node node_load() - in almost every use case, a Card required the Expansion data as well, so I wanted to save duplicate code by loading the Expansion in always. With the above cache, Expansion data could get stale within the cached Cards - I'd have to include a cache clear on Cards for every Expansion save (or else, a node_load then node_save). Not as worrisome as "I consider vote counts to be node_load data, not node_view" (above) certainly, but another scenario that may be valid for the forthcoming documentation.
#50
Gábor: irrelevant. A poll vote is user-contributed supplementary data to the poll - it has a similar data "feeling" as a user comment. It doesn't change the bulk of the node itself - it doesn't change the body, or the question, or the options, or the title of the node. It is merely a statistical increment about the node itself. I would hate to spend a whole node_save to increment a poll vote by one (any more than we should see a full node_save to increment, say, the node "Views" statistic by one, or to issue a node_save when one comments).
#51
Our comments crossed. Resetting title back.
#52
Gábor: incidentally, voting in core's own poll.module doesn't use node_save either (see poll_vote for the direct updates) - it instead issues a clear against the entire cache. Is that the suggestion for designs like the above (both the poll incrementing and the parent/child relationship)? Fine, if it is, but it should be part of the forthcoming documentation.
#53
We probably want to change the cache_clear_all() inside poll_vote() to something like cache_clear_all($node->nid, 'cache_node'), maybe? I'm not set up in any way for Drupal 6 development, so can't test or create a patch. It would /seem/ to make sense, however, based on the existing changes this patch added.
#54
i think this patch is what broke upload: http://drupal.org/node/168813
nodeapi op 'prepare' is being called after 'view'. i'm not up to speed on this enough to know what the right way to fix that would be.
#55
@Morbus Iff : cache_clear_all() is called by node_save(), comment_save() and lots of other places - it clears the page and block cache NOT the node, menu, filter, etc caches. It's called anytime anything changes (pretty much)
#56
As just mentioned on the dev list before I saw this thread...
I'm finishing up a site now where most of my node_load() calls are NOT for displaying the node directly. They're loaded for their data. Data should not be loaded in op view. op view is for viewing. op load is for loading. There is no reason why you will always be viewing after loading. The site I'm on now loads all the time without doing a view op. Saying "well just do your load in the view" is a bad answer, since that's confusing two decidedly different pieces of the system (data vs. rendering).
#57
The motivation here is to create a cache layer for often static stuff. We have chosen the 'load' step for that. I think thats reaonable.
Then we said that if you have dynamic info in $node you need to set that later. But the only later opportunity that we offer is 'view' which is not good, as you point out.
So, perhaps we need a new nodeapi op that happens after load. The results of that are not cached. In a way, thats a lot like the D5 menu system which caches the $_menu but gives a chance to change it when it does the !$may-cache operation.
An alternative solution would be to check for the presence of $node->node_nocache property and avoid caching if it is TRUE. Modules with dynamic load info could choose to set that.
#58
Then, I think we really need to roll this back. A new nodeapi op or a new column in node definitely does not fit into the freeze. Of course, we might want to say "hey, the performance boost worths it" and then I will be happy to roll something -- but I smell D7 here. Rollback patch above.
#59
Then I am wrong, says Moshe. One can set a $node->no_load_cache property in hook_load or nodeapi op load and then dynamic stuff won't be saved.
#60
yes, short and sweet.
#61
Personally, I'd roll back this patch and postpone it to Drupal 7. IMO, it should not have been committed in the first place.
#62
Agreed. This needs more discussion and breathing room to get accepted or massaged into a state where it is applicable.
Rolled back the patch with chx's patch, slightly modified to also roll back the node_load doc block I added about this and reworded the system update removal notice, so it does not suggest this *will* be part of Drupal 7, but only that it is up for discussion. (Actual committed patch attached).
Let me know if there are any remainings which are not rolled back, but should have been.
#63
#64
I don't think this should've been closed - was the rollback that was fixed rather than the original issue, reset if I'm wrong.
#65
Would be terrific is someone resubmits this patch. It has decayed a little but not too bad. I think my suggestion to add a $node->no_load_cache property (see #59) addresses the mentioned concerns.
#66
subscribing.
#67
here's a first cut at reviving this patch for drupal7 (including moshe's suggestion at #59).
tested this lightly on a fresh install (create/update/edit nodes and comments), and it seems to work as advertised. i haven't tested the translation code.
not 100% sure i've updated this faithfully, but i'm happy to keep working on this and see it through until its committed/rejected.
#68
Thanks much for updating this I had a read through the patch and found minor doc issues
- "The cache API based cache stores all nodes throughout requests,". Howe about ch"The cache API based cache persists loaded nodes across page requests"
- I prefer positive names so I would $node->no_load_cache to $node->load_cache and make abort caching if it $node->load_cache === FALSE
- $node->load_cache needs to be added to the doxygen for node_load()
I will try to do a functional review soon.
#69
Do we want to postpone this until we have a sense of how or whether node fileds will be in core? that would seem like it might have substantial impact on whatever caching approach we want to take.
#70
This is a small patch that can be changed by the fields patch if that materializes and gets committed. I don't believe in postponing useful patches in favor of non existent, non ready patches.
#71
updated patch as per moshe's comments in #68 attached.
#72
I created and edited nodes and used books and added comments and could not get this to misbehave. Update path works too.
I only added two cache clears in when submitting or cancelling a poll vote. Code looks good. RTBC.
To summarize, any module can disable the can disable caching of any given node as needed. That addresses the one significant objection so far. benchmarks have shown that this is a small performance boost for a plain core site. It will be much more impressive for a fully loaded site with oodles of Contrib modules who perform various nodeapi load operations.
#73
The doxygen comment for translation_node_get_translations() at the very end of this patch seems to be some unwanted leftover from the many patches/rollbacks, the changed sentence is false. Same patch as above, but with that last change removed.
#74
updated patch to keep up with HEAD, no functional changes.
#75
updated patch to keep up with HEAD, no functional changes.
anyone?
#76
Rerolled for 2 failed hunks and restested. Works fine.
#77
The failed the revisions simpletest and the problem is in the patch. I will investigate.
#78
rerolled with revisions fix and a book outline fix. unit tests are super useful (if you can stand to wait for them to finish)
#79
trying to get patch to attach
#80
I uploaded patch to http://tejasa.com/mw.patch
#81
patch is not working on your site moshe
#82
Here is as a local upload.
#83
Unfortunalty this breaks some tests. I ran all the node related modules. Please fix untill then 'code needs work'.
It breaks:
Poll vote
Taxonomy nodeapi
Translation nodeapi
#84
Fixed the test failures. The remaining failures happen before this patch as well.
#85
The tests are fine now... I do however have some comments about the book part
book/book.module:489 and book/book.pages.inc:216 We just have one node here? Why clear the whole cache?
#86
Hmmz I did some more testing and see why you do the cache_clear_all now. You need to clear all the parents and the book too.
Just hope we don't add/delete alot of book pages then because we'd have to build the cache allover again.
The tests are all working and I can't find any strange stuff in the patch. Looks RTBC to me. If noone reviews in a few days I'll set to RTBC.
EDIT: It might be good though to comment the cache_clear_all's in book so everybody understands them and makes your code readable.
#87
I also ran through all tests and had a quick look through the code - everything looks good. Probably worth putting an extra comment in book to make it clear what's happening, so RTBC with those - just needs a quick re-roll to put them in.
#88
Added comments to book module's cache flush.
#89
quick note to confirm this applies cleanly, causes no test failures etc.
#90
Applied fine, tests are good, RTBC
#91
For what it's worth, the load_cache added here, and the poll submission changes, address all my concerns (#49, #50, #52), so I have no further qualms about the implementation.
#92
Fixed a new failure with tracker.test and reworked poll a bit to fix a menu bug that I had introduced in the last version of this patch.
#93
Once the static flag $node->load_cache is set, a node load will not be cached. Even if it is just one property that cannot be cached. Now consider a contrib module that acts on all nodes and cannot cache its properties. This flag reminds me of the 'no_cache' op for filters, which is equally inflexible (but reasonable there). However, I don't see why we do not want to allow f.e. 90% of a node to be cached, and only the rest (if at all) uncached - by caching op 'load' and loading any non-cacheable data in a subsequent op 'after_load' (or 'load_uncached' or similar).
#94
@sun, while that sounds like a good idea - is it not something which could be done in a followup patch?
#95
@Sun - thats out of scope for this patch. We don't do partial caching of objects anywhere I can think of ... The prior status was RTBC.
#96
+ // Subtract from the votes.+ poll_allow_votes($node);
this together does not mesh as the function does not substract anything. Looks copy paste error.
poll_allow_votesshould return (besides setting) ->allowvotes so the menu access callback can bereturn user_access($perm) && ($node->type == 'poll') && !$inspect_allowvotes || poll_allow_votes($node));.- $header[] = array('data' => t('Vote'), 'field' => 'pc.weight');Why is this here?
#97
forgot to update status.