Under normal circumstances, caching every node in the static cache is fine, because typically only 10-20 nodes are loaded on any given page request.
However, when doing conversions, or in other situations where many nodes need to be loaded at once, this can cause a WSOD (white screen of death) due to memory exhaustion.
With the #233091: Move code from update.php into includes/update.inc patch, which will allow upgrades to be run from the shell instead of the browser (and therefore to run much longer), it is quite possible to load every node in the system into memory.
This change implements a basic "least recently used, removed first" purging policy to drop nodes which are just taking up memory and will not be used again. Under normal situations this purging should not happen, but in extreme use cases it will keep the memory usage of node_load() at a reasonable level.
I am not sure whether 20 is a good limit, but a lower number made it much easier to test the purging. I also have a couple of drupal_set_message() calls which need to be removed; they are there to show how often the cache is being purged, as it should not be purged too often.
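The purging policy described above can be sketched roughly as follows. This is a Python illustration only (the actual patch is PHP against node_load()); the class and constant names are hypothetical, and the limit of 20 mirrors the experimental value discussed above.

```python
from collections import OrderedDict

NODE_LOAD_MAX_CACHE = 20  # experimental limit discussed in the summary

class BoundedNodeCache:
    """Static cache that purges the stalest entries once it grows
    past a fixed limit, keeping node_load() memory usage bounded."""

    def __init__(self, max_size=NODE_LOAD_MAX_CACHE):
        self.max_size = max_size
        self.cache = OrderedDict()

    def set(self, nid, node):
        # Re-inserting an existing nid moves it to the "most recent" end.
        self.cache.pop(nid, None)
        self.cache[nid] = node
        # Purge from the oldest end until we are back under the limit.
        while len(self.cache) > self.max_size:
            self.cache.popitem(last=False)

    def get(self, nid):
        return self.cache.get(nid)
```

With a limit of 3, loading five nodes leaves only the three most recently cached in memory; the rest would be re-fetched from the database on demand.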
| Comment | File | Size | Author |
|---|---|---|---|
| #34 | lru-375494-34.patch | 5.25 KB | mikeryan |
| #31 | lru-2.patch | 5.21 KB | dhthwy |
| #24 | lru.patch | 5.17 KB | moshe weitzman |
| #21 | cache_lru.patch | 1.74 KB | catch |
| #19 | cache_lru.patch | 1.74 KB | catch |
Comments
Comment #1
kbahey commented
I support this for the very reason you mentioned. Running any module with a large number of nodes fails by blowing PHP's memory limit.
The node module is to blame, but taxonomy_get_tree() and taxonomy_term_count_nodes() are equally big culprits. So perhaps you can include them in your patch?
Comment #2
gordon commented
I don't think they should be done here. taxonomy_get_tree() is a bad one, mainly because it loads the entire tree even if you only want part of it. I have played with this in the past, building the tree from the term you ask for, so if you were 3 levels down it would reduce the memory needed; this is especially useful when you have 120,000+ terms.
I think we should set these up as separate issues. That will make it easier to get them into core.
Comment #4
gordon commented
Cleaned up the patch to work again.
Comment #5
berdir commented
A few minor things...
- Missing whitespace...
- You shouldn't add stuff to patches that shouldn't be committed :)
- Wouldn't array_slice() be easier to use, given that we only need one chunk? http://ch.php.net/manual/en/function.array-slice.php
Comment #6
gordon commented
Thanks, I will post another patch.
The main reason for leaving in the bits to remove was to help stimulate conversation over what NODE_LOAD_MAX_CACHE should be set to. But I will set it to 30, as I think anything more will just take up memory for the sake of it.
Comment #7
gordon commented
I have fixed up all the issues listed above.
I have also written a couple of new tests to confirm that the new restriction on the size of the node cache is performing as expected.
Comment #8
catch commented
I had been discussing a similar thing with Damien recently, but I don't think there's an issue for it yet. This could also be centralized for all first-class objects if #460320: Standarized, pluggable entity loading (nodes, users, taxonomy, files, comments) gets in.
Comment #9
damien tournoud commented
Indeed, catch and I discussed implementing an in-memory LRU cache. I even did some testing, but have not opened an issue yet.
Comment #10
moshe weitzman commented
If one is doing strange things like iterating over all nodes, then one can clear the cache every so often. One might argue that this is too specialized for core. I'm not very opinionated either way.
Comment #11
gordon commented
Yes, you can do this, but sometimes you don't know how many nodes are loaded, as you can have places where nodes load other nodes.
Also, with things like Panels and Views, a lay person can quite easily build massive queries on a page, so this will help keep the memory footprint down.
Gordon
Comment #12
moshe weitzman commented
OK, I withdraw my objections. This is a good idea.
I would love to change the constant to a variable_get() so that sites can change the cache size as needed. A precedent for that is variable_get('session_write_interval', 180) in _drupal_session_write(). I will RTBC after that change. If the entity loader gets in first, this should move into there.
Comment #13
gordon commented
I have made the change so that this uses a variable_get() and can be set via $conf.
Comment #14
moshe weitzman commented
Comment #16
catch commented
This should now go into the entity.inc static caching (which is based on node_load_multiple()), so it ought to be more or less cut and paste. The only question then is whether we restrict the number held per entity type - you might easily have 300 taxonomy terms on a page, compared to 300 nodes, for example.
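The per-entity-type question could be answered by giving each type its own bounded cache with its own limit. A hedged Python sketch (the real code would live in entity.inc, in PHP; the type names and limits below are purely illustrative):

```python
from collections import OrderedDict

# Hypothetical per-entity-type limits, reflecting the point above that a
# page may easily hold far more taxonomy terms than nodes.
MAX_CACHED = {'node': 100, 'taxonomy_term': 500, 'user': 100}

# One ordered cache per entity type.
caches = {entity_type: OrderedDict() for entity_type in MAX_CACHED}

def cache_set(entity_type, entity_id, entity):
    cache = caches[entity_type]
    # Re-inserting moves the entry to the most-recent end.
    cache.pop(entity_id, None)
    cache[entity_id] = entity
    # Trim only this type's cache past its own limit.
    while len(cache) > MAX_CACHED[entity_type]:
        cache.popitem(last=False)
```

Keeping the limits separate means a term-heavy page cannot evict every cached node, and vice versa.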
Comment #17
moshe weitzman commented
Anyone up for a reroll? I think 100 items is a nice number. It could be lower, I suppose.
Comment #18
catch commented
entity.inc needs the patch now.
Comment #19
catch commented
Just ran into a huge memory leak when running devel_generate() on article nodes, due to a node_load() in file_field_update(). Here's a re-roll, minus the tests, putting this in entity.inc just to see what the test bot says.
Comment #21
catch commented
Minus the notreallycamelcase.
Comment #22
moshe weitzman commented
#21: cache_lru.patch queued for re-testing.
Comment #24
moshe weitzman commented
Rerolled with tests and even better docs. I could use some help with the remaining test failure. It seems like a real bug in the LRU code, not in the test.
Comment #26
catch commented
I looked at this for a little while with +, +=, array_merge(), and array_merge_recursive(). As far as I can see, if there's a numeric index, inserting new items will always insert them in numeric order with any of those, rather than prepending or appending the entire array. I haven't broken it down to verify this yet.
Comment #27
moshe weitzman commented
Would be good to get this in soon.
Comment #28
webel commented
I am very interested in a Drupal 6.16 version of this.
I have a large command-line script for generating an offline version of a Drupal site with thousands of nodes. Despite all other memory reduction strategies, the memory usage just grows and grows until the limit is hit, and as far as I can tell node_load() is the culprit.
Comment #29
webel commented
Perhaps a little off topic, but I am using the following, and memory nevertheless grows with every node_load():
Grateful for any feedback.
Comment #30
webel commented
Moved my questions to a support request: #813590: Require assistance with memory usage in node_load
Comment #31
dhthwy commented
An LRU sounds cool.
The previous patch's test failed because the popular node was only cached once, so it never got a chance to move to the front when loaded repeatedly. We need to cacheSet() entities even if they're already in the cache for LRU to work, right? A head-smacking oversight, I think...
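The fix being described is the defining property of an LRU cache: a read must re-promote the entry, not just a write. A minimal Python sketch of that behavior (illustrative only; the patches here implement it in PHP inside entity.inc):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache. A hit must move the entry to the
    most-recent end; otherwise a frequently read entry can still be
    evicted, which is exactly the test failure described above."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.cache = OrderedDict()

    def get(self, key):
        if key not in self.cache:
            return None
        self.cache.move_to_end(key)  # the crucial re-promotion on read
        return self.cache[key]

    def set(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        while len(self.cache) > self.max_size:
            self.cache.popitem(last=False)  # evict the least recently used
```

Without the `move_to_end()` call in `get()`, a "popular" entry that is written once and then only read would age out just like an unused one.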
Comment #32
retester2010 commented
#31: lru-2.patch queued for re-testing.
Comment #34
mikeryan commented
Re-rolled for the 8.x branch.
Comment #35
moshe weitzman commented
Looks proper to me. Nice tests.
Comment #36
fago commented
Why is that?
If we add a new entity info key, we also need to add docs for it.
As to the coding standards, the first line of the comment block should be a full sentence.
Out of context, this comment just makes me wonder what LRU means.
Comment #37
geerlingguy commented
Subscribe.
Comment #38
catch commented
Just opened #1199866: Add an in-memory LRU cache; it is not quite ready for here yet. It would be good to have the facility to add multiple items to the cache at once. Just cross-linking for now.
Comment #39
geerlingguy commented
Tagging.
Comment #40
moshe weitzman commented
Is this still unbounded? We did add a persistent entity cache recently.
Comment #42
catch commented
Marking postponed on #1199866: Add an in-memory LRU cache. The latest patch here actually solves this issue too.
Comment #56
smustgrave commented
Seems the postponed issue landed a while ago.
Comment #57
berdir commented
Not only that, but #3498154: Use LRU Cache for static entity cache landed this year, which is a full-on duplicate of this.