Restrict the number of nodes held in the node_load() static cache
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | node system |
| Category: | feature request |
| Priority: | normal |
| Assigned: | gordon |
| Status: | needs work |
Under normal circimstances caching every node into the static cache is ok because usually at a guess only 10-20 nodes would be loaded on every page request.
However when doing conversions or in situations when a lot of nodes needs to be loaded at one time this can cause a WSOD because of memory issues.
With the #233091: Move code from update.php into includes/update.inc patch which will allow upgrades to be done from the shell instead of via the browser which can take much longer it is quite possible to load all of the nodes in a system into memory.
This change implements a basic less used removed first purging policy to drop nodes which are just taking up memory and not to be used. Under normal situations this should not be done, but in extreme use cases to will keep the memory useage by the node_load() at a reasonable level.
I am not sure if 20 is a good or bad number, but for testing the purging it was much better to have a lower number to test. I have a couple of drupal_set_messages() which need to be removed, but this to see how often the cache is being purged as it should not be done too much.
| Attachment | Size | Status | Test result | Operations |
|---|---|---|---|---|
| node_load_cache.patch | 1.73 KB | Idle | Failed: 10267 passes, 1 fail, 0 exceptions | View details | Re-test |

#1
I support this for the very reason you mentioned. Running any module with a large number of nodes, fails with blowing PHP's memory limit.
The node module is to blame, but equally the function taxonomy_get_tree() and taxonomy_term_count_nodes() are big culprits. So perhaps you can include them in your patch?
#2
I don't think they should be done here. taxonomy_get_tree() is a bad one mainly because it loads the entire tree even if you only want a part of it. I have played with this in the past which would build the tree from where you asked for it, so if you were 3 levels down it would reduce the size of the memory needed, esp useful when you have 120000+ terms.
I think we should set these up as different issues. Make it easier to get them into core.
#3
The last submitted patch failed testing.
#4
Cleaned up patch to work again.
#5
A few minor things...
+ $nodes+= $queried_nodes;- Missing whitespace...
// TODO: These messages need to be removed before this is committed.drupal_set_message(format_plural(count($node_cache)-NODE_LOAD_MAX_CACHE, 'Purge @count node from static cache', 'Purge @count nodes from static cache'), 'warning');
- You shouldn't add stuff to patches that shouldn't be commited :)
- Whouldn't array_slce() be easier to use, given that we only need one chunk? http://ch.php.net/manual/en/function.array-slice.php
#6
thanks I will post another patch.
The main reason for the bits to remove was to help stimulate conversation over what NODE_LOAD_MAX_CACHE should be set to. But I will put it to 30, as I think this is something that any more will be just taking up memory for the sake of it.
#7
I have fixed up all the issues as listed above.
I also have written a couple of new tests to confirm the new restrictions on the size of the node cache is preforming as expected.
#8
Had been discussing a similar thing with Damien recently but I don't think there's an issue yet. This could also be centralized for all first class objects if #460320: Standarized, pluggable entity loading (nodes, users, taxonomy, files, comments) gets in.
#9
Indeed, we discussed with catch implementing an in-memory LRU cache. I even did some testing, but failed to open an issue yet.
#10
If one is doing strange things like iterating over all nodes, then one can clear the cache every so often. once might argue that this is too specialized for core. i'm not very opinionated either way.
#11
Yes you can do this, but sometime you don't know how many are loaded, as you could have places where nodes are loading nodes. Also sometimes you don't know.
Also with using things like panels/views a lay-person can easily do massive queries on the page quite easy it will help to keep down the memory footprint.
Gordon
#12
OK, I remove my objections. This is a good idea.
I would love to change the constant to a variable_get() so that sites can change the cache size as needed. A precedent for that is
variable_get('session_write_interval', 180)in _drupal_session_write(). I will RTBC after that change.If the entity loader gets in first, this should move into there.
#13
I have made the change so that this is a variable_get() and can be set via $conf
#14
#15
The last submitted patch failed testing.
#16
This should now go into the entity.inc static caching (which is based on node_load_multiple()) so ought to be more or less cut and paste. Only question then is do we restrict the number held per entity type - you might easily have 300 taxonomy terms on a page, compared to 300 nodes for example.
#17
Anyone up for a reroll? I think 100 items is a nice number. Could be lower I suppose.