If you use node_delete() for mass deletion of nodes it wastes memory due to the fact that every node gets cached via node_load() before deletion. I created a patch to change that behavior.

I additionally changed the order of deleting rows form node and node_revisions tables and calls to hooks. From my point of view additional node related data from other modules should be deleted first. In other words, deleting the master data set (row in node table) should be the last step.

This change also allows to set up foreign keys in the database against the tables node an node_revisions to increase data integrity.

Comments

mkalkbrenner’s picture

I found a related feature request and bug report but without patches:
#285484: Adjust node_delete
#181816: node_delete should invoke hooks before deleting rows

mkalkbrenner’s picture

Version: 5.9 » 6.3
StatusFileSize
new1.08 KB

Attached you'll find a patch against drupal 6.3 because the same issues still exist there.

But it would be nice if you also accept the patch for drupal 5.x because this is still our current production environment ;-)

Dave Cohen’s picture

+1 I want this fix as well.

aj045’s picture

+1 as well.

Given what I wanted too, this seems more of a feature request. That is why in my original post I think this should be farmed out to D7. New features are usually not put onto stable releases, unless absolutely necessary.

Just make a 7.x patch and have the community review it. Hopefully it'll get in.

mikeryan’s picture

+1. I independently made the node_load($nid, NULL, TRUE) change and found it made a huge difference when deleting thousands of nodes in bulk (from a rate of about 2000 nodes deleted/minute to 45,000).

nancydru’s picture

Version: 6.3 » 7.x-dev

+1, changed version so it has a small chance of being seen by the core developers

Status: Needs review » Needs work

The last submitted patch failed testing.

mikeryan’s picture

Version: 7.x-dev » 6.10
Status: Needs work » Needs review
StatusFileSize
new603 bytes

My thoughts:

  • The memory leak and the ordering of DELETE queries are really two separate issues - putting them in the same patch is not conducive to getting the memory leak fixed in core.
  • This is not a feature request - it is a performance bug, a significant one for anyone doing bulk deletes of nodes. I think it should be considered for 6.x.

In that light, I'm providing a patch against 6.10 that addresses the memory leak only. This change is small and safe - all it does is prevent caching of nodes that are about to be deleted anyway.

nancydru’s picture

You stand a whole lot better chance of having this see the light of day by posting it against 7.x and noting that is also a serious bug in 6.x, but it's you call.

minesota’s picture

Subscribed

mikeryan’s picture

The patch is against Drupal 6, I haven't set up a Drupal 7 environment yet... Could someone on Drupal 7 see if the patch applies against it (or generate a patch for Drupal 7)?

sevi’s picture

Subscribed

damien tournoud’s picture

Version: 6.10 » 7.x-dev

Bugs are fixed in the current development branch first, then backported.

Status: Needs review » Needs work

The last submitted patch failed testing.

chx’s picture

Version: 7.x-dev » 6.x-dev
Status: Needs work » Closed (won't fix)

In Drupal 7 this is not really fixable as we load all nodes at once with node_load_multiple and there is only one copy of them and then we pass around objects. For Drupal 6 we could try fixing it but given the behaviour in D7 I am inclined to won't fix this... if you want to delete that many nodes, batch it. Otherwise when updating to D7 your code will break again.

moshe weitzman’s picture

Status: Closed (won't fix) » Reviewed & tested by the community

actually, i see this as completely harmless for d6 and it fixes a bug. why would you ever cache a node that you are deleting? d7 has some big changes here but they aren't really relevant for this patch. lets let gabor and others decide.

gábor hojtsy’s picture

The patch actually clears off the whole runtime node cache on each node_load() called from node_delete(). The node is still cached, but all previous nodes caches are removed. So on subsequent node_delete()'s, the only cached node will be the one cached in the previous node_delete() (if there are no other node operations inbetween). Although Dave Reid suggests this as a workaround to fix memory issues at http://api.drupal.org/api/function/node_load/6#comment-69 I'm not sure people on this issue fully understand that this is what happens.

gábor hojtsy’s picture

Title: node_delete memory leak and wrong order of commands » node_delete "leaks memory" due to deleted nodes cached

Retitled to the current focus of the issue.

Garrett Albright’s picture

I ran into this quirk in D6.

Besides the performance issues, it also seems to me to be incorrect behavior that it's possible to load a node which should have been explicitly deleted earlier in the session. In other words:

node_delete($nid);
$node = node_load($nid);

…gives you a node! Is it possible that there's security implications to that?

gábor hojtsy’s picture

Status: Reviewed & tested by the community » Needs work

@Garrett Albright: yes, as I've explained, the patch actually does not do anything to prevent the situation you explain from happening. All it does is it removes *previously* cached nodes. Since this does not seem to do as advertised, looks like we need to rethink this a bit.

Status: Needs work » Needs review

Re-test of node_delete_6_10.patch from comment #8 was requested by Mobi.

stoptime’s picture

subscribe

eric_a’s picture

To fully clear the cache in D6 we could do a second node_load(). Performance wise the best place to do this is after having deleted from the database.

node_load($node->nid, $node->vid, TRUE) clears all nodes from cache and since we passed a revision nothing will be put into the cache. (Actually, since no node is returned, we will not even reach that check on cachable.) Actually, the value for the revision should not really matter here but if you want to be 1000% sure that nothing will be cached (in case a buggy db layer would still return a node ;-)) then exclude any value for which $revision == NULL to force the system not to cache.

robertdouglass’s picture

Status: Needs review » Reviewed & tested by the community

The patch in #8 is better than the current D6 code because it fixes the memory leak. It's absolutely embarrassing that our software is guaranteed to die of an out-of-memory error if you try to delete enough nodes in one request.

The second problem is that node_load just shouldn't cache nodes if the third parameter is TRUE, but I'm going to suggest that we leave that alone since we've fixed it in D7.

There's not much more review needed here. #8 is the correct patch to apply if we're happy with an incremental improvement.

I'll leave it to Gábor to disagree and change the status.

robertdouglass’s picture

I also think it's safe to say that committing #8 now would be better than letting people run into this for another 18 months.

gábor hojtsy’s picture

Status: Reviewed & tested by the community » Fixed
StatusFileSize
new762 bytes

I've added two lines of comments to make the intention of this invocation clear. I still think this is a hack that actually removes nodes which might have been intended to be cached, but I'm hearing the reasons for that to be a problem. This will certainly slow down pages which delete nodes and then try to reuse nodes already loaded before.

Committed to Drupal 6.

eric_a’s picture

@gabor, @Garrett Albright:
Are you with me that the (security) issue mentioned in #19 could be fixed by first querying the node table for vid and then doing node_load($nid, $vid, TRUE) instead of node_load($nid, NULL, TRUE)?

Basically:

- node_load($nid, NULL, TRUE);
+ // Make sure we don't cache the deleted node: pass vid to node_load().
+ $vid = db_result(db_query('SELECT vid FROM {node} WHERE nid=%d', $nid));
+ node_load($nid, $vid, TRUE);

If I'm not mistaken this problem does not exist in D7 anymore, so maybe fix it here while this D6 issue is still open? The extra query must be worth the gained security?

Garrett Albright’s picture

Well, nobody else seems to think it's a security issue, and I myself wasn't sure (note I said "is this a security issue?" and not "this is a security issue!"). Before we go adding queries, let's first decide if there really is a security issue here, and if so, what the exact nature of it is. That way we can be sure that any attempts to fix it are truly effective and we're not just adding code to make us feel safer.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.