Patch does the following:

Moves cache_clear_all() out of comment_save() into comment_form_submit() and bulk update functions, as has been done for other $entity_save()

Prevents thread calculation code from running if comments aren't threaded - this saves a SELECT MAX(thread) each insert for something which is never queried on later.

Additionally I found a field_cache_clear() in field_attach_insert() - we should not be caching field values in between inserting them and reaching the end of that function, would be quite strange if that happened.

Before patch:
6688 queries in 3205.56 ms

After patch with threading:
Executed 4978 queries in 2496.54 ms.

After patch with no threading:
Executed 3943 queries in 1847.67 ms

Saves around 50% of database queries, note I had search module enabled when running the tests which does a search_touch_node() on each comment insert, so the percentage is higher if you don't have this.

Note the patch looks a bit bigger than it really is due to indentation changes.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Status: Needs review » Needs work

The last submitted patch, comment_save.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
2.95 KB

OK the reason that test fails, is because if you switch from flat to threaded mode, old comments won't be ordered correctly. I would love it if we could get around this, but will split that to another issue. Threading is why we can never have nice things :(

To make up for this disappointment, I added variable for disabling the maintenance of {node_comment_statistics}, devel_generate and migrate could definitely use this, core can probably use it for bulk updates / deletes of comments too, athough that's not implemented yet.

Updated stats.

Executed 4309 queries in 1979.12 ms.

catch’s picture

One other thing, we appear to do search_touch_node() twice on inserting a comment, needs looking into, doesn't have to be done here.

catch’s picture

FileSize
3.19 KB

Moved the variable check inside _comment_update_node_statistics() so it works for node inserts too.

Status: Needs review » Needs work
Issue tags: -Performance, -migration

The last submitted patch, comment_save.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
Issue tags: +Performance, +migration

#4: comment_save.patch queued for re-testing.

yched’s picture

Hm, that cache_clear in f_a_insert() is odd. It has no equivalent in CCK D6, so it looks like it was deliberately added at some point (it was there in the initial Field API patch).
I agree that I can't think of a reason why it would be needed, but I'm a bit puzzled at why it got added. Doesn't ring any bell :-/.

catch’s picture

I think we should assume we don't need it, certainly we don't have any such equivalent anywhere else.

catch’s picture

Opened #757956: Stop calculating threads for flat comments for the threading calculation, this patch is fairly trivial otherwise apart from not knowing why field cache clear was put in in the first place, so better to do that one separately.

moshe weitzman’s picture

Status: Needs review » Reviewed & tested by the community

Code looks straightforward, and bot is happy.

catch’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
3.84 KB

Talked to Moshe in irc about the thread calculation, we decided it's enough to let calling code set it, on normal inserts that query should be quite fast anyway, and while we let people change comment threading settings on the fly there's no nice way to code around it.

moshe weitzman’s picture

Status: Needs review » Reviewed & tested by the community
Dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed to CVS HEAD. Thanks.

catch’s picture

Status: Fixed » Needs review
FileSize
1.15 KB

We missed one here. comment_node_insert() doesn't check for the variable. ideally comment_update_node_statistics() would do a db_merge() so we didn't have the code duplication here, but for now here's patch to add the variable check.

moshe weitzman’s picture

Status: Needs review » Reviewed & tested by the community

wait for green before commit

Dries’s picture

Thinking about this some more. Instead of using a variable, I wonder if it makes more sense to use a database tag. And then to have a custom module that ignores the query. Seems like a bit cleaner?

moshe weitzman’s picture

I don't know that the custom module can stop the query entirely. It can make it a no-op like 'SELECT 1', but it can't stop it. Modules can set vars in hook_install and that pattern has worked well enough for us. To me, this feels like something that sites will configure in settings.php.

IMO, current patch is good enough.

Dries’s picture

While the current patch might be good enough, it is a complete hack as well. I'd like to get some more feedback from others on this.

catch’s picture

IMO setting a variable is more self-documenting, and much less of a hack, than having to rewrite a tagged query to make it a null op. We have several variables which allow you to unset stupid stuff in core ({node_comment_statistics} being one of the worst, since it's only really used by site-killing forum module queries) which can't be bypassed via the hook system, it's not great pattern, but it's better than forcing people to maintain custom patches to avoid their sites falling over (or migrations taking extra hours).

It was also committed with #11 already, this patch is just a follow-up.

One other option I can see here which would allow this to be disabled, would be committing #692950: Use hook_module_implements_alter to allow modules to alter the weight of hooks in module_implements. Then moving all _comment_update_node_statistics() calls to isolated hook implementations. It would then be possible with hook_module_implements_alter() to unset() those hook implementations entirely and force them not to be run. That would again make this a proper no-op rather than trying to force it with a query rewrite. I've not tested this though, and we'd run into problems if more stuff gets added to the hook implementations, although I don't think that's a problem at the moment.

moshe weitzman’s picture

I'm happy to proceed with catch's "other option" in #19, if folks prefer that.

Dries’s picture

Still waiting for feedback from people other than catch and Moshe.

Jeremy’s picture

I agree that a database tag feels like even more of a hack. The variable would work in a quick and dirty way, while hook_module_implements_alter() not only solves the immediate problem but also adds more flexibility that may be useful down the road.

kbahey’s picture

The only other place I could find the comment_maintain_node_statistics is in comment.module here.

function _comment_update_node_statistics($nid) {
  // Allow bulk updates and inserts to temporarily disable the
  // maintenance of the {node_comment_statistics} table.
  if (!variable_get('comment_maintain_node_statistics', TRUE)) {
    return;
  }

It is not documented anywhere either.

So I am not really keen on this hackish variable.

I like the potential superpowers in #692950: Use hook_module_implements_alter to allow modules to alter the weight of hooks in module_implements. Imagine doing one thing with module A before module B, and another thing with module B before A! Now that is really eeeeevillll ... And I love it!

jpmckinney’s picture

If we're going to be turning a feature on/off with a variable, that suggests the feature should be abstracted out into its own module. I can't think of another feature within Drupal that a user turns on/off by setting a variable (but I don't know all of Drupal).

Absent that, leaving the variable as-is or using hook_module_implements_alter() seem equally evil, though not terribly evil.

catch’s picture

There are tonnes of features in core which you can turn on and off by setting a variable - pretty much anything under /admin - page caching, block caching, gzip compression, various content type and comment settings etc.

The only difference here is that the feature is sufficiently obscure and developer facing that it's not worth a UI for, (or in this case even a settings.php entry).

There are several issues at play here, none which this patch attempts to solve, or that can be solved for Drupal 7:

The design of comment and tracker modules (and default per-table field storage for that matter) is extremely poor for sites with even a few thousand nodes or comments. Tracker and forum modules now have some code to ameliorate the effects of this, but they require maintaining index tables, which means additional writes. taxonomy allows you to turn maintenance of this index table off (also with a variable), in case you have a better option at hand (like mongodb field storage). node_comment_statistics is older, and doesn't.

Maintaining those index tables is a net win on the vast majority of sites, but when doing something like a migration or trying to avoid SQL altogether, you don't want to update a comment count 1 million times for 100,000 nodes, you want to update it all at the end.

Drupal has plenty of mechanisms for doing additional actions on certain events, but very few mechanisms for disabling things. Pluggable includes and variable checks is pretty much all we have at the moment, callbacks + alter hooks in some cases too.

Owen Barton’s picture

If #692950: Use hook_module_implements_alter to allow modules to alter the weight of hooks in module_implements gets in then I think it is reasonable to expect contrib to use that. If not, however - I think this is a big enough gain for large imports to justify a variable_get. The variable_get wrapper pattern is pretty commonly used in a number of places. Sure it is fairly obscure as APIs go, but chances are only 1 or 2 contrib modules (probably written by the people on this ticket!) need to know or care about it - everyone else just needs to find and install the module and check the "fast unthreaded comments" (or whatever) checkbox.

jpmckinney’s picture

Ah, of course, I forgot about all the variables set by settings forms. What still seems odd here, though, is that in this case we would stop maintaining the node_comment_statistics table, but we would still join on it, etc. as if it had useful content in it. That suggests it should be abstracted out instead of crippled.

catch’s picture

Well the idea here is once you're done with the migration, you rebuild it programmatically. devel.module has a helper function to do just that.

jpmckinney’s picture

Okay, so the intention is not to leave this variable set to FALSE for the long term. Could we document that? Otherwise I think the patch is RTBC.

Dries’s picture

Let's go with this patch, but per #29, let's add a code comment explaining what this feature/variable is intended for. Deal? :)

catch’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
1.29 KB

Added an additional comment.

jpmckinney’s picture

FileSize
1.09 KB

I was about to copy-paste the comment in #31 to the only other occurrence of the comment_maintain_node_statistics variable, and then I noticed the other occurrence already had a comment, so I just copied that comment to this occurrence.

catch’s picture

Status: Needs review » Reviewed & tested by the community

That comment's better that the duplicate one I added. So let's go for that, RTBC again.

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed to CVS HEAD. Thanks.

Status: Fixed » Closed (fixed)
Issue tags: -Performance, -migration

Automatically closed -- issue fixed for 2 weeks with no activity.