Background:

We had an initiative to add Flag to drupal.org, in order to flag issue comments as "This is an issue summary", and highlight them, so that if you had an issue with 100+ comments, you could immediately get to understanding of what the issue is. You can read all about this on:
#1036132: Provide a mechanism for issue summaries

That issue got postponed for two reasons:
1) The Prairie Initiative is thinking in broader terms about what issue tracking should look like and
2) The solution I implemented on my test d.o site added considerably to the query load when you were on an issue page. See comments #43, #50, and #52 on #1036132: Provide a mechanism for issue summaries for details. ==> This is the subject of the feature request here.

Feature Request:

When you are on a node page with comments, and you have a comment flag set up, Flag is doing one query per comment when displaying the comment, to figure out whether to add a "flag this" or "unflag this" link. Then in this case, there was a second query per comment done at display time to decide whether it had been flagged as "this is an issue summary", and change the theming accordingly.

The feature request would be to do all of this in one query per node (find the flag status of all the node's comments in one query) and cache it. At a minimum, it shouldn't be doing the same query twice for each comment -- once the query has been done once, the result should be cached so when I ask again if that comment has been flagged, it can just return the answer from memory.

The thought would be to add a function like flag_preload_user_flags($nid) that would do one query to load the flag status of all the comments in one fell swoop. Then flag_get_user_flags() would look at this cache to see if it already has information rather than doing a separate query for each comment.

Hopefully that makes sense...

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

dww’s picture

We'd love flag on d.o for other reasons, too -- primarily for #34496: [meta] Add Flag module to allow users to subscribe/unsubscribe without posting a comment. So, any love this issue can get would be most appreciated! I don't have time/energy right now to dive in and fix this myself, but I'll try to help review and otherwise support this issue whenever possible.

Lars Toomre’s picture

I am subscribing to this issue in the hope that the flag module can be used on d.o to implement #34496: [meta] Add Flag module to allow users to subscribe/unsubscribe without posting a comment. I would love to see all of the subscribe comments eliminated from the issue nodes!!

klonos’s picture

Subscribing so I know where we stand on all related issues and perhaps I can "ping" people that postponed #1036132: Provide a mechanism for issue summaries once enough(?) time was given to the Prairie Initiative or if it fails to come to a conclusion and is eventually abandoned/postponed itself.

We shouldn't be keeping solid solutions for long-standing problems from being implemented simply because there *might* be a better way. That is a thing that will always happen anyways since perfection is unreachable. IMHO if it works, we should simply implement it and then we can improve it as time permits. I do understand that there is some performance impact (I don't know the size of it though) related with the solution, but I believe we need to sacrifice some things in order to gain.

Anyways, since most of the "brains" behind both the solution and the discussion are really busy people, I'll be keeping an eye on it and "pocking" people to move on. I promise to allow enough(?) time before I start "annoying" people.

PS: sorry for the lengthy and perhaps off-topic comment. I didn't simply want to "subscribe" ;)

dww’s picture

@klonos: yes, very off topic. ;) your basically justified rant belongs in the other issue. This was meant to be a place to actually accomplish a (hopefully) small but important optimization in flag that will allow us to safely deploy it for a variety of use-cases on d.o.

I pray that everyone who cares about #1036132 and #34496 does not follow your lead and "subscribe" here or otherwise try to discuss the d.o use cases. That's just going to slow down progress here.

Thanks!
-Derek

rfay’s picture

Following this and looking forward to a better way to do so.

rfay’s picture

As I think about this, I'm *very* happy to have flag perform better.

BUT: The functionality of drupal.org is more important than even performance. It behooves us to get content differentiation widely deployed on drupal.org. My copyrighted phrase of the day is "Drupal.org is a cesspool of undifferentiated crap." It is. Fixing it is even more important than performance.

catch’s picture

Subscribing, there's a chance I'd be able to look at the patch although I can't promise anything, however if I do I'll post back here to say I'm working on it.

geerlingguy’s picture

I'm subscribing for unrelated reasons; I'd like a project I'm working on to perform better as it scales out (right now, it's not a big issue, but that's going to change in the next 3-6 months, and I might be able to devote some time to this patch).

dww’s picture

@rfay: I applaude your enthusiasm and initiative. I've long referred to large swaths of d.o as "a swamp". However, "Fixing it is even more important than performance." -- I wish that were true. We can't just start deploying code on d.o until the servers melt, the site goes offline, and everyone's pagers are going off. The clear road ahead is to deal with the performance issues before they take down the site, not while it's happening. jhodgdon had a very clear plan, and it makes sense to me. We just need to see it coded into something that can be reviewed, committed, pushed, and then deployed on d.o.

Thanks
-Derek

gapple’s picture

Just wondering about the status of this issue, since #1036132: Provide a mechanism for issue summaries seems to have taken a different path from using Flag module. What would the impact of implementing precaching be on #34496: [meta] Add Flag module to allow users to subscribe/unsubscribe without posting a comment?

Precaching comment flags seems pretty straightforward, though nodes with a lot of comments on many pages could foil the benefits.
Are there examples of any instances where precaching Node or User flags would be beneficial?
Would it be possible to batch the retrieval of flag information in some instances?

pillarsdotnet’s picture

Title: Query efficiency » Flag module query efficiency on large sites
killes@www.drop.org’s picture

subscribe.

#1204474: Less queries was marked a duplicate of this, it has some thoughts on what should IMO be done:

've declined this in the current state of the module. It adds IMO too many queries.
(imagine a long issue like the one linked to above).

I'd like to see the static cache in flag_get_user_flags be made more configurable so that we could get all the flags from some other cache (memached for example) instead of from the DB.
Or, as an alternative, some hook that gets called before the view hook could be used to pre-fill the static cache.

catch’s picture

From the d.o point of view, there is no longer discussion about using this to flag comments.

Only to flag nodes, and only in the full node view (for issue subscriptions).

So for me, as long as flag links don't get shown in any lists of nodes (especially not issue queue tables which page at 50 items), then this patch wouldn't be necessary for deployment (although would need to keep an eye on stuff later).

Also in D7 this would be many, many times easier to fix because there is hook_entity_prepare_view() that lets you load user-specific information for lists of entities. So as long as we don't need flag links on lists of entities before d.o upgrades to D7, I would probably bump the version (again, at least for d.o. usage, this might still be a good D6 patch if we could figure out how to do it, but with no multiple load in D6 it seems tricky).

jhodgdon’s picture

I think actually that if you have flag turned on, it will do those queries for every comment anyway? Not sure, should check the code and/or test it.

amateescu’s picture

@jhodgdon, I just checked, those queries are not run on a Node flag type.

There are only two flag queries shown in Devel's query log on a flag-enabled node:

0.35 1 flag_get_flags      SELECT f.*, fn.type FROM flags f LEFT JOIN flag_types fn ON fn.fid = f.fid
0.36 1 flag_get_user_flags SELECT * FROM flag_content WHERE content_type = 'node' AND content_id = 13 AND (uid = 1 OR uid = 0) AND sid = 0
amateescu’s picture

I promised @killes that I will try to come up with a patch for this issue, so here it is :) Kudos to dereine for clicking the idea on how to handle this.

On a node with 163 comments we get from:

Executed 395 queries in 127.55 milliseconds.

to:

Executed 234 queries in 93.12 milliseconds.

The doxygen comments could use some better wording, but I hope the general approach is sane enough.

amateescu’s picture

Status: Active » Needs review

Of course, CNR.

amateescu’s picture

From IRC: @dereine thinks that flag_preload_user_flags() looks more like a private function, and @xjm noticed that I forgot about doxygen docs.

amateescu’s picture

Fixed a problem with merging $flagged_content arrays.

Gerhard Killesreiter’s picture

I've tested this locally and it nixes most of the queries.

amateescu’s picture

@killes suggested that we should use db_placeholders() in _flag_preload_user_flags().

catch’s picture

This might run into issues on very long comment threads since it's pulling everything rather than just those on the current page. Would probably still be an improvement up to 1000 odd depending on comments per page.

quicksketch’s picture

This looks pretty good overall. I find it a little bit strange that it's implemented specifically for comments, but I see why such a thing would exist instead of making it generic (you can't really do the same thing with nodes, unless you knew how they were being listed). I haven't been working much on Flag recently, so strong, well-reviewed patches will get this in.

Also related to performance, this patch would reduce a huge number of COUNT() queries, which right now we do on every flagging: #719616: Keep flag_counts current by manual in/decreasing of count.

sun’s picture

Status: Needs review » Needs work
+++ b/flag.inc
@@ -1411,6 +1411,28 @@ class flag_comment extends flag_flag  {
+  function is_flagged($content_id, $uid = NULL, $sid = NULL) {
...
+  function get_flagging_record($content_id, $uid = NULL, $sid = NULL, $content = NULL) {

Missing phpDoc "Overrides flag_flag::[method]()."

The phpDoc description should clarify why these methods are overridden and what they are doing differently.

+++ b/flag.module
@@ -1496,6 +1496,10 @@ function flag_get_content_id($fcid) {
+ * @param $content
+ *   Optional. The content object that corresponds to $content_id.
+ * @param $prefill
+ *   Optional. Pre-fill the static cache for the given $content.

I don't see a use-case nor functional logic that would support $content without $prefill == TRUE...? So why does $prefill exist?

+++ b/flag.module
@@ -1545,6 +1556,49 @@ function flag_get_user_flags($content_type, $content_id = NULL, $uid = NULL, $si
+    $result = db_query("SELECT * FROM {flag_content} WHERE content_type = '%s' AND content_id IN (" . db_placeholders($cids, 'int') . ") AND (uid = %d OR uid = 0) AND sid = %s", array_merge((array) $content_type, $cids, (array) $uid, (array) $sid));

Use array($content_type) instead of (array) $content_type casting here.

amateescu’s picture

Status: Needs work » Needs review
FileSize
5.41 KB

Adressed @sun's comments from #24.

sun’s picture

+++ b/flag.inc
@@ -1411,6 +1411,41 @@ class flag_comment extends flag_flag  {
+    $prefill_cache = FALSE;
+    if (!isset($this->cached_nids[$content->nid])) {
+      $prefill_cache = TRUE;
+      $this->cached_nids[$content->nid] = $content->nid;
+    }
+    $comment = ($prefill_cache) ? $content : NULL;

Would be more natural to have

$comment = NULL;
if (...
  $comment = $content;

instead of the $prefill_cache variable.

+++ b/flag.module
@@ -1545,6 +1555,49 @@ function flag_get_user_flags($content_type, $content_id = NULL, $uid = NULL, $si
+    $result = db_query("SELECT * FROM {flag_content} WHERE content_type = '%s' AND content_id IN (" . db_placeholders($cids, 'int') . ") AND (uid = %d OR uid = 0) AND sid = %s", array_merge(array($content_type), $cids, (array) $uid, (array) $sid));

Use array($uid, $sid) in the array_merge().

amateescu’s picture

Updated per #26. @sun, thanks for your thorough reviews!

Lars Toomre’s picture

I am at a bit loss by how this comment relates to the surrounding code... There is no static cache variable or call to the function flag_get_user_flags().

    // We should only pre-fill the static cache from flag_get_user_flags() once.

Am I missing something or even worse (LOL)?

amateescu’s picture

@Lars, take a look 7 lines below that comment :)

Lars Toomre’s picture

I would suggest then that the inline comment be moved down the seven lines then and its current location gets its own inline comment. It was not clear to me why we are setting $comment variable and adding to the list of cached nids only if !isset($this->cached_nids[$content->nid]).

There is an implicit assumption in the referenced block that $content->nid exists which appears will be hit if the default value of NULL is passed in for $content. I would think that a notice would be thrown in that event. Has this been tested with $content = NULL? My impression is that if statement should be isset($content->nid) && !isset($this->cached_nids[$content->nid]).

Edit: correcting spelling error.

amateescu’s picture

You are right about $content->nid, I assumed it's set because I was working with flag's default use case, which is located inside flag_link(), but if someone calls $flag->is_flagged() without calling $flag->get_content_id() first, then $content will be NULL.

Now that you've understood the whole process, can you suggest a better inline comment?

Lars Toomre’s picture

Status: Needs review » Needs work

@amateesscu - I have limited exposure to the flag module. I must confess I did not and still do not understand the whole process you are implementing. I will need to read further the whole modified files rather than just your patch file.

For instance, I am still confused by the use of $this->cached_nids[$content->nid] = $content->nid; It appears that we are building a list of node ids that we have processed. I do not understand why and will need to investigate further. Also in get_flagging_record(), I do not understand the use of the $comment variable. Do we only want to set $comment if $content->nid isset()? By default $content is already NULL.

I also note in flag_get_user_flags() that the new static variable array is indexed as $flagged_content[$uid][$sid][$content_type][$content_id][...]. For say node 456 with comment 789 by user 1, how will this static array be populated? It appears that it would be $flagged_content[1][0]['comment'][789]?? If so, this should be documented for clarity. Given the above, I am confused how one would find all of the flagged comments for node 456.

Further, looking at the new function _flag_preload_user_flags(), I see that $content_id is specified but never used, yet it appears to be an index into the array mentioned in the previous paragraph. This definitely needs to be corrected.

Also the $content variable is expected to be an object with member named nid. No other member or method of the object is referenced in this function. Why not simply use $nid for the last parameter and call the function with $content->nid?

Finally, I am puzzled why we are pulling flag data for uid 1 or uid 0 in my example from above. Does a logged in user inherit the flags of anonymous users because that is how global flags are stored?

I hope this helps. Later I will investigate the complete files further with your revised patch installed.

quicksketch’s picture

Hi guys, just letting everyone know that Flag has seen some positive movement the past couple of days and we have a new 2.0 beta release (beta6) that fixes a lot of current issues. I'd like to get these performance fixes in and this one seems pretty close. Do we just need further explanation (i.e. more comments) here, or are there further structure changes being recommended?

Of all the questions above, the only one I can answer right-off is this one:

Finally, I am puzzled why we are pulling flag data for uid 1 or uid 0 in my example from above. Does a logged in user inherit the flags of anonymous users because that is how global flags are stored?

Global flags do not record a UID at all, so UID (and SID) will be 0 when something has been flagged with a global flag. The OR statement here finds if the user has personally flagged an item or if the item has been flagged globally (possibly by a different user).

quicksketch’s picture

Title: Flag module query efficiency on large sites » Improve efficiency when displaying lists of flaggable comments

Seems like interest in this patch has floundered, mostly because of catch's comments in #13:

From the d.o point of view, there is no longer discussion about using this to flag comments.

Only to flag nodes, and only in the full node view (for issue subscriptions).

So for me, as long as flag links don't get shown in any lists of nodes (especially not issue queue tables which page at 50 items), then this patch wouldn't be necessary for deployment (although would need to keep an eye on stuff later).

So basically this patch is no longer needed for getting Flag on drupal.org, which is why most people were interested in it. Retitling issue to match the actual contents of the patch.

klonos’s picture

...summary probably needs an update too, but I just finished the summary in #1077878: Add HTML5shiv to core and I'm beat ;)

joachim’s picture

Version: 6.x-2.x-dev » 7.x-3.x-dev

Changing the version. As was remarked above, this should be much easier on D7.

YesCT’s picture

maybe people with experience on this could advise on #1621714: Allow to bookmark/favorite issues without abusing the Assigned field or issue tags regarding performance concerns.

joachim’s picture

I don't think this affects #1621714: Allow to bookmark/favorite issues without abusing the Assigned field or issue tags at all, as AFAIK, there's no plan to add a flag on comments on d.org.

joachim’s picture

There now IS a plan to add flag to comments on d.org: #2386751: Replace 'report spam' links with Flag functionality.

joachim’s picture

+++ b/flag.inc
@@ -1411,6 +1411,40 @@ class flag_comment extends flag_flag  {
+   * Receives an additional $content parameter and supplies this information to
+   * flag_get_user_flags() along with a cache prefill parameter.
+   */
+  function get_flagging_record($content_id, $uid = NULL, $sid = NULL, $content = NULL) {

Adding a param to a child method isn't going to work on newer versions of PHP, is it?

joachim’s picture

Assigned: Unassigned » joachim

I'm taking a look at this, since it's now become fairly important for d.org.

Youch, this patch has rotted! None of it applies on 7.x-3.x, mostly because for that branch we did a lot of s/content/entity.

The overall approach of this is good, but I'd like to improve the details of the implementation. In particular, the patch is adding a helper function for flag_get_user_flags(), _flag_preload_user_flags(), which has special handling for comments. To me it seems that this sort of thing is what we should be using our flag classes for: we already have a flag_comment class, so it makes sense to put special handling for flags on comments here.

This requires a slight shift in the order of things.

As I understand it, the previous patch gives us this:

- $flag->is_flagged() is called (by something like the flag link rendering) to see if the comment is flagged by the current user
- is_flagged calls get_flagging_record()
- get_flagging_record() calls out of the flag class to flag_get_user_flags(), which is an API function which has static caching. For comments, if the current comment is the first we've seen for the node it belongs to, the $comment entity is passed as a new parameter.
- flag_get_user_flags() picks up on the entity parameter, and calls the helper, _flag_preload_user_flags()
- _flag_preload_user_flags handles the case where the flagged entity type is comment, and queries for all the comments on the given comment's node, then loads all the flagging records for all of these, and returns then for flag_get_user_flags()'s static cache.

What I think we can do instead is this:

- $flag->is_flagged() is called (by something like the flag link rendering) to see if the comment is flagged by the current user
- is_flagged calls get_flagging_record()
- flag_comment's override of get_flagging_record(), if the current comment is the first we've seen for the node it belongs to, queries for all the comments on the given comment's node, then for each comment, calls flag_get_user_flags() to warm up the static cache.

That's potentially a lot of function calls to flag_get_user_flags(), so we could overload the $entity_id parameter there to allow an array of entity IDs as well as a single ID.

I'll work on a rough proof of concept, and think about whether there's a way to test we're hitting the database fewer times, and then we'll move this issue to the D8 branch.

joachim’s picture

> That's potentially a lot of function calls to flag_get_user_flags(), so we could overload the $entity_id parameter there to allow an array of entity IDs as well as a single ID.

It would help if I didn't talk total rubbish.... *sigh*

The bit we're trying to optimise is the query in flag_get_user_flags(), so we DEFINITELY have to either call that with all the IDs at once, or warm the cache up from flag_comment since on D7 we can access another function's static cache.

joachim’s picture

+++ b/flag.module
@@ -1545,6 +1555,49 @@ function flag_get_user_flags($content_type, $content_id = NULL, $uid = NULL, $si
+    $result = db_query("SELECT * FROM {flag_content} WHERE content_type = '%s' AND content_id IN (" . db_placeholders($cids, 'int') . ") AND (uid = %d OR uid = 0) AND sid = %s", array_merge(array($content_type), $cids, array($uid, $sid)));

Another problem I've found as I'm working on this is here -- this query gets nothing if a comment is not flagged at all.

That means that the cache won't get a key for this comment, and so when it's rendered, there is a cache miss, which incurs a query for the single comment.

For a use case like d.org, where the flag is 'report comment as spam', the majority of comments on an issue won't be flagged, so the cache warm-up won't do much, and we'll still be querying individually for most comments.

joachim’s picture

Status: Needs work » Needs review
FileSize
9.89 KB

Here's a 7.x-3.x patch, with tests.

Given d.org needs this, I reckon we can do things the wrong way round here and fix on 7 first, then forward-port to 8, especially as 8 will need some supporting changes first.

joachim’s picture

Issue tags: +7.x-3.7 blocker
Anonymous’s picture

Looks good to me - tested a few times with & without this patch using devel generate to create articles with comments, one node with 99 comments went from 108 down to 57 queries, which is nice!

I've attached a re-roll as the original patch was missing a new line at the end of the .info file for the new testing module.

joachim’s picture

Status: Needs review » Fixed

Brilliant! Thanks for being so thorough with this one :)

  • joachim committed 45a0f8d on 7.x-3.x
    Issue #1133956 by joachim, amateescu, stevepurkiss: Added special...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.