Improve efficiency when displaying lists of flaggable comments [#1133956]

Background:

We had an initiative to add Flag to drupal.org, in order to flag issue comments as "This is an issue summary", and highlight them, so that if you had an issue with 100+ comments, you could immediately get to understanding of what the issue is. You can read all about this on:
#1036132: Provide a mechanism for issue summaries

That issue got postponed for two reasons:
1) The Prairie Initiative is thinking in broader terms about what issue tracking should look like and
2) The solution I implemented on my test d.o site added considerably to the query load when you were on an issue page. See comments #43, #50, and #52 on #1036132: Provide a mechanism for issue summaries for details. ==> This is the subject of the feature request here.

Feature Request:

When you are on a node page with comments, and you have a comment flag set up, Flag is doing one query per comment when displaying the comment, to figure out whether to add a "flag this" or "unflag this" link. Then in this case, there was a second query per comment done at display time to decide whether it had been flagged as "this is an issue summary", and change the theming accordingly.

The feature request would be to do all of this in one query per node (find the flag status of all the node's comments in one query) and cache it. At a minimum, it shouldn't be doing the same query twice for each comment -- once the query has been done once, the result should be cached so when I ask again if that comment has been flagged, it can just return the answer from memory.

The thought would be to add a function like flag_preload_user_flags($nid) that would do one query to load the flag status of all the comments in one fell swoop. Then flag_get_user_flags() would look at this cache to see if it already has information rather than doing a separate query for each comment.

Hopefully that makes sense...

Comment	File	Size	Author
#47	interdiff-1133956-47-44.txt	1.36 KB	Anonymous (not verified)
#47	1133956-47.flag_.efficiency-multiple-comments.patch	9.86 KB	Anonymous (not verified)
#45	1133956-44.flag_.efficiency-multiple-comments.patch	9.89 KB	joachim
#27	1133956-flag_query_efficiency-6.patch	5.35 KB	amateescu
#25	1133956-flag_query_efficiency-5.patch	5.41 KB	amateescu
#21	1133956-flag_query_efficiency-4.patch	5.02 KB	amateescu
#19	1133956-flag_query_efficiency-3.patch	4.97 KB	amateescu
#18	1133956-flag_query_efficiency-2.patch	4.86 KB	amateescu
#16	1133956-flag_query_efficiency.patch	4.11 KB	amateescu

Comments

Comment #1

dww

we/he/they

commented 21 April 2011 at 02:38

We'd love flag on d.o for other reasons, too -- primarily for #34496: [meta] Add Flag module to allow users to subscribe/unsubscribe without posting a comment. So, any love this issue can get would be most appreciated! I don't have time/energy right now to dive in and fix this myself, but I'll try to help review and otherwise support this issue whenever possible.

Comment #2

lars toomre commented 21 April 2011 at 06:21

I am subscribing to this issue in the hope that the flag module can be used on d.o to implement #34496: [meta] Add Flag module to allow users to subscribe/unsubscribe without posting a comment. I would love to see all of the subscribe comments eliminated from the issue nodes!!

Comment #3

klonos

he/him

English

90% Melbourne, Australia - 10% Larissa, Greece

commented 21 April 2011 at 10:32

Subscribing so I know where we stand on all related issues and perhaps I can "ping" people that postponed #1036132: Provide a mechanism for issue summaries once enough(?) time was given to the Prairie Initiative or if it fails to come to a conclusion and is eventually abandoned/postponed itself.

We shouldn't be keeping solid solutions for long-standing problems from being implemented simply because there *might* be a better way. That is a thing that will always happen anyways since perfection is unreachable. IMHO if it works, we should simply implement it and then we can improve it as time permits. I do understand that there is some performance impact (I don't know the size of it though) related with the solution, but I believe we need to sacrifice some things in order to gain.

Anyways, since most of the "brains" behind both the solution and the discussion are really busy people, I'll be keeping an eye on it and "pocking" people to move on. I promise to allow enough(?) time before I start "annoying" people.

PS: sorry for the lengthy and perhaps off-topic comment. I didn't simply want to "subscribe" ;)

Comment #4

dww

we/he/they

commented 21 April 2011 at 15:38

@klonos: yes, very off topic. ;) your basically justified rant belongs in the other issue. This was meant to be a place to actually accomplish a (hopefully) small but important optimization in flag that will allow us to safely deploy it for a variety of use-cases on d.o.

I pray that everyone who cares about #1036132 and #34496 does not follow your lead and "subscribe" here or otherwise try to discuss the d.o use cases. That's just going to slow down progress here.

Thanks!
-Derek

Comment #5

rfay

he,him

English

Palisade, CO, USA

commented 13 May 2011 at 19:49

Following this and looking forward to a better way to do so.

Comment #6

rfay

he,him

English

Palisade, CO, USA

commented 22 May 2011 at 14:52

As I think about this, I'm *very* happy to have flag perform better.

BUT: The functionality of drupal.org is more important than even performance. It behooves us to get content differentiation widely deployed on drupal.org. My copyrighted phrase of the day is "Drupal.org is a cesspool of undifferentiated crap." It is. Fixing it is even more important than performance.

Comment #7

catch

he/him

English

commented 22 May 2011 at 14:52

Subscribing, there's a chance I'd be able to look at the patch although I can't promise anything, however if I do I'll post back here to say I'm working on it.

Comment #8

geerlingguy commented 22 May 2011 at 20:21

I'm subscribing for unrelated reasons; I'd like a project I'm working on to perform better as it scales out (right now, it's not a big issue, but that's going to change in the next 3-6 months, and I might be able to devote some time to this patch).

Comment #9

dww

we/he/they

commented 23 May 2011 at 06:41

@rfay: I applaude your enthusiasm and initiative. I've long referred to large swaths of d.o as "a swamp". However, "Fixing it is even more important than performance." -- I wish that were true. We can't just start deploying code on d.o until the servers melt, the site goes offline, and everyone's pagers are going off. The clear road ahead is to deal with the performance issues before they take down the site, not while it's happening. jhodgdon had a very clear plan, and it makes sense to me. We just need to see it coded into something that can be reviewed, committed, pushed, and then deployed on d.o.

Thanks
-Derek

Comment #10

gapple

he/they

English

commented 4 June 2011 at 22:32

Just wondering about the status of this issue, since #1036132: Provide a mechanism for issue summaries seems to have taken a different path from using Flag module. What would the impact of implementing precaching be on #34496: [meta] Add Flag module to allow users to subscribe/unsubscribe without posting a comment?

Precaching comment flags seems pretty straightforward, though nodes with a lot of comments on many pages could foil the benefits.
Are there examples of any instances where precaching Node or User flags would be beneficial?
Would it be possible to batch the retrieval of flag information in some instances?

Comment #11

pillarsdotnet commented 28 June 2011 at 20:12

Title:

Query efficiency

» Flag module query efficiency on large sites

Comment #12

killes@www.drop.org commented 29 June 2011 at 23:31

subscribe.

#1204474: Less queries was marked a duplicate of this, it has some thoughts on what should IMO be done:

've declined this in the current state of the module. It adds IMO too many queries.
(imagine a long issue like the one linked to above).

I'd like to see the static cache in flag_get_user_flags be made more configurable so that we could get all the flags from some other cache (memached for example) instead of from the DB.
Or, as an alternative, some hook that gets called before the view hook could be used to pre-fill the static cache.

Comment #13

catch

he/him

English

commented 7 July 2011 at 16:07

From the d.o point of view, there is no longer discussion about using this to flag comments.

Only to flag nodes, and only in the full node view (for issue subscriptions).

So for me, as long as flag links don't get shown in any lists of nodes (especially not issue queue tables which page at 50 items), then this patch wouldn't be necessary for deployment (although would need to keep an eye on stuff later).

Also in D7 this would be many, many times easier to fix because there is hook_entity_prepare_view() that lets you load user-specific information for lists of entities. So as long as we don't need flag links on lists of entities before d.o upgrades to D7, I would probably bump the version (again, at least for d.o. usage, this might still be a good D6 patch if we could figure out how to do it, but with no multiple load in D6 it seems tricky).

Comment #14

jhodgdon

she/her

English

commented 7 July 2011 at 18:43

I think actually that if you have flag turned on, it will do those queries for every comment anyway? Not sure, should check the code and/or test it.

Comment #15

amateescu commented 9 July 2011 at 12:42

@jhodgdon, I just checked, those queries are not run on a Node flag type.

There are only two flag queries shown in Devel's query log on a flag-enabled node:

0.35 1 flag_get_flags      SELECT f.*, fn.type FROM flags f LEFT JOIN flag_types fn ON fn.fid = f.fid
0.36 1 flag_get_user_flags SELECT * FROM flag_content WHERE content_type = 'node' AND content_id = 13 AND (uid = 1 OR uid = 0) AND sid = 0

Comment #16

amateescu commented 10 July 2011 at 14:07

Status	File	Size
new	1133956-flag_query_efficiency.patch	4.11 KB

I promised @killes that I will try to come up with a patch for this issue, so here it is :) Kudos to dereine for clicking the idea on how to handle this.

On a node with 163 comments we get from:

Executed 395 queries in 127.55 milliseconds.

to:

Executed 234 queries in 93.12 milliseconds.

The doxygen comments could use some better wording, but I hope the general approach is sane enough.

Comment #17

amateescu commented 10 July 2011 at 14:08

Status:

Active

» Needs review

Of course, CNR.

Comment #18

amateescu commented 10 July 2011 at 14:38

Status	File	Size
new	1133956-flag_query_efficiency-2.patch	4.86 KB

From IRC: @dereine thinks that flag_preload_user_flags() looks more like a private function, and @xjm noticed that I forgot about doxygen docs.

Comment #19

amateescu commented 10 July 2011 at 17:24

Status	File	Size
new	1133956-flag_query_efficiency-3.patch	4.97 KB

Fixed a problem with merging $flagged_content arrays.

Comment #20

gerhard killesreiter commented 10 July 2011 at 17:31

I've tested this locally and it nixes most of the queries.

Comment #21

amateescu commented 10 July 2011 at 18:37

Status	File	Size
new	1133956-flag_query_efficiency-4.patch	5.02 KB

@killes suggested that we should use db_placeholders() in _flag_preload_user_flags().

Comment #22

catch

he/him

English

commented 11 July 2011 at 00:58

This might run into issues on very long comment threads since it's pulling everything rather than just those on the current page. Would probably still be an improvement up to 1000 odd depending on comments per page.

Comment #23

quicksketch

he/him

commented 11 July 2011 at 02:23

This looks pretty good overall. I find it a little bit strange that it's implemented specifically for comments, but I see why such a thing would exist instead of making it generic (you can't really do the same thing with nodes, unless you knew how they were being listed). I haven't been working much on Flag recently, so strong, well-reviewed patches will get this in.

Also related to performance, this patch would reduce a huge number of COUNT() queries, which right now we do on every flagging: #719616: Keep flag_counts current by manual in/decreasing of count.

Comment #24

sun

German

Karlsruhe

commented 29 July 2011 at 20:11

Status:

Needs review

» Needs work

+++ b/flag.inc
@@ -1411,6 +1411,28 @@ class flag_comment extends flag_flag  {
+  function is_flagged($content_id, $uid = NULL, $sid = NULL) {
...
+  function get_flagging_record($content_id, $uid = NULL, $sid = NULL, $content = NULL) {

Missing phpDoc "Overrides flag_flag::[method]()."

The phpDoc description should clarify why these methods are overridden and what they are doing differently.

+++ b/flag.module
@@ -1496,6 +1496,10 @@ function flag_get_content_id($fcid) {
+ * @param $content
+ *   Optional. The content object that corresponds to $content_id.
+ * @param $prefill
+ *   Optional. Pre-fill the static cache for the given $content.

I don't see a use-case nor functional logic that would support $content without $prefill == TRUE...? So why does $prefill exist?

+++ b/flag.module
@@ -1545,6 +1556,49 @@ function flag_get_user_flags($content_type, $content_id = NULL, $uid = NULL, $si
+    $result = db_query("SELECT * FROM {flag_content} WHERE content_type = '%s' AND content_id IN (" . db_placeholders($cids, 'int') . ") AND (uid = %d OR uid = 0) AND sid = %s", array_merge((array) $content_type, $cids, (array) $uid, (array) $sid));

Use array($content_type) instead of (array) $content_type casting here.

Comment #25

amateescu commented 6 August 2011 at 16:07

Status:

Needs work

» Needs review

Status	File	Size
new	1133956-flag_query_efficiency-5.patch	5.41 KB

Adressed @sun's comments from #24.

Comment #26

sun

German

Karlsruhe

commented 7 August 2011 at 19:10

+++ b/flag.inc
@@ -1411,6 +1411,41 @@ class flag_comment extends flag_flag  {
+    $prefill_cache = FALSE;
+    if (!isset($this->cached_nids[$content->nid])) {
+      $prefill_cache = TRUE;
+      $this->cached_nids[$content->nid] = $content->nid;
+    }
+    $comment = ($prefill_cache) ? $content : NULL;

Would be more natural to have

$comment = NULL;
if (...
  $comment = $content;

instead of the $prefill_cache variable.

+++ b/flag.module
@@ -1545,6 +1555,49 @@ function flag_get_user_flags($content_type, $content_id = NULL, $uid = NULL, $si
+    $result = db_query("SELECT * FROM {flag_content} WHERE content_type = '%s' AND content_id IN (" . db_placeholders($cids, 'int') . ") AND (uid = %d OR uid = 0) AND sid = %s", array_merge(array($content_type), $cids, (array) $uid, (array) $sid));

Use array($uid, $sid) in the array_merge().

Comment #27

amateescu commented 13 August 2011 at 06:51

Status	File	Size
new	1133956-flag_query_efficiency-6.patch	5.35 KB

Updated per #26. @sun, thanks for your thorough reviews!

Comment #28

lars toomre commented 13 August 2011 at 10:23

I am at a bit loss by how this comment relates to the surrounding code... There is no static cache variable or call to the function flag_get_user_flags().

    // We should only pre-fill the static cache from flag_get_user_flags() once.

Am I missing something or even worse (LOL)?

Comment #29

amateescu commented 13 August 2011 at 11:37

@Lars, take a look 7 lines below that comment :)

Comment #30

lars toomre commented 13 August 2011 at 15:06

I would suggest then that the inline comment be moved down the seven lines then and its current location gets its own inline comment. It was not clear to me why we are setting $comment variable and adding to the list of cached nids only if !isset($this->cached_nids[$content->nid]).

There is an implicit assumption in the referenced block that $content->nid exists which appears will be hit if the default value of NULL is passed in for $content. I would think that a notice would be thrown in that event. Has this been tested with $content = NULL? My impression is that if statement should be isset($content->nid) && !isset($this->cached_nids[$content->nid]).

Edit: correcting spelling error.

Comment #31

amateescu commented 13 August 2011 at 16:57

You are right about $content->nid, I assumed it's set because I was working with flag's default use case, which is located inside flag_link(), but if someone calls $flag->is_flagged() without calling $flag->get_content_id() first, then $content will be NULL.

Now that you've understood the whole process, can you suggest a better inline comment?

Comment #32

lars toomre commented 13 August 2011 at 19:58

Status:

Needs review

» Needs work

@amateesscu - I have limited exposure to the flag module. I must confess I did not and still do not understand the whole process you are implementing. I will need to read further the whole modified files rather than just your patch file.

For instance, I am still confused by the use of $this->cached_nids[$content->nid] = $content->nid; It appears that we are building a list of node ids that we have processed. I do not understand why and will need to investigate further. Also in get_flagging_record(), I do not understand the use of the $comment variable. Do we only want to set $comment if $content->nid isset()? By default $content is already NULL.

I also note in flag_get_user_flags() that the new static variable array is indexed as $flagged_content[$uid][$sid][$content_type][$content_id][...]. For say node 456 with comment 789 by user 1, how will this static array be populated? It appears that it would be $flagged_content[1][0]['comment'][789]?? If so, this should be documented for clarity. Given the above, I am confused how one would find all of the flagged comments for node 456.

Further, looking at the new function _flag_preload_user_flags(), I see that $content_id is specified but never used, yet it appears to be an index into the array mentioned in the previous paragraph. This definitely needs to be corrected.

Also the $content variable is expected to be an object with member named nid. No other member or method of the object is referenced in this function. Why not simply use $nid for the last parameter and call the function with $content->nid?

Finally, I am puzzled why we are pulling flag data for uid 1 or uid 0 in my example from above. Does a logged in user inherit the flags of anonymous users because that is how global flags are stored?

I hope this helps. Later I will investigate the complete files further with your revised patch installed.

Comment #33

quicksketch

he/him

commented 8 September 2011 at 22:48

Hi guys, just letting everyone know that Flag has seen some positive movement the past couple of days and we have a new 2.0 beta release (beta6) that fixes a lot of current issues. I'd like to get these performance fixes in and this one seems pretty close. Do we just need further explanation (i.e. more comments) here, or are there further structure changes being recommended?

Of all the questions above, the only one I can answer right-off is this one:

Finally, I am puzzled why we are pulling flag data for uid 1 or uid 0 in my example from above. Does a logged in user inherit the flags of anonymous users because that is how global flags are stored?

Global flags do not record a UID at all, so UID (and SID) will be 0 when something has been flagged with a global flag. The OR statement here finds if the user has personally flagged an item or if the item has been flagged globally (possibly by a different user).

Comment #34

quicksketch

he/him

commented 27 September 2011 at 00:35

Title:

Flag module query efficiency on large sites

» Improve efficiency when displaying lists of flaggable comments

Seems like interest in this patch has floundered, mostly because of catch's comments in #13:

From the d.o point of view, there is no longer discussion about using this to flag comments.

Only to flag nodes, and only in the full node view (for issue subscriptions).

So for me, as long as flag links don't get shown in any lists of nodes (especially not issue queue tables which page at 50 items), then this patch wouldn't be necessary for deployment (although would need to keep an eye on stuff later).

So basically this patch is no longer needed for getting Flag on drupal.org, which is why most people were interested in it. Retitling issue to match the actual contents of the patch.

Comment #35

klonos

he/him

English

90% Melbourne, Australia - 10% Larissa, Greece

commented 27 September 2011 at 12:16

...summary probably needs an update too, but I just finished the summary in #1077878: Add HTML5shiv to core and I'm beat ;)

Comment #37

joachim commented 28 July 2012 at 14:28

Version:

6.x-2.x-dev

» 7.x-3.x-dev

Changing the version. As was remarked above, this should be much easier on D7.

Comment #38

yesct commented 3 September 2014 at 20:33

maybe people with experience on this could advise on #1621714: Allow to bookmark/favorite issues without abusing the Assigned field or issue tags regarding performance concerns.

Comment #39

joachim commented 3 September 2014 at 21:32

I don't think this affects #1621714: Allow to bookmark/favorite issues without abusing the Assigned field or issue tags at all, as AFAIK, there's no plan to add a flag on comments on d.org.

Comment #40

joachim commented 19 January 2015 at 08:41

There now IS a plan to add flag to comments on d.org: #2386751: Replace 'report spam' links with Flag functionality.

Comment #41

joachim commented 27 January 2015 at 16:10

+++ b/flag.inc
@@ -1411,6 +1411,40 @@ class flag_comment extends flag_flag  {
+   * Receives an additional $content parameter and supplies this information to
+   * flag_get_user_flags() along with a cache prefill parameter.
+   */
+  function get_flagging_record($content_id, $uid = NULL, $sid = NULL, $content = NULL) {

Adding a param to a child method isn't going to work on newer versions of PHP, is it?

Comment #42

joachim commented 7 May 2015 at 21:58

Assigned:

Unassigned

» joachim

I'm taking a look at this, since it's now become fairly important for d.org.

Youch, this patch has rotted! None of it applies on 7.x-3.x, mostly because for that branch we did a lot of s/content/entity.

The overall approach of this is good, but I'd like to improve the details of the implementation. In particular, the patch is adding a helper function for flag_get_user_flags(), _flag_preload_user_flags(), which has special handling for comments. To me it seems that this sort of thing is what we should be using our flag classes for: we already have a flag_comment class, so it makes sense to put special handling for flags on comments here.

This requires a slight shift in the order of things.

As I understand it, the previous patch gives us this:

- $flag->is_flagged() is called (by something like the flag link rendering) to see if the comment is flagged by the current user
- is_flagged calls get_flagging_record()
- get_flagging_record() calls out of the flag class to flag_get_user_flags(), which is an API function which has static caching. For comments, if the current comment is the first we've seen for the node it belongs to, the $comment entity is passed as a new parameter.
- flag_get_user_flags() picks up on the entity parameter, and calls the helper, _flag_preload_user_flags()
- _flag_preload_user_flags handles the case where the flagged entity type is comment, and queries for all the comments on the given comment's node, then loads all the flagging records for all of these, and returns then for flag_get_user_flags()'s static cache.

What I think we can do instead is this:

- $flag->is_flagged() is called (by something like the flag link rendering) to see if the comment is flagged by the current user
- is_flagged calls get_flagging_record()
- flag_comment's override of get_flagging_record(), if the current comment is the first we've seen for the node it belongs to, queries for all the comments on the given comment's node, then for each comment, calls flag_get_user_flags() to warm up the static cache.

That's potentially a lot of function calls to flag_get_user_flags(), so we could overload the $entity_id parameter there to allow an array of entity IDs as well as a single ID.

I'll work on a rough proof of concept, and think about whether there's a way to test we're hitting the database fewer times, and then we'll move this issue to the D8 branch.

Comment #43

joachim commented 10 May 2015 at 06:58

> That's potentially a lot of function calls to flag_get_user_flags(), so we could overload the $entity_id parameter there to allow an array of entity IDs as well as a single ID.

It would help if I didn't talk total rubbish.... *sigh*

The bit we're trying to optimise is the query in flag_get_user_flags(), so we DEFINITELY have to either call that with all the IDs at once, or warm the cache up from flag_comment since on D7 we can access another function's static cache.

Comment #44

joachim commented 11 May 2015 at 06:25

+++ b/flag.module
@@ -1545,6 +1555,49 @@ function flag_get_user_flags($content_type, $content_id = NULL, $uid = NULL, $si
+    $result = db_query("SELECT * FROM {flag_content} WHERE content_type = '%s' AND content_id IN (" . db_placeholders($cids, 'int') . ") AND (uid = %d OR uid = 0) AND sid = %s", array_merge(array($content_type), $cids, array($uid, $sid)));

Another problem I've found as I'm working on this is here -- this query gets nothing if a comment is not flagged at all.

That means that the cache won't get a key for this comment, and so when it's rendered, there is a cache miss, which incurs a query for the single comment.

For a use case like d.org, where the flag is 'report comment as spam', the majority of comments on an issue won't be flagged, so the cache warm-up won't do much, and we'll still be querying individually for most comments.

Comment #45

joachim commented 12 May 2015 at 19:13

Status:

Needs work

» Needs review

Status	File	Size
new	1133956-44.flag_.efficiency-multiple-comments.patch	9.89 KB

Here's a 7.x-3.x patch, with tests.

Given d.org needs this, I reckon we can do things the wrong way round here and fix on 7 first, then forward-port to 8, especially as 8 will need some supporting changes first.

Comment #46

joachim commented 15 May 2015 at 10:01

Issue tags:

+7.x-3.7 blocker

Comment #47

Anonymous (not verified) commented 19 September 2015 at 15:02

Status	File	Size
new	1133956-47.flag_.efficiency-multiple-comments.patch	9.86 KB
new	interdiff-1133956-47-44.txt	1.36 KB

Looks good to me - tested a few times with & without this patch using devel generate to create articles with comments, one node with 99 comments went from 108 down to 57 queries, which is nice!

I've attached a re-roll as the original patch was missing a new line at the end of the .info file for the new testing module.

Comment #48

joachim commented 20 September 2015 at 10:29

Status:

Needs review

» Fixed

Brilliant! Thanks for being so thorough with this one :)

Comment #49

20 September 2015 at 10:29

joachim committed 45a0f8d on 7.x-3.x

Issue #1133956 by joachim, amateescu, stevepurkiss: Added special...

Comment #50

4 October 2015 at 10:34

Status:

Fixed

» Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

Improve efficiency when displaying lists of flaggable comments

Comments

Referenced by