The {flag_counts} table holds a count of flags on each entity, for each applicable flag. The purpose of this seems to be to speed up tasks such as showing how many users have flagged a node, or checking thresholds with Rules.

The flag_count table is updated whenever an entity is flagged or unflagged as part of the chain of code that is triggered by calling flag().

However, this approach has problems, because there are several places where flaggings are deleted in bulk, and the API is not invoked: #1930320: Flag count suddenly wrong (gap between flag_content & flag_counts).

Fixing this is problematic, because of the potential chain reaction and because it causes an operation whose initiator we know nothing about to get much more expensive (batch process? cron run? admin form submission? rules execution?).

One way round this would be simply to stop updating the flag_counts table all the time. Instead of treating it like a reliable copy of the data, treat it more like a cache, which is periodically flushed and regenerated.

AFAICT, the following query of the flagging table gives me effectively the content of the flag_counts table:

SELECT fid, entity_type, entity_id, count(uid), max(TIMESTAMP) FROM flagging GROUP BY fid, entity_id ORDER BY fid

The questions that remain are: when would this get repopulated, and what would be affected by getting a potentially stale count?

Comments

stevenx’s picture

It would be good if the flag could be only unflagged while deleting the user when the admin deletes the user
not when the user is put in inactive state. or at least keep up the counter.
this also shall be optional as an option per flag

thank god there is that table acting like a cache. this way i could recover the delete all flags bug when deleting a user ;) hehe

socketwench’s picture

My understanding is that we have the flag_counts table to make certain kinds of views easier. At least, that was the impression I got from reading the code.

Changing to caching behavior in 7.x would be a major refactor. It might be easier in 8.x depending on the caching implementation. Even so, I'd like to do this even if only in 8.x. Having a table for counts never sat well with me.

joachim’s picture

> Changing to caching behavior in 7.x would be a major refactor

Agreed, this is definitely for 8.

> It might be easier in 8.x depending on the caching implementation

I wasn't necessarily thinking of using Drupal core's cache API. Just treating the {flag_counts} table as a potentially unreliable cache, rather than something that always has to be kept in sync.

socketwench’s picture

Version: 7.x-3.x-dev » 8.x-4.x-dev
joachim’s picture

Now that the flag count system is a service, it would be relatively simple to swap it out and put in something that implements this.

socketwench’s picture

joachim’s picture

> My understanding is that we have the flag_counts table to make certain kinds of views easier. At least, that was the impression I got from reading the code.

Yup, this should remain as a table that Views can use.

socketwench’s picture

Do we even have views support for flag counts yet?

joachim’s picture

I don't know, but it's one of the major use cases.

joachim’s picture

Regarding whether to keep the existing system as an option: I honestly don't know whether this idea is better or worse for performance, or whether it depends on the number of flags and nodes on the site.

The current system updates counts whenever the flaggings change. If you don't care about counts, that's DB updates that you didn't need.

A cache system would in some ways not be that much of a change: when a flagging happens, you invalidate (aka delete) the row for the count on that entity instead of updating it. You update it later on when there's a page view that needs that count, which causes the count query to happen.

Looking at it that way, that's making 2 queries instead of 1:

- user performs flagging -- cache table row deleted
- user views node -- cache table row populated

That doesn't look like an improvement!

I think what I'd like to see most is the flag counts system wrapped up in a service, and a 2nd service that does nothing provided alongside it, so sites can turn it off. (Or a pair of plugins and have it in the UI -- how much of a hardcore dev-only option is this?)

And also, we should consider providing a 'Flush the whole counts table & repopulate it' function, in case counts do get out of sync. But hopefully on D8 our architecture is cleaner and that won't be needed :D

joachim’s picture

The more I think about this, the more I think that we either should keep the current system, or make it pluggable, because whether a count system is good or not depends on how the site is using it.

Suppose a site is set up to show the count in the link, e.g. 'Like this post (45 likes)'.

The current system goes like this:

- user loads page. Flag count is loaded from the count table.
- user clicks link. Flagging is created, flag counts updated.
- the link or the whole page reloads. Flag count is loaded from the count table.

If we change to a cache, then that chain of events now incurs extra queries:

- user loads page. Flag count is loaded from the count *cache* table.
- user clicks link. Flagging is created. Flag count cache item is invalidated.
- the link or the whole page reloads. Flag count is loaded from cache -- cache misses, flag count is calculated.

We've got an extra load for the cache miss, and then a GROUP or COUNT query to populate it.

Now suppose that we have a site where the flag count is only shown as a site-wide statistics, such as a view of top-liked posts.

Here we don't need to update the flag count each time something is flagged, because it might be that the stats view hasn't been loaded in the meantime:

- user 1 flags something. Cache table count for entity invalidated.
- user 2 flags something. Cache table count for entity invalidated.
- user 3 flags something. Cache table count for entity invalidated.
- user 4 looks at the statistics view -- NOW we hit the cache, miss, and recalculate counts.

Here we've optimized for the users who do the flagging, but at a cost to the user viewing the stats table. This would be desirable for a high-traffic site, and especially if the stats table were admin-only.

But having a cache table where entries get re-calculated after a cache miss isn't going to work with Views anyway, because by the time Views loads values, it's too late for the updated values to affect the sort order...

So for that example, we're looking at a cache that gets regenerated on cron rather than on demand. That can be either single items, or the whole table -- I'm not sure whether there are different use cases for that.

socketwench’s picture

The more I think about this, the more I think that we either should keep the current system, or make it pluggable, because whether a count system is good or not depends on how the site is using it.

We kinda have that already. A third party module could implement FlagCountManagerInterface and replace Flag's count manager with their own. That's why the count manager responds to events instead of us updating the table directly in the entity classes. The problem is that we don't have as good of isolation for that in some of our tests and in views itself.

So for that example, we're looking at a cache that gets regenerated on cron rather than on demand. That can be either single items, or the whole table -- I'm not sure whether there are different use cases for that.

That's where I was going too. Moving the count to a cache implies that it will not be 100% accurate all the time. That might be a bit annoying, but if you need something that accurate, you're probably doing an operation of flaggings directly anyways.

I was also thinking we would assign the cache bin to be per flag, with data in the bin for entity counts and total counts.

joachim’s picture

> Moving the count to a cache implies that it will not be 100% accurate all the time. That might be a bit annoying

For the use case with counts shown on links, it's more than annoying, it's broken.

Berdir’s picture

As @socketwench said, the current service is already pluggable, anyone can write their own implementation.

I think a flag_counts table is a good default approach, it's always correct but for some cases doesn't offer the best performance.

> The problem is that we don't have as good of isolation for that in some of our tests and in views itself.

Tests don't have to be isolated. They have to test the actual implementation. Views is more complicated, but the truth is that you simply can't write generic views that rely on services. Maybe for displaying the count, but not if you want to sort by it, for example. That would be impossible with a cache-based implementation (Unless you have an implementation that is implemented as a calculate-on-demand/on-cron but still uses the same table).

Either way, I'd vote to change this to a normal feature request to add alternative implementations. Maybe someone will write one and contribute it back.

socketwench’s picture

Status: Active » Closed (won't fix)

I think a flag_counts table is a good default approach, it's always correct but for some cases doesn't offer the best performance.

Agreed. Let's close this as a won't-fix. If a site needs better flag_counts performance, they can write a module to swap out the service implementation for one that uses the cache table instead.