Early Bird Registration for DrupalCon Portland 2024 is open! Register by 23:59 PST on 31 March 2024, to get $100 off your ticket.
I have a node with over 100k revisions and metatag_entity_load takes over 8 seconds to work. I believe this is because metatag_metatags_load_multiple will load up every revision_id.
Comment | File | Size | Author |
---|---|---|---|
#22 | metatag-beta7-optimize-loading-2183203-21.patch | 1.27 KB | juampynr |
#20 | metatag-beta7-optimize-loading-2183203-20.patch | 1.25 KB | juampynr |
#18 | metatag-optimize-loading-2183203-18.patch | 1.61 KB | juampynr |
#17 | metatag-n2183203-17.patch | 1.51 KB | DamienMcKenna |
Comments
Comment #1
mikeytown2 CreditAttribution: mikeytown2 commentedMore info on this issue here: https://groups.drupal.org/node/401538
Comment #2
DamienMcKennaHave you tried using EntityCache?
Comment #3
gielfeldt CreditAttribution: gielfeldt commented@DamienMcKenna Whatever happened to root-cause analysis? Besides, EntityCache suffers from a race-condition if you're using another cache backend than the Database.
Back on topic:
Some consumers want the metatags for all revisions for some reason.
We could at least handle the cases where revision_ids are provided:
(Warning: untested).
That should result in a much more efficient query.
And for the rest: Two consumers in the metatag module fetches metatags for all revisions.
1.) Metatags feeds processor
2.) Bulk delete/revert
#1 I'm not so sure why it fetches metatags for all revisions. Perhaps it could settle for fetching only the current revision?
#2 shouldn't be a problem. This is an administrative batch operation, and IMO does not require real-time priority.
Comment #4
gielfeldt CreditAttribution: gielfeldt commentedAnother thing to note is that PDO and MySQL are not very good friends. The quoting of the entity_ids in the query causes mysql to do a filesort (at least on my setup). But I've seen this many times before though. Nothing to do about it, short of bypassing PDO's placeholder mechanism and write the query manually.
Comment #5
mikeytown2 CreditAttribution: mikeytown2 commentedLooking into this more and I think the issue could be worse than first reported (as you need a lot less revisions for the slowdown to occur). The node in question on our setup is a test one. We make sure things are working as they should with our non-drupal api by changing a value on drupal and saving it. This happens every 5 minutes. Our slowdown was gradual and looking at the metatag table there are only a little over 2k entries in it for this node. This corresponds with us pushing out the new version of metatag (beta 9) on the 20th (near the start of week 04).
So rather than needing 100k revisions for this slowdown to be noticeable, I would bet that it would be noticeable after several hundred revisions. This bug only affects new revisions so it makes since that an automated test would be the first one to pick this up. I'll add in the condition to the query as this looks like it should dramatically speed this up.
On an unrelated note we do recognize that saving 100k revisions is not needed. But this is still a valid bug as everything else runs hook_*_load in the millisecond range.
As for using entity cache we can't do that in our case as we merge in some of the field data from the nondrupal API via bf_field_mapper (the API is the master of the data in our case, not the drupal database).
Edit: The test runs every 5 minutes = 12 per hour = 288 per day = 2016 per week.
Comment #6
mikeytown2 CreditAttribution: mikeytown2 commentedI can confirm the filesort issue when the nid is quoted; although the speed hit is minimal as this is only a couple thousand rows. I commented out the revision_id to test this.
Attached is the patch to switch this over to use $revision_ids if it is given.
Comment #7
DamienMcKennaHave you tried out your patch with the 100k records? I'm a little surprised this is faster, given DBTNG is supposed to be slower than a straight up query.
Comment #8
mikeytown2 CreditAttribution: mikeytown2 commentedI did. Here is the output
Main reason it's faster is that the query is only returning 1 row vs over 2k. Adding an additional condition to a db_select is a lot cleaner looking than adding an extra AND to a db_query. You could use db_query if you really wanted to and get the same speed up; just be sure to add in the additional logic with
!empty($revision_ids)
Comment #9
DamienMcKennaTag.
Comment #10
gielfeldt CreditAttribution: gielfeldt commented@mikeytown2 In the patch you also included my bad casing of the second orderby() :-)
Comment #11
mikeytown2 CreditAttribution: mikeytown2 commentedFound another issue related to metatag_metatags_load_multiple and lots of revisions.
#2184857: PHP Fatal error: Allowed memory size of 12,582,912,000 bytes exhausted (tried to allocate 64 bytes) in metatag/metatag.module on line 388
Comment #12
juampynr CreditAttribution: juampynr commentedPart of the patch at https://drupal.org/comment/8310313#comment-8310313 already addresses this issue. Even though you call metatag_metatags_load_multiple() with a huge list of nids and revision_ids, it will perform in milliseconds. This is already part of metatag-7.x-1.0-beta9.
I think we can close this issue as a won't fix but I will let @DamienMcKenna to decide.
Comment #13
mikeytown2 CreditAttribution: mikeytown2 commented@juampy
This issue is still present. Create over 1k revisions on a single node to repo. This issue showed up after 7.x-1.0-beta9.
Comment #14
DamienMcKennaClearly some improvements are needed on the revision handling, and possible some API changes to accommodate the need.. so that makes it a blocker for 1.0-rc1.
Comment #15
mikeytown2 CreditAttribution: mikeytown2 commented@DamienMcKenna
What is wrong with the patch in #6? It's what we are using to fix this issue.
Comment #16
gielfeldt CreditAttribution: gielfeldt commentedI'm also curious. What improvements are "clearly" needed?
Comment #17
DamienMcKennaWell, there's an erroneous IF statement, for one ;-)
Comment #18
juampynr CreditAttribution: juampynr commentedThe last patch will still be very heavy for a large amount of nodes being loaded since it has two where conditions.
Here is an example. Given the following code:
If you run it with
drush scr test.php
, it will call to metatag_entity_load(array_of_500_nids), which finally callsmetatag_metatags_load_multiple('node', array_of_500_nids, array_of_500_rids). This query will take a few seconds to run.
While debugging this with @q0rban, he suggested to simply filter by revision_ids when they are present and, if they aren't, filter by nids. This makes the query to run in just a few milliseconds.
Comment #19
DamienMcKenna@juampy: Thanks, I'd been thinking of the same thing. I'm going to test it a little more before committing, but it seems pretty good so far.
Comment #20
juampynr CreditAttribution: juampynr commentedEdited. Wrong patch. See comment #22.
Comment #22
juampynr CreditAttribution: juampynr commentedJust for the record: for sites using metatag-7.x-1.0-beta7 which are having performance issues when loading metatags.
Here is the adapted patch to address it.
Comment #23
MXT@juampy for #22 Sorry I don't understand: metatag module is in beta 9 (that I'm using with BIG performance issues) and you have provided a patch for beta 7 ?
Is your patch already applied to beta 9 ? Or should I have to apply it in any case
Thank you for clarifications
Comment #24
juampynr CreditAttribution: juampynr commented@MTX, you should try patch at comment #18.
Comment #25
juampynr CreditAttribution: juampynr commentedSetting back to needs review since the patch that applies on HEAD is the one at comment #18.
Comment #27
DamienMcKennaI've committed patch #18, thank you all for your help with this.
Comment #30
davidwbarratt CreditAttribution: davidwbarratt commentedgielfeldt,
Is there an open issue for this? I couldn't find one:
https://www.drupal.org/project/issues/entitycache
Comment #31
gielfeldt CreditAttribution: gielfeldt commentedActually EntityCache isn't the culprit as such. It's more of a core issue:
#1941208: Don't call DrupalDefaultEntityController::resetCache() inside a transaction
#1679344: Race condition in node_save() when not using DB for cache_field
Comment #32
drummA followup to this issue is at #2835614: metatag_metatags_load_multiple() shouldn’t need to sort. I don’t see how the orderBy is being used, it might be able to be removed.