Problem/Motivation
Generating the sitemap empties the persistent entity cache.
Steps to reproduce
Proposed resolution
$storage->resetCache([$data_set['id']]); is called when generating the sitemap to ensure that entities don't take up masses of memory in the static cache. This is very reasonable. However, calling this code also clears the persistent cache, so generating a sitemap has the unexpected side effect of clearing the entity cache.
The module needs to disable the static caching of entities during sitemap generation rather than calling resetCache(). Or it could clear the entity static cache directly.
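To illustrate the difference between the two approaches, here is a minimal toy model in plain PHP. It is a sketch, not the module's code: ToyEntityStorage, clearMemoryCache() and generateSitemap() are hypothetical names invented for this example, and the two arrays merely stand in for the static (per-process) cache and the persistent cache backend. It assumes, as described above, that resetCache() clears both tiers.

```php
<?php
// Toy model only: hypothetical stand-ins, not Drupal's real storage classes.

final class ToyEntityStorage {
  /** @var array<int, string> Static cache: lives in the current process only. */
  private array $memoryCache = [];
  /** @var array<int, string> Stand-in for the persistent cache backend. */
  private array $persistentCache = [];
  /** Number of slow-path loads that had to rebuild the entity. */
  public int $databaseLoads = 0;

  public function load(int $id): string {
    if (isset($this->memoryCache[$id])) {
      return $this->memoryCache[$id];
    }
    if (!isset($this->persistentCache[$id])) {
      // Slow path: nothing cached anywhere, so the entity is rebuilt.
      $this->databaseLoads++;
      $this->persistentCache[$id] = "entity-$id";
    }
    return $this->memoryCache[$id] = $this->persistentCache[$id];
  }

  /** Models the behaviour described above: both tiers are cleared. */
  public function resetCache(array $ids): void {
    foreach ($ids as $id) {
      unset($this->memoryCache[$id], $this->persistentCache[$id]);
    }
  }

  /** Models the proposed fix: only the per-process tier is cleared. */
  public function clearMemoryCache(): void {
    $this->memoryCache = [];
  }

  public function memoryCacheSize(): int {
    return count($this->memoryCache);
  }
}

/** Hypothetical generation loop: load each entity, then free memory. */
function generateSitemap(ToyEntityStorage $storage, array $ids, bool $memoryOnly): void {
  foreach ($ids as $id) {
    $storage->load($id);
    if ($memoryOnly) {
      $storage->clearMemoryCache();
    }
    else {
      $storage->resetCache([$id]);
    }
  }
}

$ids = range(1, 100);

// Current behaviour: every run wipes the persistent tier as it goes.
$before = new ToyEntityStorage();
generateSitemap($before, $ids, FALSE);
generateSitemap($before, $ids, FALSE);

// Proposed behaviour: the persistent tier survives between runs.
$after = new ToyEntityStorage();
generateSitemap($after, $ids, TRUE);
generateSitemap($after, $ids, TRUE);

printf("resetCache():      %d database loads\n", $before->databaseLoads); // 200
printf("memory-only clear: %d database loads\n", $after->databaseLoads);  // 100
```

In this model both variants leave the static cache empty after each run, so memory use stays bounded either way. Only the memory-only variant lets the second run be served from the persistent cache.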
Remaining tasks
User interface changes
API changes
Data model changes
Issue fork simple_sitemap-3202233
Comments
Comment #2
gbyte
As you mentioned, we added this to avoid memory buildup and exhaustion. See #3170261: Entity static cache can cause memory exhaustion during sitemap regeneration.
I'm happy to review patches to improve this.
Comment #3
alexpott
Realised that related entities such as paragraphs might end up getting stuck in the static cache if they are loaded when we load the entity.
@gbyte just found #3170261: Entity static cache can cause memory exhaustion during sitemap regeneration and saw that the original solution there was pretty much what I'd recommend. Going to upload something very similar to #3170261-2: Entity static cache can cause memory exhaustion during sitemap regeneration. Please give @mstef issue credit too.
Comment #5
alexpott
Hopefully at some point in the future one of:
will land and then \Drupal\simple_sitemap\Plugin\simple_sitemap\UrlGenerator\EntityUrlGenerator::generate() can be removed because core will handle it for you.
Created an MR to clear the entity memory cache after generating the URL instead of resetting the entity cache. This will result in sitemap generation doing fewer queries to the DB cache backend because it stops clearing it :D
Comment #6
alexpott
I've generated 5000 nodes with paragraph fields and set the content type to be indexed. See https://gist.github.com/alexpott/874a0480b7097d7d97322c053dec6634
I'm generating the sitemap with:
vendor/bin/drush ssg -v
The results show that subsequent runs are quicker because sitemap generation is no longer obliterating the entity cache.
With patch
Run 1
Run 2
Without the patch
Run 1
Run 2
Comment #7
alexpott
Subsequent runs are faster because they do far fewer queries. With the patch, subsequent runs do around 25000 queries; without the patch, we're doing around 50000 queries!
Comment #8
alexpott
I've created #3202290: Add performance test script to make performance testing simpler in the future. It feels worth it because this module often has to deal with huge amounts of data and small changes can have a very big impact.
Comment #9
gbyte
Comment #10
alexpott
I've answered the review question. I think there's a misunderstanding of what deleteAll() is clearing. It is only clearing the entities from the memory used by the current process.
Comment #12
gbyte
Thanks for the contribution & clarification.