Problem/Motivation
On real sites with 30000+ nodes to index there are problems with Drupal's database queue. The \Drupal\simple_sitemap\Queue\SimplesitemapQueue::claimItem() query:
$item = $this->connection->queryRange('SELECT data, item_id FROM {queue} q WHERE name = :name ORDER BY item_id ASC', 0, 1, [':name' => $this->name])->fetchObject();
if ($item) {
  $item->data = unserialize($item->data);
  return $item;
}
becomes a bottleneck. I think this is because, even though the query is limited to one row, it has to run once for every claim, and each claimed item is then deleted. Together these defeat the database's query optimisations, so the query takes longer than you'd like.
Proposed resolution
Given that simple_sitemap already has an optimised queue class, \Drupal\simple_sitemap\Queue\SimplesitemapQueue, which does not actually take out a lease on the item, we can go a step further: drop the limit from the query and use a generator to return the rows from the unlimited query.
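To illustrate the idea, here is a minimal, self-contained sketch of a generator-backed claimItem(). It is not the code from the merge request: the class name GeneratorQueueSketch, the in-memory $rows array standing in for the {queue} table, and the queryCount counter are all assumptions made for the example. The point is only that one unlimited "query" can feed many claims.

```php
<?php

/**
 * Hypothetical sketch: a queue whose claimItem() reads from a generator.
 *
 * $rows stands in for the {queue} table (item_id => serialized data). In the
 * real module the rows would come from the database connection instead.
 */
final class GeneratorQueueSketch {

  /** @var array<int, string> */
  private array $rows;

  /** Number of simulated unlimited SELECT queries that were run. */
  public int $queryCount = 0;

  private ?\Generator $items = NULL;

  public function __construct(array $rows) {
    // Mimic "ORDER BY item_id ASC".
    ksort($rows);
    $this->rows = $rows;
  }

  /**
   * Simulates one unlimited query and yields its rows one at a time.
   */
  private function fetchAll(): \Generator {
    $this->queryCount++;
    // foreach by value iterates over a snapshot, so deleting rows while
    // iterating does not disturb the traversal.
    foreach ($this->rows as $item_id => $data) {
      yield (object) ['item_id' => $item_id, 'data' => unserialize($data)];
    }
  }

  /**
   * Returns the next item, or FALSE once the current result set is exhausted.
   */
  public function claimItem() {
    if ($this->items === NULL) {
      $this->items = $this->fetchAll();
    }
    else {
      $this->items->next();
    }
    if ($this->items->valid()) {
      return $this->items->current();
    }
    // Exhausted: the next claim starts a fresh query.
    $this->items = NULL;
    return FALSE;
  }

  public function deleteItem(object $item): void {
    unset($this->rows[$item->item_id]);
  }

}
```

In this sketch, claiming and deleting every item costs a single simulated query rather than one query per claim. Because no lease is taken, nothing stops a second process from reading the same rows, which is why locking matters for this approach.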
Also, given that we're not updating the lease time, I think we should consider setting up a persistent lock around sitemap generation so that multiple processes don't try to generate the same rows.
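The locking pattern could look roughly like the sketch below. It is an illustration only: InMemoryLockSketch is a hypothetical stand-in that borrows the acquire()/release() shape of Drupal's lock backends (core exposes a persistent one as the lock.persistent service), and the lock id and function name are invented for the example rather than taken from the merge request.

```php
<?php

/**
 * Hypothetical in-memory stand-in for a persistent lock backend.
 */
final class InMemoryLockSketch {

  /** @var array<string, bool> */
  private array $held = [];

  /** Returns TRUE if the lock was free and is now held, FALSE otherwise. */
  public function acquire(string $name): bool {
    if (isset($this->held[$name])) {
      return FALSE;
    }
    $this->held[$name] = TRUE;
    return TRUE;
  }

  public function release(string $name): void {
    unset($this->held[$name]);
  }

}

/**
 * Runs $generate only if no other caller currently holds the lock.
 *
 * @return bool
 *   TRUE if generation ran, FALSE if it was skipped because of the lock.
 */
function generate_with_lock_sketch(InMemoryLockSketch $lock, callable $generate): bool {
  // Assumed lock id; the real module may use a different name.
  $lock_id = 'simple_sitemap:generation';
  if (!$lock->acquire($lock_id)) {
    return FALSE;
  }
  try {
    $generate();
  }
  finally {
    // Always release, even if generation throws.
    $lock->release($lock_id);
  }
  return TRUE;
}
```

A second caller that arrives while generation is in progress gets FALSE back and can simply skip its run; once the first caller finishes, the lock is free again.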
Remaining tasks
User interface changes
API changes
Data model changes
Comment | File | Size | Author |
---|---|---|---|
#5 | Screenshot 2021-03-15 at 21.27.46.png | 303.67 KB | alexpott |
#4 | Screenshot 2021-03-15 at 13.42.03.png | 127.09 KB | alexpott |
#4 | Screenshot 2021-03-15 at 13.42.52.png | 363.3 KB | alexpott |
Issue fork simple_sitemap-3203626
Comments
Comment #3
alexpott
Using #3202290: Add performance test script to measure performance on a site with 10000 nodes...
With the patch
Without the patch
As we can see, there are 100103 queries without this patch and only 89822 with it. That is 10281 fewer queries (roughly 10%), which represents a considerable drop in db traffic.
Comment #4
alexpott
Here are some pictures from a production site that suffers every hour while generating the sitemap. The majority of queries are the queue queries from the sitemap module...
Comment #5
alexpott
We've tried this fix on a production site. Sitemap generation that was taking 30 minutes or more now takes 3 minutes and barely affects the servers.
- sitemap generation is running at 21:11!!!
Comment #6
daniel.bosen
I tested this patch on a medium-sized project. I did not quite get the improvements described in comment #5, but the sitemap generation time was still halved. Amazing! I also tested the locking, and it is working as expected. The MR looks good as well.
I hope this gets merged soon!
Comment #7
gbyte commented
Comment #8
gbyte commented
Comment #9
alexpott
Thanks for the review @gbyte - I've tried to address your feedback. I've changed everything to use the same lock id, since deleting the queue while rebuilding the queue is likely to produce odd results. I've also manually tested the locking. Will add a test.
Comment #10
alexpott
Added the test.
Comment #12
gbyte
Looks good to me, thanks! Let's put it into dev to get a few more eyes on it.