Posted by mustafau on August 4, 2008 at 7:19pm
| Project: | Drupal core |
| Version: | 7.x-dev |
| Component: | aggregator.module |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | closed (fixed) |
Issue Summary
When refreshing a feed, Aggregator should check whether the hash of the feed data has changed. For an unchanged feed, this saves Aggregator from invoking the parser and querying the database for duplicates. A patch is attached.
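For illustration only, here is a minimal sketch of the idea in Python rather than the patch's own code. The `Feed` class, the `refresh` function, the `parse` callback, and the choice of SHA-256 are all assumptions made for this example; the attached patch may differ in each of these.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Feed:
    """Hypothetical in-memory stand-in for a stored feed record."""
    url: str
    hash: str = ""  # digest of the last raw payload that was parsed
    items: list = field(default_factory=list)


def refresh(feed: Feed, raw: bytes, parse) -> bool:
    """Refresh `feed` from the downloaded bytes `raw`.

    Returns True if the parser was invoked, False if the payload was
    byte-for-byte identical to the previous one and was skipped.
    """
    digest = hashlib.sha256(raw).hexdigest()  # hash choice is an assumption
    if digest == feed.hash:
        # Unchanged feed: no parsing and no per-item duplicate lookups.
        return False
    feed.items = parse(raw)
    feed.hash = digest
    return True
```

Calling `refresh` twice with the same bytes parses only once; the second call returns `False` without touching the parser. Note that the comparison is on raw bytes, so a feed that changes only a timestamp on every request would still be parsed each time.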
| Attachment | Size | Status | Test result | Operations |
|---|---|---|---|---|
| aggregator-feed-hash.patch | 4.04 KB | Ignored: Check issue status. | None | None |
Comments
#1
Did this solve a performance issue on your website? The patch looks good, but I'd be interested to learn how many queries this saves and whether it could add up to a significant performance improvement.
#2
We did the same hash method in FeedAPI and saw huge performance improvements.
If a feed hasn't changed, the hash check saves one or more expensive uniqueness checks for every item in the feed. For a news feed with 10 or 20 items and 2 uniqueness queries per item, that can be up to 10 × 2 or 20 × 2 queries against a large set of data per feed.
The same algorithm has been implemented by SimpleFeed and by the patch for the aggregator rework in #236237.
#3
Sounds good. I'd be happy to commit this patch; it's a lot easier to review and commit than #236237: Aggregator rework: extensible API, SimpleXML parser, use taxonomy for categorization.
#4
Small update: clear the hash column when removing feed items.
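A likely reason for this update, sketched in Python under assumptions: if a feed's items are removed but its stored hash is kept, the next refresh sees an unchanged hash and skips parsing, so the items never come back. The `FeedRecord` class, its method names, the `clear_hash` flag, and the use of SHA-256 are hypothetical and exist only to show both behaviours side by side.

```python
import hashlib


class FeedRecord:
    """Hypothetical feed record holding a payload hash and parsed items."""

    def __init__(self):
        self.hash = ""
        self.items = []

    def refresh(self, raw: bytes) -> bool:
        """Parse `raw` only if its hash differs from the stored one."""
        new_hash = hashlib.sha256(raw).hexdigest()
        if new_hash == self.hash:
            return False  # treated as unchanged, nothing is imported
        self.items = raw.decode().splitlines()  # stand-in for real parsing
        self.hash = new_hash
        return True

    def remove_items(self, clear_hash: bool = True) -> None:
        """Drop all items; by default also reset the stored hash."""
        self.items = []
        if clear_hash:
            self.hash = ""


record = FeedRecord()
record.refresh(b"one\ntwo")

# Without clearing the hash, the same payload is skipped and items stay empty.
record.remove_items(clear_hash=False)
stale_result = record.refresh(b"one\ntwo")

# With the hash cleared, the next refresh re-imports the items.
record.remove_items()
fresh_result = record.refresh(b"one\ntwo")
```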
#5
@alex_b: Can you RTBC this?
#6
I gave it another review and committed this to CVS HEAD. Thanks.
#7
Thank you. Great to see this in.
#8
Automatically closed -- issue fixed for two weeks with no activity.