Project: Drupal core
Version: 7.x-dev
Component: aggregator.module
Category: feature request
Priority: normal
Assigned: Unassigned
Status: closed (fixed)

Issue Summary

When refreshing a feed, Aggregator should check whether the hash of the feed data has changed. If it has not, Aggregator can skip invoking the parser and querying the database for duplicates. A patch is attached.

Attachment | Size | Status | Test result | Operations
aggregator-feed-hash.patch | 4.04 KB | Ignored: Check issue status. | None | None
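
The check described in the issue summary could look roughly like the following. This is a minimal Python sketch, not the code from the patch: the function name `refresh_feed`, the `feed` dict with its `hash` key, the `parse` callback, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


def refresh_feed(feed, raw_data, parse):
    """Process raw_data only when its hash differs from the stored one.

    feed     -- dict holding the last seen hash under "hash" (hypothetical storage)
    raw_data -- the downloaded feed body as bytes
    parse    -- callable standing in for the parser and the duplicate checks
    Returns True if the feed was processed, False if it was skipped.
    """
    # SHA-256 is an assumption; the issue does not say which hash is used.
    new_hash = hashlib.sha256(raw_data).hexdigest()

    if feed.get("hash") == new_hash:
        # Unchanged feed: skip the parser and all per-item database queries.
        return False

    parse(raw_data)
    feed["hash"] = new_hash
    return True
```

Storing the hash only after `parse` returns means a failed parse is retried on the next refresh instead of being silently skipped.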

Comments

#1

Did this solve a performance issue on your website? The patch looks good, but I'd be interested to learn how many queries this saves and whether that adds up to a significant performance improvement.

#2

We did the same hash method in FeedAPI and saw huge performance improvements.

If a feed hasn't changed, using a hash saves one or more expensive uniqueness checks for every item in the feed. With news feeds and 2 queries per item for uniqueness checking, that can be up to 10 x 2 = 20 or 20 x 2 = 40 queries against a large set of data per feed.

The same algorithm has been implemented by SimpleFeed and by the patch for the aggregator rework in #236237.
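
The arithmetic in this comment can be written out as a small Python sketch. The function names and the `changed_flags` list are hypothetical, and the numbers are only the ones quoted above (10 or 20 items, 2 queries per item), not measurements.

```python
def uniqueness_queries(items_per_feed, queries_per_item=2):
    """Queries spent on duplicate checks for one full refresh of a feed."""
    return items_per_feed * queries_per_item


def queries_over_refreshes(changed_flags, items_per_feed, queries_per_item=2):
    """Total duplicate-check queries when unchanged refreshes are skipped.

    changed_flags -- one boolean per refresh, True if the feed hash changed
    """
    per_refresh = uniqueness_queries(items_per_feed, queries_per_item)
    return sum(per_refresh for changed in changed_flags if changed)


# Four refreshes of a 10-item feed, of which only the first and last changed.
with_hash = queries_over_refreshes([True, False, False, True], 10)
without_hash = queries_over_refreshes([True, True, True, True], 10)
```

In this example the hash check halves the duplicate-check queries; the real saving depends on how often a feed actually changes between refreshes.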

#3

Sounds good. I'd be happy to commit this patch -- it's a lot easier to review and commit than #236237: Aggregator rework: extensible API, SimpleXML parser, use taxonomy for categorization is.

#4

Small update: Clean hash column when removing feed items.

Attachment | Size | Status | Test result | Operations
aggregator-feed-hash-291064-4.patch | 4.56 KB | Ignored: Check issue status. | None | None
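
A rough Python sketch of why the hash has to be cleared when feed items are removed: if the stored hash survived, the next refresh of an unchanged feed would be skipped and the removed items would not be re-imported. The `feed` dict, the function names, SHA-256, and the whitespace-splitting "parser" are illustrative assumptions, not the patch's code.

```python
import hashlib


def refresh(feed, raw_data):
    """Import items only when the feed body's hash has changed."""
    new_hash = hashlib.sha256(raw_data).hexdigest()  # hash choice is an assumption
    if feed.get("hash") == new_hash:
        return False
    # Stand-in for real parsing: treat each whitespace-separated token as an item.
    feed["items"] = raw_data.decode("utf-8").split()
    feed["hash"] = new_hash
    return True


def remove_items(feed):
    """Delete all items and reset the hash so the next refresh re-imports them."""
    feed["items"] = []
    feed["hash"] = ""
```

After `remove_items(feed)`, calling `refresh(feed, raw_data)` with the same unchanged data processes the feed again rather than skipping it.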

#5

@alex_b: Can you RTBC this?

#6

Status: needs review » fixed

I gave it another review and committed this to CVS HEAD. Thanks.

#7

Thank you. Great to see this in.

#8

Status: fixed » closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.