I have several importers on my site, all set to run "as often as possible". When cron runs, some of them seem to update every item even though nothing has changed. If I run the importers manually from the standalone form, everything works as intended and returns "There are no new nodes".
I've traced this to the hash check in FeedsProcessor::process(). The stored hash is loaded as it should be, but the hash for the current item is calculated differently. This seems to be due to a static variable in FeedsProcessor::hash(), which stores the serialized mappings. I think the static variable gets its initial value from whichever importer is triggered first, and the remaining importers in the same cron run then reuse that value. That would also explain why manual runs behave correctly, since each manual run only involves a single importer.
I haven't done any thorough research, but I removed the static variable to test this theory, and everything does work as it should.
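To make the suspected mechanism concrete, here is a minimal, language-neutral sketch in Python. It is not the Feeds code: the `Processor` class, its method names, and the sample mappings are all hypothetical, and a class-level attribute stands in for the static variable. It only illustrates how a cache shared by all importers can make an unchanged item look changed.

```python
import hashlib
import json


class Processor:
    """Hypothetical stand-in for a processor; each importer has its own mappings."""

    # Shared by every instance, standing in for a static variable inside a method.
    _serialized_mappings = None

    def __init__(self, mappings):
        self.mappings = mappings

    def hash_shared(self, item):
        # Buggy variant: whichever importer calls this first seeds the cache for all.
        if Processor._serialized_mappings is None:
            Processor._serialized_mappings = json.dumps(self.mappings, sort_keys=True)
        payload = json.dumps(item, sort_keys=True) + Processor._serialized_mappings
        return hashlib.md5(payload.encode("utf-8")).hexdigest()

    def hash_fresh(self, item):
        # Variant without the cache: serialize this importer's own mappings every time.
        payload = json.dumps(item, sort_keys=True) + json.dumps(self.mappings, sort_keys=True)
        return hashlib.md5(payload.encode("utf-8")).hexdigest()


importer_a = Processor([{"source": "title", "target": "title"}])
importer_b = Processor([{"source": "guid", "target": "guid"}])
item = {"title": "Hello", "guid": "1"}

# Hash stored for importer B's item when B's own mappings were used.
stored_b = importer_b.hash_fresh(item)

# During one cron run, importer A goes first and seeds the shared cache.
importer_a.hash_shared(item)

# Importer B now hashes the unchanged item with A's mappings, so it looks changed.
changed_with_shared_cache = importer_b.hash_shared(item) != stored_b
changed_without_cache = importer_b.hash_fresh(item) != stored_b
print(changed_with_shared_cache, changed_without_cache)
```

In this sketch the shared-cache variant reports the item as changed, while the variant without the cache does not, which matches the behaviour described above.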
Comment | File | Size | Author |
---|---|---|---|
#4 | feeds-serialized-mappings-1565890-4.patch | 1.07 KB | twistor |
#1 | 1848726-1-feeds-hash_check_cron.patch | 658 bytes | olofbokedal |
Comments
Comment #1
olofbokedal commented: This patch simply removes the static variable, causing the mappings to be serialized every time.
Comment #2
olli commented: Marked #1565890: Nodes updated sometimes when feed hasn't changed as a duplicate.
Comment #3
olli commented: #1 solves this.
Comment #4
twistor commented: We really shouldn't get rid of the cache entirely.
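The contents of the patch in #4 are not shown in this thread, so the following is only a sketch of one way to keep a cache without sharing it between importers: store the serialized mappings on the instance instead of in a static variable. It is written in Python for illustration, and the `CachingProcessor` class and its attribute names are hypothetical, not part of Feeds.

```python
import hashlib
import json


class CachingProcessor:
    """Hypothetical processor that caches its serialized mappings per instance."""

    def __init__(self, mappings):
        self.mappings = mappings
        # Per-instance cache: never visible to other importers.
        self._serialized_mappings = None

    def hash(self, item):
        # Serialize the mappings once per instance, then reuse the result.
        if self._serialized_mappings is None:
            self._serialized_mappings = json.dumps(self.mappings, sort_keys=True)
        payload = json.dumps(item, sort_keys=True) + self._serialized_mappings
        return hashlib.md5(payload.encode("utf-8")).hexdigest()


first = CachingProcessor([{"source": "title", "target": "title"}])
second = CachingProcessor([{"source": "guid", "target": "guid"}])
sample = {"title": "Hello", "guid": "1"}

# Each importer hashes with its own mappings, regardless of call order.
hashes_differ = first.hash(sample) != second.hash(sample)
# Repeated calls on the same importer give the same hash for an unchanged item.
hash_is_stable = second.hash(sample) == second.hash(sample)
print(hashes_differ, hash_is_stable)
```

The trade-off is the one discussed in this issue: the per-instance cache saves repeated serialization, at the cost of slightly more state to maintain.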
Comment #5
olli commented: Thanks for looking at this.
#4 makes sense. Would #1 break something, or just be a little slower?
Comment #6
olli commented: I guess that has not been true for a while...
Comment #7
olli commented: Hm. Do we need an update function for this?
Additionally, would it make sense to cache the hash of the serialized mappings?
Comment #8
olofbokedal commented: I haven't actually tested it, but I believe #4 would work. I can't see caching bringing any significant performance improvement, though, since it's a simple matter of serializing. #1 is simpler to understand and maintain, while #4 means the serialization is done only once.
However, this is a small change. Both patches would work, so the choice is up to the maintainer. This won't need an update function.
Comment #9
twistor commented: You're absolutely right, a quick test shows that the savings are trivial.
http://drupalcode.org/project/feeds.git/commit/8dd1ca3