Use-case:
I have been parsing data from a feed with a dozen different data points but I have only been actually processing/importing a few of those data points into fields. One day, I realize I need one of those data points in a new field. When I add the field and map correctly, nothing will actually import because the parsed item hasn't changed and the hash is still the same. Ideally I would like a "Clear import history" button that simply does something like UPDATE feeds_node_item SET hash='' WHERE feed_nid=x.
I have to do this anyway to solve an internal problem. Let me know what you think and if you believe a patch would be helpful.
Comment | File | Size | Author |
---|---|---|---|
#10 | 836090-10_hash_mapping.patch | 1.23 KB | alex_b |
#9 | 836090-config-hashes.patch | 3.59 KB | andrewlevine |
#7 | 836090-7_hash_configuration.patch | 1.1 KB | alex_b |
#5 | 836090.patch | 2.99 KB | andrewlevine |
Comments
Comment #1
alex_b CreditAttribution: alex_b commentedI'd suggest to blow the hash away when submitting the mapping form.
Comment #2
andrewlevine CreditAttribution: andrewlevine commentedWould you be comfortable having this as the default behavior? I think it could work...but it seems like a pretty significant (in terms of performance and unanticipated behavior) thing to do without the user's knowledge. Should it be a a setting under "Basic settings"? Should it be defaulted to on or off?
Comment #3
alex_b CreditAttribution: alex_b commentedYes.
Offering a button that would clear that hash cache manually would assume the user's knowledge about some of the internals of Feeds and can cause some unnecessary confusion. Given that blowing away the hash is comparatively cheap - it's fine to clear it wherever necessary. Just like a cache.
Comment #4
andrewlevine CreditAttribution: andrewlevine commentedwill work on a patch for this
Comment #5
andrewlevine CreditAttribution: andrewlevine commentedHere is my go at this issue. I think I actually did it in a more flexible way than what we talked about in IRC:
FeedsProcessor keeps track in a class variable whether the mappings have been changed or not. This way any extending processor can use this variable in its save() method. By default, only FeedsNodeProcessor uses this variable, and will clear the proper hashes in feeds_node_item in save().
Let me know what you think.
Comment #6
andrewlevine CreditAttribution: andrewlevine commentedI can also make a test for this once #837922: add removeMappings() to test suite and more extensive addMapping testing gets in
Comment #7
alex_b CreditAttribution: alex_b commentedI was just reviewing and my first thought was that I'd like to avoid the mapping_changed flag as it assumes knowledge around a state (mapping_changed) of the processor object in more than one class. States like these get very sticky fast.
Then I actually saw that we won't get away with an update on all feed items because if there is A LOT of them the update gets slow (I just measured .32 s on 5000 items which makes me queasy enough).
I think we should tackle this problem completely different and hash configuration together with items. See attached patch.
BTW, I'd also like to stay away from rescheduling items. Would love to learn more about your thinking behind that though.
Comment #8
andrewlevine CreditAttribution: andrewlevine commentedI really like this approach you are taking though updating all feed items on every single configuration change makes me nervous because we have no idea what this change will do to existing extensions of FeedsNodeProcessor. I will make a patch that only includes some of the configuration changes.
The only reason I rescheduled items is because that's what happens on "Delete" of feed items. I have no problems leaving that out .
Comment #9
andrewlevine CreditAttribution: andrewlevine commentedThe attached patch adds some complexity over patch #7 but it also means:
Comment #10
alex_b CreditAttribution: alex_b commentedI am fine to only hash mappings, but I would *really* like to keep this as simple as possible.
- Only serializes config['mappings']
- Caches serialized array for performance
- Overriding still simple
Running tests now, thinking of committing this soon.
Comment #11
alex_b CreditAttribution: alex_b commentedI've committed #10 now, thank you for your great input. If you'd like to suggest adjustments to this patch, let's deal with it on a separate issue.
http://drupal.org/cvs?commit=398516
Comment #12
andrewlevine CreditAttribution: andrewlevine commentedThanks for the review and the commit. I'm fine with #10