Use-case:
I have been parsing data from a feed with a dozen different data points but I have only been actually processing/importing a few of those data points into fields. One day, I realize I need one of those data points in a new field. When I add the field and map correctly, nothing will actually import because the parsed item hasn't changed and the hash is still the same. Ideally I would like a "Clear import history" button that simply does something like UPDATE feeds_node_item SET hash='' WHERE feed_nid=x.

I have to do this anyway to solve an internal problem. Let me know what you think and if you believe a patch would be helpful.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

alex_b’s picture

I'd suggest to blow the hash away when submitting the mapping form.

andrewlevine’s picture

Would you be comfortable having this as the default behavior? I think it could work...but it seems like a pretty significant (in terms of performance and unanticipated behavior) thing to do without the user's knowledge. Should it be a a setting under "Basic settings"? Should it be defaulted to on or off?

alex_b’s picture

Would you be comfortable having this as the default behavior?

Yes.

but it seems like a pretty significant (in terms of performance and unanticipated behavior) thing to do without the user's knowledge

Offering a button that would clear that hash cache manually would assume the user's knowledge about some of the internals of Feeds and can cause some unnecessary confusion. Given that blowing away the hash is comparatively cheap - it's fine to clear it wherever necessary. Just like a cache.

andrewlevine’s picture

Assigned: Unassigned » andrewlevine

will work on a patch for this

andrewlevine’s picture

Status: Active » Needs review
FileSize
2.99 KB

Here is my go at this issue. I think I actually did it in a more flexible way than what we talked about in IRC:

FeedsProcessor keeps track in a class variable whether the mappings have been changed or not. This way any extending processor can use this variable in its save() method. By default, only FeedsNodeProcessor uses this variable, and will clear the proper hashes in feeds_node_item in save().

Let me know what you think.

andrewlevine’s picture

alex_b’s picture

I was just reviewing and my first thought was that I'd like to avoid the mapping_changed flag as it assumes knowledge around a state (mapping_changed) of the processor object in more than one class. States like these get very sticky fast.

Then I actually saw that we won't get away with an update on all feed items because if there is A LOT of them the update gets slow (I just measured .32 s on 5000 items which makes me queasy enough).

I think we should tackle this problem completely different and hash configuration together with items. See attached patch.

BTW, I'd also like to stay away from rescheduling items. Would love to learn more about your thinking behind that though.

andrewlevine’s picture

I really like this approach you are taking though updating all feed items on every single configuration change makes me nervous because we have no idea what this change will do to existing extensions of FeedsNodeProcessor. I will make a patch that only includes some of the configuration changes.

The only reason I rescheduled items is because that's what happens on "Delete" of feed items. I have no problems leaving that out .

andrewlevine’s picture

Assigned: andrewlevine » Unassigned
FileSize
3.59 KB

The attached patch adds some complexity over patch #7 but it also means:

  • Hashes aren't changed by unimportant configuration changes or ordering
  • Rehashing behavior can be simply overridden by overriding getHashedConfigKeys() and changing array values
alex_b’s picture

I am fine to only hash mappings, but I would *really* like to keep this as simple as possible.

- Only serializes config['mappings']
- Caches serialized array for performance
- Overriding still simple

Running tests now, thinking of committing this soon.

alex_b’s picture

Title: "Remove Import History" (Remove hash data for a Feed Source) » Include mappings configuration in hash
Status: Needs review » Fixed

I've committed #10 now, thank you for your great input. If you'd like to suggest adjustments to this patch, let's deal with it on a separate issue.

http://drupal.org/cvs?commit=398516

andrewlevine’s picture

Thanks for the review and the commit. I'm fine with #10

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.