We have a feed set up to use the XML Xpath parser, configured to "Update existing nodes". We use the Node ID as the unique identifier, and all seems to be working fine. However...

Every time we run an import, all of the nodes are updated, even when the feed content has not changed. As I understand it, a hash of each feed item is generated and compared against a stored hash to determine which nodes should be updated and which shouldn't.
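To make that understanding concrete, here is a minimal Python sketch of hash-based change detection. It is an illustration only, not the Feeds implementation: the helper names `item_hash` and `needs_update`, the JSON serialization, and the use of MD5 are all assumptions made for the example.

```python
import hashlib
import json


def item_hash(item: dict) -> str:
    """Return a stable digest of a feed item (hypothetical helper).

    Keys are sorted so that the same data always serializes identically.
    """
    serialized = json.dumps(item, sort_keys=True)
    return hashlib.md5(serialized.encode("utf-8")).hexdigest()


def needs_update(item: dict, stored_hash) -> bool:
    """An item is updated when no hash is stored or the hash differs."""
    return stored_hash is None or item_hash(item) != stored_hash
```

Under this model, an item whose stored hash is missing is always treated as changed, which would match the symptom described here.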

Per http://drupal.org/node/631962#comment-4053270, I started down a path thinking that I needed to define some type of 'timestamp' in the XML data; upon further research, I've discovered that when using the XML Xpath parser the feeds_node_item table, which stores the hash, is always empty.

I've confirmed that when using the 'Common syndication parser', entries in the feeds_node_item table are generated properly.

I haven't seen any similar issues reported here, so I'm not sure if this is a bug or I just don't have something configured correctly. There aren't any errors being logged.

Comments

stacysimpson’s picture

No response. Is anyone else seeing the same behavior, or do we have something set up incorrectly? Thanks in advance.

stacysimpson’s picture

Project: Feeds XPath Parser » Feeds
Version: 6.x-1.11 » 6.x-1.x-dev

Well, I originally thought that this issue had to do with using the XML Xpath parser, but it turns out that there was a configuration difference in my testing. (I wasn't trying to update specific nodes while testing RSS feeds...)

My current understanding:
If the 'Node processor' is configured to use Node IDs specified in the feed, no hash is generated; therefore, nodes are always updated whenever the feed triggers. Looks like there is deliberate logic in ./plugins/FeedsNodeProcessor.inc's buildNode() to bypass the hashing mechanism if the Node ID is specified in the feed.

We should be able to work around this issue by configuring nodes that were originally created by Feeds. However, this is odd behavior and should at least be documented somewhere.

twistor’s picture

Status: Active » Postponed (maintainer needs more info)

I'm not sure what you're looking at.

The hash is generated and added outside of buildNode().

Did these nodes exist before the first import? Or are they created by Feeds? Using the nid as a unique target is tricky business.

stacysimpson’s picture

Status: Postponed (maintainer needs more info) » Active

In my particular scenario, the nodes existed before first import. However, I don't think that matters.

Basically, what I'm seeing: Whenever I choose to get a Node ID from a feed, even if it's not used as a unique target, no entries are generated for the feeds_node_item table.
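For illustration, the effect of hashes never being stored can be simulated in a few lines of Python. This is not Feeds code: `run_import`, `hash_table`, and the `persist_hashes` flag are hypothetical names, and the sketch assumes an importer that skips an item only when a stored hash matches the item's current hash.

```python
import hashlib
import json


def item_hash(item):
    """Stable digest of one feed item (hypothetical helper)."""
    return hashlib.md5(json.dumps(item, sort_keys=True).encode("utf-8")).hexdigest()


def run_import(items, hash_table, persist_hashes=True):
    """Return the node IDs updated in this run.

    hash_table stands in for the feeds_node_item table (nid -> hash).
    """
    updated = []
    for item in items:
        nid = item["nid"]
        digest = item_hash(item)
        if hash_table.get(nid) == digest:
            continue  # stored hash matches, so the item is skipped
        updated.append(nid)
        if persist_hashes:
            hash_table[nid] = digest
    return updated


items = [{"nid": 1, "title": "A"}, {"nid": 2, "title": "B"}]

# Hashes are stored: the second run has nothing to do.
table = {}
first = run_import(items, table)
second = run_import(items, table)

# Hashes are never stored (table stays empty): every run updates every node.
empty = {}
first_np = run_import(items, empty, persist_hashes=False)
second_np = run_import(items, empty, persist_hashes=False)
```

With stored hashes, the second run updates nothing; with an empty table, both runs update every node, which is the pattern I am seeing.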

twistor’s picture

Status: Active » Postponed (maintainer needs more info)

Is this still an issue? It's been a long time.

Feeds isn't designed to handle managing existing nodes. It should work though, if you map to the nid. I would expect the behavior to be that the existing nodes are updated once, no matter what, then managed normally.

rudiedirkx’s picture

Version: 6.x-1.x-dev » 7.x-2.0-alpha8

For me this is still an issue. Feeds creates a hash for the full source to see if anything has changed, but not per item. If you import a CSV of 2000 items and only 2 have changed, Feeds will update 2000 nodes. (So `updated` timestamp will be changed, even though the node hasn't.)

That's how the hash works, isn't it? Maybe I misunderstand.

rudiedirkx’s picture

Actually, I might be wrong. Apparently the hash works per item and it works. I don't know why it just updated all nodes after I changed only 2 in the source. Never mind. Sorry.

twistor’s picture

Status: Postponed (maintainer needs more info) » Closed (cannot reproduce)