Not to be rude, but the way Feeds handles batch processing defies what batching was intended for.

For example, if you import an RSS feed that has 20 items, Feeds imports all 20 items in one batch operation. Isn't the point of batching to break the import apart so that each item is one operation?

I came across this problem when importing feeds with enclosures in them: Feeds' batch processing causes PHP to time out.

Comments

twistor’s picture

There is always the feeds_process_limit variable, which you can set to an appropriate level. It defaults to 50, which I've found to be a decent amount. It's debatable whether Feeds should handle one item at a time or several, but I'd hardly call Feeds' batch processing pointless.

Downloading the enclosure of each item is certainly going to take longer than normal.
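To make the process-limit idea concrete, here is a minimal sketch in Python (Feeds itself is PHP; the function names below are invented for this illustration, not Feeds' API). A process limit caps how many items one batch pass handles; the batch runner keeps calling until the work is done, so no single request exceeds the limit.

```python
FEEDS_PROCESS_LIMIT = 50  # Feeds' default for feeds_process_limit

def save_node(item):
    """Stand-in for the processor's real per-item work."""
    pass

def process_batch(items, start, limit=FEEDS_PROCESS_LIMIT):
    """Process at most `limit` items starting at `start`.

    Returns the next offset, or None when the import is finished.
    """
    chunk = items[start:start + limit]
    for item in chunk:
        save_node(item)
    next_start = start + len(chunk)
    return next_start if next_start < len(items) else None

# The batch runner calls process_batch once per request until it
# returns None, so each request stays under the limit.
items = list(range(120))
offset, passes = 0, 0
while offset is not None:
    offset = process_batch(items, offset)
    passes += 1
# 120 items at a limit of 50 -> 3 passes
```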

pwaterz’s picture

feeds_process_limit does not work in D7, as per my other issue http://drupal.org/node/1363094.

It still does not make sense to use batch; you could just add a page callback and process everything in one page request. Are you just using it to show a loading bar?

dooug’s picture

Title: Feeds batch process is pointless » Feeds batch process not fully implemented - needed for large imports

It appears in feeds 7.x-2.x-dev that the feeds_process_limit is only used in the clear() function of the plugins/FeedsProcessor.inc. This doesn't seem to be the case for the import processing functions... (correct me if I have missed something.)

Our use case of feeds requires better process limiting on import. We are trying to import a large (multiple thousand) quantity of nodes/users but are getting timeout errors on the import. The import needs to be processed in smaller batches, but this does not appear to be the case in Feeds.

Also, found a similar issue for Large Feeds imports: #1302034: Large Feeds import exhausts RAM & corrupts DB

manu manu’s picture

Using feeds_process_limit may not meet all use cases.

The Feeds import process is divided into three parts:

  1. fetch
  2. parse
  3. process

Even if Feeds could process smaller chunks of items in step 3, steps 1 and 2 may time out first.

In my case I was importing some 300MB XML files...
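The timing problem described above can be sketched as follows (a Python illustration, not Feeds code; the stage functions and sizes are made up for the example). Even when stage 3 is split into chunks, stages 1 and 2 have already touched the entire source before the processor sees its first chunk:

```python
def fetch(url):
    # Stage 1: downloads the whole file in one request.
    # With a very large source file, this alone can hit the time limit.
    return "<items>" + "<item/>" * 300 + "</items>"

def parse(raw):
    # Stage 2: a DOM-style parser materializes every item at once,
    # so its cost is proportional to the whole file, not the chunk size.
    return raw.count("<item/>") * ["item"]

def process(items, limit=50):
    # Stage 3: only this part is chunked.
    return [items[i:i + limit] for i in range(0, len(items), limit)]

chunks = process(parse(fetch("http://example.com/feed.xml")))
# Stage 3 yields 6 chunks of at most 50 items, but stages 1 and 2
# already paid the full cost of the whole file up front.
```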

I ended up using the "Process in background" option and hacking the import & clear form handlers so they don't process the first chunk, which was causing the timeout.

Hope it helps

twistor’s picture

Assigned: Unassigned » twistor
Priority: Normal » Major
Issue tags: +D7 stable release blocker

Holy crap, I just noticed what #4 is pointing out. Apparently when #744660: Expand batch support to fetchers and parsers was added, support for batching on processors was removed. Oh joy!

Theoretically, this would be fine: fetchers and parsers would respect the processor's limit and only give it what it could handle. In practice, this doesn't work at all. Most fetchers fetch one item and ignore batching altogether. LOTS of parsers HAVE to parse everything at once; only a limited number can batch. CSV is one, because it can read incrementally from a file. Most XML-based parsers use the DOM, and parse everything at once.

arg.

byronveale’s picture

There was discussion related to this over on issue #1470530, and some code that could perhaps move this toward completion.

See comments #100 and #121.

Thanks for everyone's efforts!

liquidcms’s picture

Possibly related: #1778972: batch import does not start - no errors. I have been battling with this all day; it seems as though only 50 records get imported and then "batch" hangs. I am sure this used to work months ago.

jenlampton’s picture

Version: 7.x-2.x-dev » 7.x-2.0-alpha8
Issue summary: View changes

@liquidcms I don't think you're seeing any batching at all. The variable feeds_process_limit is set to 50 by default, which is why you're seeing that many records imported.

Is the batching being added back over in this issue?
#1470530: Unpublish/Delete nodes not included in feed

If not, can any of the code from this D6 patch be of use here?
#1139376: Batch processing fails on large feeds

jenlampton’s picture

Well, it looks like #1470530: Unpublish/Delete nodes not included in feed landed, but I'm not entirely sure the batching that was added on delete will solve my problem of needing batching on import, so I'm commenting here too.

I have a site I'm supporting that needs its feeds_process_limit increased by about 200 each quarter so that the import will complete on cron as more data is added. I'd love to see a suggestion or recommendation on how to solve this problem if anyone has one.

twistor’s picture

Assigned: twistor » Unassigned
Status: Active » Closed (works as designed)
Issue tags: -D7 stable release blocker

At this point, there's nothing we can (or should) do.

The way it works is thus:

  1. Fetching is batched.
  2. Parsing is batched. If the parser doesn't respect batching, then that's a bug in the parser.
  3. The processor processes the items.

Feeds XPath parser was fixed a long time ago.