So when I import from a feed using the form (/import/node_importer), I get a nice little progress bar and can import roughly 4k rows in about a minute and a half. How can I get the same performance using cron (periodic import / process in background)? I currently have it configured to run every 15 minutes, but it only makes about 1% progress each time it runs; in other words, about 4% every hour. Is there a way to make it import the whole lot (100%) every 15 minutes? Am I missing something?

Thanks a bunch.

Comments

sorensong

Anyone?

sorensong

Title: Feeds import behavior » How can I speed up import process?

surf12

I have the same problem.

davemaxg

Cron processes feeds in chunks. The default chunk size of 50 is extremely small. Just add this line to your settings.php file to change the chunk size.

$conf['feeds_process_limit'] = 2000;

I believe the only limit is a number that will not cause a PHP script timeout. 2000 works fine for me.
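To see why the default chunk size feels so slow, here is the arithmetic with the numbers from the original question (about 4,000 rows, cron every 15 minutes). It assumes each cron run processes exactly one chunk of feeds_process_limit items, so treat the totals as rough estimates:

```latex
\begin{aligned}
\text{limit } 50:&\quad \left\lceil 4000 / 50 \right\rceil = 80 \text{ runs}, \qquad 80 \times 15\ \text{min} \approx 20\ \text{h}\\
\text{limit } 2000:&\quad \left\lceil 4000 / 2000 \right\rceil = 2 \text{ runs}, \qquad 2 \times 15\ \text{min} \approx 30\ \text{min}
\end{aligned}
```

With a chunk of 50, each run covers 50 / 4000 = 1.25% of the file, which is close to the "about 1% per run" reported in the question.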

star-szr

Thanks @davemaxg, that did the trick. I had a large CSV file that was only importing 1% at a time.

ressa

Component: Feeds Import » Code

Thanks @davemaxg I was having the same issue using Elysia cron to trigger a job_scheduler_cron import job, but it got stuck after 33 nodes. Adding $conf['feeds_process_limit'] = 2000; in settings.php fixed it.

I wonder why the default limit is set that low rather than, say, 2500 to start with. If people have issues with PHP scripts timing out, they could always lower the value.

ndf

To all,

The default chunk size is set low because a high value can really kill your website's performance.
The default php.ini settings (used on the front end) give you 30 seconds to finish a PHP script (max_execution_time). On high-performance sites this setting may be even lower.
If you set feeds_process_limit high, the import process can easily take more than 30 seconds, and at 30 seconds you get a timeout error.
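A rough way to pick a limit that stays under the time limit is to measure the time per item and divide. The per-item figure below is an assumption derived from the original question (about 4,000 rows in about 90 seconds through the front-end batch); real per-item cost varies with the processor, the number of fields and the server:

```latex
\begin{aligned}
t_{\text{item}} &\approx \frac{90\ \text{s}}{4000} = 0.0225\ \text{s}\\
N_{\max} &\approx \frac{\texttt{max\_execution\_time}}{t_{\text{item}}} = \frac{30\ \text{s}}{0.0225\ \text{s}} \approx 1333
\end{aligned}
```

In practice you would stay well below this, since fetching, parsing and the Drupal bootstrap also use part of the 30 seconds.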

Feeds has multiple ways to run the import process:

  1. Front-end (with the progressbar)
  2. Cron
  3. Some sandbox modules provide a third option: Drush (https://drupal.org/sandbox/enzo/1865202).

In most setups, cron runs with different PHP settings (via php-cli) than the normal front-end php.ini. This is cool, because on the front end you want speed (many concurrent users / PHP processes running for a short time), while on the back end you want power (one PHP process that imports all your nodes).
Drush also uses these php-cli settings.
The default max_execution_time for php-cli is 0, which means a script can run indefinitely. That should be enough for 2000 nodes.
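To confirm which limits command-line PHP actually uses, a small script like the one below can be run with `php check_limits.php` from the shell (the filename is hypothetical). According to the PHP manual, the CLI SAPI starts with max_execution_time set to 0 regardless of php.ini, while memory_limit still comes from the CLI's own php.ini:

```php
<?php
// check_limits.php (hypothetical name): print the PHP settings that
// matter for long Feeds imports in the SAPI this script runs under.
$settings = ['max_execution_time', 'memory_limit'];

echo 'SAPI: ', PHP_SAPI, PHP_EOL;
foreach ($settings as $name) {
    // ini_get() returns the current value as a string;
    // "0" for max_execution_time means no time limit.
    echo $name, ': ', ini_get($name), PHP_EOL;
}
```

Comparing this output with the phpinfo() page served by your web server shows how far the front-end and CLI limits differ.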

So if you want to import all your nodes every 15 minutes on a live site, I would recommend importing via php-cli. That way you can set feeds_process_limit high without hurting front-end performance.
How to do it:

  • In your feed settings, choose 'import every 15 minutes'.
  • Run your cron every 15 minutes.
    • Be sure to run cron via php-cli (ask your host). You could use Drush to run cron, or use Drush to run feeds_import (via the sandbox module above).
  • Set feeds_process_limit high.
    • Make sure 15 minutes is enough to import all nodes. If not, make both the feed's 'import every xx' interval *and* your cron interval longer, until all nodes are imported within one interval.
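One way to follow this advice without raising the limit for browser-triggered imports is to make the setting conditional in settings.php. This is a sketch that assumes your cron (or Drush) runs through php-cli; if cron is triggered over HTTP (for example wget on cron.php), PHP_SAPI is not 'cli' and the default applies. Front-end (progress bar) imports also keep the default chunk size of 50:

```php
// settings.php (Drupal 7), sketch: raise the Feeds chunk size only for
// command-line runs such as cron via Drush, where max_execution_time
// is 0, and keep the default for web requests.
if (PHP_SAPI === 'cli') {
  $conf['feeds_process_limit'] = 2000;
}
```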

Drupal variables like "feeds_process_limit" can be set in several places. The easiest are:

  • settings.php --> $conf['feeds_process_limit'] = 2000;
  • drush --> drush vset feeds_process_limit 2000

bennos

I think we can close this; the solution is above.
About the limit: normally the chunk size of 50 works in every Drupal environment. If your imports are bigger, you can set it higher.

bennos

Status: Active » Fixed

Automatically closed -- issue fixed for 2 weeks with no activity.

Anonymous

Issue summary: updated

wording

mozh92

Issue summary: updated

I need help.
If I set $conf['feeds_process_limit'] = 2000; I get a 504 error.

I tried 1000, 500 and 200, but always get a 504 error.
If I set the limit to 50, it works!

Which settings do I need to change on my server? I want a limit of 2000. Thanks!
I have max_execution_time = 3600 and max_input_time = 3600.

Alexandre360

Hello,

I have a very large number of user entities to import.

It seems that $conf['feeds_process_limit'] only works for nodes. How can I speed up imports of user entities?

Alexandre360

Any news about that? I wonder if I'm the only one importing users with Drupal...

megachriz

The effect of 'feeds_process_limit' setting depends on the parser being used. Not every parser respects this setting. For example, the CSV parser and the parsers from Feeds extensible parsers respect this setting. So the solution posted in #8 should work for every processor (node, user, taxonomy term).

Since Feeds 7.x-2.0-beta1, Feeds will try to import multiple batches per cron run, depending on whether there is still time left to run another batch. The time limit for this is 60 seconds; see feeds_cron_queue_info(). So if each batch takes 25 seconds, it will do three batches per cron run: when the second batch completes, 50 seconds have elapsed, which is still under 60, so it starts another one. This behaviour was added in #1231332: periodic import imports only one file per cron.
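This time-boxed behaviour can be illustrated with a small simulation. It follows the shape of Drupal 7's cron queue loop (keep claiming items while the time budget has not run out), but it is a standalone sketch: simulate_cron_queue() is a hypothetical name, and it assumes every batch takes the same fixed number of seconds:

```php
<?php
/**
 * Hypothetical helper: counts how many queued batches one cron run
 * processes when a new batch is started only while time is left.
 *
 * @param int $timeBudget      Queue time budget in seconds (60 for Feeds).
 * @param int $secondsPerBatch Assumed fixed duration of one batch.
 * @param int $queuedBatches   Number of batches waiting in the queue.
 */
function simulate_cron_queue(int $timeBudget, int $secondsPerBatch, int $queuedBatches): int
{
    $elapsed = 0;
    $processed = 0;
    // The budget is checked only before starting a batch, so the last
    // batch may finish past it (25 + 25 + 25 = 75 s > 60 s).
    while ($elapsed < $timeBudget && $processed < $queuedBatches) {
        $elapsed += $secondsPerBatch;
        $processed++;
    }
    return $processed;
}

echo simulate_cron_queue(60, 25, 10), PHP_EOL;  // 3, matching the 25-second example
echo simulate_cron_queue(300, 25, 10), PHP_EOL; // 10: the queue empties first
```

Multiplying the batch count by feeds_process_limit gives a rough number of items per cron run: with the default limit of 50 and three batches, about 150 items.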

maxplus

Thanks,

I've also started testing the setting "$conf['feeds_process_limit'] = 2000;" because of very slow imports of big sources.

cyclone321

I agree with #15: setting the limit merely increases the number of nodes per queue item, so if, as in my case, you have multiple queue items, you need to increase the processing time in the function feeds_cron_queue_info().

Default: executes as many queue items as possible within 60 seconds, once per cron run:

$queues['feeds_source_import'] = array(
  'worker callback' => 'feeds_source_import',
  'time' => 60,
);

Changed: executes as many queue items as possible within 5 minutes:

$queues['feeds_source_import'] = array(
  'worker callback' => 'feeds_source_import',
  'time' => 300,
);

Hacking the Feeds module is probably not the best solution, but the hard-coded value makes it tricky.

After changing this value, Drupal versions prior to 7.40 will time out after 240 seconds because of a setting in common.inc, so you will probably need to upgrade core or hack that as well...
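A less invasive alternative to editing feeds.module may be Drupal 7's hook_cron_queue_info_alter(), which lets a custom module change queue definitions before cron processes them. A minimal sketch, assuming a custom module named mymodule (a placeholder); the 240-second limit mentioned above applies here just as it does when editing the module directly:

```php
/**
 * Implements hook_cron_queue_info_alter().
 *
 * Sketch: raises the per-cron-run time budget of the Feeds import queue
 * from 60 to 300 seconds without modifying feeds.module.
 * "mymodule" is a placeholder for your own module's name.
 */
function mymodule_cron_queue_info_alter(&$queues) {
  if (isset($queues['feeds_source_import'])) {
    $queues['feeds_source_import']['time'] = 300;
  }
}
```

Clear Drupal's caches after enabling the module so the new hook implementation is picked up.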

jomarocas

OK, I got Feeds working with this configuration; the import runs, but it doesn't show progress:

$conf['feeds_debug'] = true;
$conf['feeds_process_limit'] = 100000;
$conf['http_request_timeout'] = 100000;

In php.ini:

max_execution_time = 10000
max_input_vars = 10000
memory_limit = 19200M
post_max_size = 500M
(I have a lot of memory.)

I'm on 7.x-2.0-beta4. It doesn't show progress but does import the items, sometimes with errors.

If I change it to $conf['feeds_process_limit'] = 30; I can see the progress.

grahamvalue

Looks like the comments on this page have been copied verbatim into a tutorial titled "Setup large imports with Drupal Feeds".