Problem/Motivation

We configured a cron-triggered Feeds import of large CSV files (about 6 MB / 50k lines).

The job is configured to run once every couple of hours, and the CSV parser uses the default limit of 100 lines. The import therefore needs several iterations to finish the file.
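As a rough worked example of "several iterations", the figures quoted above (about 50k lines, 100 lines per parser call) can be turned into a batch count. This is plain arithmetic on the reported numbers, not a call into Feeds; the function name is made up for illustration.

```python
import math

TOTAL_LINES = 50_000  # approximate file size from the report
LINE_LIMIT = 100      # default CSV parser line limit mentioned above


def batches_needed(total_lines: int, line_limit: int) -> int:
    """Number of parser iterations needed to read every line once."""
    return math.ceil(total_lines / line_limit)


print(batches_needed(TOTAL_LINES, LINE_LIMIT))  # 500
```

So a single import of this file spans roughly 500 parser iterations, which is why it overlaps with many of the per-minute cron calls.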

If cron is triggered while a Feeds import is running (we call cron every minute to run other scheduled background tasks), the running CSV parser job continues the import, but it does not return lines from the last position it reached. Instead, it starts over and returns the first line of the file again.

In our setup, this leads to never-ending import loops in which datasets above 1k lines are never imported.
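The symptom can be modelled with a small simulation. This is only a sketch of the observed behaviour, not a diagnosis and not Feeds code: the function, its parameters, and the figure of 10 batches per cron interval are all assumptions chosen for illustration.

```python
def simulate(total_lines, line_limit, batches_per_tick, ticks, reset_on_tick):
    """Return the highest line number reached after `ticks` cron intervals.

    reset_on_tick=True models the reported symptom: every cron call sends the
    parser back to the first line. reset_on_tick=False models the expected
    behaviour, where the parser resumes from its last position.
    """
    pointer = 0   # next line the parser would read
    furthest = 0  # highest line reached so far
    for _ in range(ticks):
        if reset_on_tick:
            pointer = 0
        for _ in range(batches_per_tick):
            if pointer >= total_lines:
                break
            pointer = min(pointer + line_limit, total_lines)
            furthest = max(furthest, pointer)
    return furthest


# Assumed: 10 batches fit between two cron calls, observed for 60 intervals.
print(simulate(50_000, 100, 10, 60, reset_on_tick=True))        # 1000
print(simulate(50_000, 100, 10, 60, reset_on_tick=False))       # 50000
print(simulate(50_000, 1_000_000, 10, 60, reset_on_tick=True))  # 50000
```

Under these assumptions the model never gets past line 1000 when the position is reset, while a line limit larger than the file lets a single batch read everything before the next reset, which is consistent with the workaround described in this report.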

If we change the line limit to something like a million, the file is imported correctly.

Steps to reproduce

Have a CSV data source large enough that its import won't finish within one minute.
Set up your system's cron to run every minute.
Configure a cron-triggered CSV import for that CSV file.
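For the first step, a test file of roughly the reported size can be generated with a short script. This is a minimal sketch: the column names (`guid`, `title`, `body`), the filler text, and the function name are hypothetical and would need to match the feed type's mapping.

```python
import csv


def write_sample_csv(path, rows=50_000):
    """Write a CSV with one header line plus `rows` data lines; return `rows`."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["guid", "title", "body"])  # hypothetical column names
        for i in range(1, rows + 1):
            # Filler text pads each row so the file grows to several megabytes.
            writer.writerow([i, f"Item {i}", "Lorem ipsum dolor sit amet " * 4])
    return rows
```

Calling `write_sample_csv("feeds_sample.csv")` writes 50,000 data rows, which should come out at around 6 MB with this filler text.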

Proposed resolution

None yet, as the reason has to be identified first.

Comments

Mario Steinitz created an issue.

MegaChriz commented:

That's odd. There is some test coverage available for importing a CSV file using multiple cron runs. See tests/src/Functional/CronTest.php. The test CronTest::testImportSourceWithMultipleCronRuns() ensures only 5 items can be imported per cron run and then tries to import a CSV file of nine items. In the first run, five nodes are imported. After the second run, nine nodes are imported in total.

The test module "feeds_test_multiple_cron_runs" limits the number of items that can be imported per cron run by changing the cron queue time to 5 seconds and by delaying execution by 5 seconds after 5 items are imported.

Come to think of it, the test doesn't change the line limit. So for that case we would still need test coverage.