Hello
Thanks for this great module.
I have a question: I created a fetcher that extends FeedsFileFetcher so that I can configure my directory path in the fetcher's form rather than in the standalone form. It's the only solution I found to import my XML files periodically.
When I run the import manually (standalone form), everything is fine: my 40 XML files are fetched, parsed, and converted to nodes.
But when I set "Periodic import" in "Basic settings" to "as often as possible", only ONE file is imported each time cron fires. I would like all the XML files in my folder to be parsed on every run.
Why is only one XML file imported? How can I change this behavior?
Thanks
Comment | File | Size | Author |
---|---|---|---|
#28 | feeds-queue-1231332-28.patch | 3.51 KB | twistor |
#27 | feeds-queue-1231332-27.patch | 3.52 KB | klausi |
Comments
Comment #1
nyl_auster CreditAttribution: nyl_auster commented
PS: my fetcher's code is exactly the same as FeedsFileFetcher for now.
Comment #2
nyl_auster CreditAttribution: nyl_auster commented
Nobody has a clue for me?
Comment #3
Cajun CreditAttribution: Cajun commented
Hey, did you figure this out? I've got the exact same problem.
Comment #4
juhaniemi CreditAttribution: juhaniemi commented
Confirming this issue and setting it as a bug.
Comment #5
janfang CreditAttribution: janfang commented
I have the same problem. Have you found a solution?
Comment #6
valderama CreditAttribution: valderama commented
It seems like we are having the same problem here. Any clues, someone?
Thanks,
walter
Comment #7
surf12 CreditAttribution: surf12 commented
The same problem. Help us please!
Thanks...
Comment #8
siva.thanush CreditAttribution: siva.thanush commented
The same problem persists.
Or is this post a duplicate?
For me this happens in the next version as well.
Comment #9
siva.thanush CreditAttribution: siva.thanush commented
It's working in the latest version, 7.x-2.0-alpha5.
I'm not sure about the lower ones.
Comment #10
franz
Comment #11
cmarcera CreditAttribution: cmarcera commented
I'm using alpha7 and this bug persists for me. I have cron running every 10 minutes and my periodic import is set to run as often as possible. I'm using the file upload fetcher to parse XML files with the "Supply path to file or directory directly" option checked.
Every 10 minutes, my Feed Importer imports 1 XML file from the directory. After that, the feed is locked and must be unlocked if I want to run it manually. If I run it manually, it imports all of the XML files as expected.
Comment #12
gurrmag CreditAttribution: gurrmag commented
I'm having this issue too...
I'm using Feeds to take new files uploaded to the server and import them periodically as nodes of a specific content type, with "update existing nodes" selected. However, when Job Scheduler fires, only one node is processed at a time, and when the import is complete, it reports that hundreds of nodes have been imported, i.e. the total of all nodes ever imported, rather than just the ten or so new files that are available daily. None of these files are particularly long; many of them are just one paragraph of text.
I have minimised this issue by running Job Scheduler every five minutes; fortunately Feeds is the only thing using it.
Comment #13
cmarcera CreditAttribution: cmarcera commented
My feed will import 1 item, then lock itself saying it's XX% done. After the next cron run, it imports another item and increases the percentage done. It's baffling, because the percentage clearly knows how many files are in the directory to process; it's just stopping after one.
Comment #14
cmarcera CreditAttribution: cmarcera commented
I've now tried various settings in my Feeds importer and none seem to import more than a single item.
gurrmag, what settings are you using? Trying to find a common denominator.
Comment #15
gurrmag CreditAttribution: gurrmag commented
My settings are:
Basic settings
• Attached to: [none]
• Periodic import: 1 day
• Import on submission: Checked
• Process in background: Unchecked
Fetcher
• File upload: Upload content from a local file.
Parser
• XPath XML parser: Parse XML using XPath.
Processor
• Node processor: Create and update nodes.
I've tried a number of different combinations, but haven't found a culprit for this yet either...
Comment #16
dgtlmoon CreditAttribution: dgtlmoon commented
What does the content of your job_schedule table look like?
Comment #17
cmarcera CreditAttribution: cmarcera commented
Comment #18
gurrmag CreditAttribution: gurrmag commented
-- Table structure for table `job_schedule`
--
CREATE TABLE IF NOT EXISTS `job_schedule` (
`item_id` int(10) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Primary Key: Unique item ID.',
`name` varchar(128) NOT NULL DEFAULT '' COMMENT 'Name of the schedule.',
`type` varchar(128) NOT NULL DEFAULT '' COMMENT 'Type identifier of the job.',
`id` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Numeric identifier of the job.',
`period` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Time period after which job is to be executed.',
`crontab` varchar(255) NOT NULL DEFAULT '' COMMENT 'Crontab line in *NIX format.',
`data` longblob COMMENT 'The arbitrary data for the item.',
`expire` int(11) NOT NULL DEFAULT '0' COMMENT 'Timestamp when job expires.',
`created` int(11) NOT NULL DEFAULT '0' COMMENT 'Timestamp when the item was created.',
`last` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Timestamp when a job was last executed.',
`periodic` smallint(5) unsigned NOT NULL DEFAULT '0' COMMENT 'If true job will be automatically rescheduled.',
`next` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Timestamp when a job is to be executed (next = last + period), used for fast ordering.',
`scheduled` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Timestamp when a job was scheduled. 0 if a job is currently not scheduled.',
PRIMARY KEY (`item_id`),
KEY `name_type_id` (`name`,`type`,`id`),
KEY `name_type` (`name`,`type`),
KEY `next` (`next`),
KEY `scheduled` (`scheduled`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Schedule of jobs to be executed.' AUTO_INCREMENT=1533 ;
--
-- Dumping data for table `job_schedule`
--
INSERT INTO `job_schedule` (`item_id`, `name`, `type`, `id`, `period`, `crontab`, `data`, `expire`, `created`, `last`, `periodic`, `next`, `scheduled`) VALUES
(1530, 'feeds_source_import', 'psl_xml_importer', 0, 86400, '', NULL, 0, 0, 1355476741, 1, 1355563141, 0),
(1532, 'feeds_source_import', 'news_xml_importer', 0, 0, '', NULL, 0, 0, 1355476801, 1, 1355476801, 0);
Comment #19
DannyPfeiffer CreditAttribution: DannyPfeiffer commented
This is related to the hardcoded time parameter in the queue_info callback. Changing the default 15 seconds to something higher (240 in my case) solves this problem.
Lines 85-104 of feeds.module:
The hook_cron_queue_info() implementation adds entries to Drupal's queue table and processes the queue for "up to" the time period specified, 15 seconds by default.
If you have a lot of large feed sources to import on each scheduled run (like I did), this will usually only get through one or two feed sources before hitting that limit.
You'll end up with a lot of items stuck in the queue table (mine had 70,000+ rows of duplicate entries). That's because on every scheduled run, all your feed sources get added to the queue, but only one or two get processed.
If you increase your time limit to something large and have lots of duplicate rows in your queue table, the next run will process as many of those records as it can, so I'd suggest blanking out the queue table first (make a backup of it in case you need to restore something).
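For reference, the queue declaration in question looks roughly like this. This is a sketch based on the description above, not the literal feeds.module code, and the exact keys present may differ between Feeds versions; the point is the 'time' value:

```php
/**
 * Implements hook_cron_queue_info().
 *
 * Sketch of the Feeds import queue declaration. The 'time' key is the
 * hardcoded limit discussed above: cron processes queue items for up to
 * this many seconds per run, then stops.
 */
function feeds_cron_queue_info() {
  $queues = array();
  $queues['feeds_source_import'] = array(
    'worker callback' => 'feeds_source_import',
    // Default is 15; raising it (e.g. to 240) lets a single cron run
    // work through more than one or two feed sources.
    'time' => 240,
  );
  return $queues;
}
```

Note that raising 'time' also lengthens the worst-case cron run, so pick a value your PHP max_execution_time and cron cadence can tolerate.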
Comment #20
queenvictoria CreditAttribution: queenvictoria commented
I also had the issue in #19. After chasing around all over the place setting timeouts in nginx, PHP, and http_request_timeout in the settings file, I've settled on drush calling the feeds import. I've added some work over here to aid in this task:
http://drupal.org/node/608408
My table had 600k feeds imports queued. Nice tip for clearing the table; good idea to back up first. This op took 5 minutes.
mysql> delete from queue where name = "feeds_source_import";
Comment #21
klausi
Here is a patch that increases the default run time for the feeds import queue to 60 seconds, which is the same as core's Aggregator module uses.
I also modified feeds_source_import() to re-queue itself immediately if importing of a feed has not finished. That allows us to process more items of one particular feed during a cron run for example.
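The re-queuing idea can be sketched as follows. This is a simplified illustration of the approach described here, not the patch itself (per #27, the actual patch moves the queuing into FeedsSource::scheduleImport()), and the exception handling is reduced to the essentials:

```php
/**
 * Queue worker callback for the feeds_source_import queue (sketch).
 */
function feeds_source_import($job) {
  $source = feeds_source($job['type'], $job['id']);
  try {
    // import() returns FEEDS_BATCH_COMPLETE when the whole batch is done,
    // or a progress fraction when there is more to do.
    $result = $source->existing()->import();
    if ($result != FEEDS_BATCH_COMPLETE) {
      // Not finished: put the job straight back on the queue so the
      // import continues on this cron run (time permitting) or the next,
      // instead of waiting out a full scheduling period.
      DrupalQueue::get('feeds_source_import')->createItem($job);
    }
  }
  catch (FeedsNotExistingException $e) {
    // The source was deleted; drop the job silently.
  }
}
```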
Comment #23
klausi
Fixed the test case, since it is now not possible to determine how many items have been processed in one cron run.
Comment #24
twistor CreditAttribution: twistor commented
Nifty. Assigning to myself so I can review it at a normal hour.
Overall, I like the idea.
Making cron non-deterministic is a bit scary. We already have a bunch of problems with it. That said, this would solve a lot of problems with people's expectations. I don't think this will affect sites with a large number of feeds. Well, the re-queuing part won't, but increasing the time limit obviously will.
I kind of like the idea to use the Queue directly, in this case, rather than JobScheduler. There are a couple more places we could do this as well: clearing and expiring.
Could we move the logic back into FeedsSource::scheduleImport()?
Comment #25
lwalley CreditAttribution: lwalley commented
I've been running into the same issue described in #19, with 70,000+ queue entries, and I'm wondering if Job Scheduler might be able to help prevent these duplicate jobs. I've added my thoughts to this ticket: #2061647: Rescheduling 'stuck' periodic jobs results in duplicate queue entries?
Comment #26
lalit774 CreditAttribution: lalit774 commented
I have done it by the following method, so we don't need to hack the Feeds module.
Comment #27
klausi
Patch does not apply anymore, rerolled. I moved the queuing to scheduleImport() as suggested by twistor.
Comment #28
twistor CreditAttribution: twistor commented
Apologies, this fell off my radar. I really like this patch, just trying to flatten out the logic.
Comment #29
twistor CreditAttribution: twistor commented
Thanks everybody, especially klausi for coming up with a clever fix.
If somebody wants to try and backport this, they are more than welcome to. But, queue usage in D6 is optional which complicates this a bit.
http://drupalcode.org/project/feeds.git/commit/83f1a1d