Posted by nyl auster on July 27, 2011 at 4:45pm
| Project: | Feeds |
| Version: | 7.x-2.0-alpha7 |
| Component: | Feeds Import |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | active |
Issue Summary
Hello,
Thanks for this great module.
I have a question: I created a fetcher that extends FeedsFileFetcher so that I can configure my "directory path" in the fetcher's form rather than in the standalone form. It's the only solution I found for importing my XML files periodically.
When I run the import manually (standalone form), all is fine: my 40 XML files are fetched, parsed and converted to nodes.
But when I set "Periodic import" in "Basic settings" to "as often as possible", only ONE file is imported each time cron runs. I would like all the XML files in my folder to be parsed each time.
Why is only one XML file imported? How can I change this behavior?
Thanks
Comments
#1
PS: my fetcher's code is exactly the same as FeedsFileFetcher's for now:
<?php
public function fetch(FeedsSource $source) {
  $source_config = $source->getConfigFor($this);
  // Just return a file fetcher result if this is a file.
  if (is_file($source_config['source'])) {
    return new FeedsFileFetcherResult($source_config['source']);
  }
  // Batch if this is a directory.
  $state = $source->state(FEEDS_FETCH);
  $files = array();
  if (!isset($state->files)) {
    $state->files = $this->listFiles($source_config['source']);
    $state->total = count($state->files);
  }
  if (count($state->files)) {
    $file = array_shift($state->files);
    $state->progress($state->total, $state->total - count($state->files));
    return new FeedsFileFetcherResult($file);
  }
  throw new Exception(t('Resource is not a file or it is an empty directory: %source', array('%source' => $source_config['source'])));
}
?>
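To make the batching visible: the directory branch above hands back exactly one file per fetch() call and keeps the rest in the fetch state, so something has to call it again for the remaining files. Here is a minimal standalone sketch of just that logic. SketchFetchState and sketch_fetch_one() are made-up names for illustration, not Feeds classes, and the file names are invented.

```php
<?php
// Hypothetical stand-in for the FEEDS_FETCH state object; not a Feeds class.
class SketchFetchState {
  public $files = NULL;
  public $total = 0;
}

// Mirrors the directory branch of fetch(): list once, then one file per call.
function sketch_fetch_one(SketchFetchState $state, array $directory_listing) {
  if (!isset($state->files)) {
    $state->files = $directory_listing;
    $state->total = count($state->files);
  }
  if (count($state->files)) {
    return array_shift($state->files);
  }
  // Nothing left to hand out.
  return NULL;
}

$state = new SketchFetchState();
$listing = array('a.xml', 'b.xml', 'c.xml');

// A single call: one file comes back, the others wait in the state.
$first = sketch_fetch_one($state, $listing);
$remaining_after_one_call = count($state->files);

// Calling again until the state is empty picks up the rest.
$fetched = array($first);
while (($file = sketch_fetch_one($state, $listing)) !== NULL) {
  $fetched[] = $file;
}
```

If cron only triggers one such call per run, that would match the "one file per cron run" behavior described here, but that part is an assumption on my side.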
#2
Does nobody have a clue for me?
#3
Hey, did you figure this out? I have the exact same problem.
#4
Confirming this issue and setting it as a bug.
#5
I have the same problem. Have you found a solution?
#6
Seems like we are having the same problem here. Any clues, anyone?
Thanks,
walter
#7
Same problem here. Help us, please!
Thanks...
#8
The same problem persists.
Or is this post a duplicate?
For me this happens in the next version as well.
#9
It's working in the latest version, 7.x-2.0-alpha5.
I am not sure about earlier versions.
#10
#11
I'm using alpha7 and this bug persists for me. I have cron running every 10 minutes and my periodic import is set to run as often as possible. I'm using the file upload fetcher to parse XML files with the "Supply path to file or directory directly" option checked.
Every 10 minutes, my Feed Importer imports 1 XML file from the directory. After that, the feed is locked and must be unlocked if I want to run it manually. If I run it manually, it imports all of the XML files as expected.
#12
I'm having this issue too...
I'm using Feeds to take new files uploaded to the server and import them periodically as nodes of a specific content type, with "update existing nodes" selected. However, when Job Scheduler fires, only one node is processed at a time, and when the import is complete, it reports that hundreds of nodes have been imported, i.e. the total of all nodes ever imported, rather than just the ten or so new files that are available daily. None of these files are particularly long; many of them are just one paragraph of text.
I have minimised this issue by running Job Scheduler every five minutes. Fortunately, Feeds is the only thing using it.
#13
In my case, it imports 1 item, then locks the feed, saying it's XX% done. After the next cron run, it imports another item and increases the percentage done. It's baffling because the percentage shows it clearly knows how many files are in the directory to process; it's just stopping after one.
#14
I've now tried various settings in my Feeds importer and none seem to import more than a single item.
gurrmag, what settings are you using? Trying to find a common denominator.
#15
My settings are:
Basic settings
• Attached to: [none]
• Periodic import: 1 day
• Import on submission: Checked
• Process in background: Unchecked
Fetcher
• File upload: Upload content from a local file.
Parser
• XPath XML parser: Parse XML using XPath.
Processor
• Node processor: Create and update nodes.
I've tried a number of different combinations, but haven't found a culprit for this yet either...
#16
What does the content of your job_schedule table look like?
#17
--
-- Table structure for table `d7_job_schedule`
--
CREATE TABLE IF NOT EXISTS `d7_job_schedule` (
  `item_id` int(10) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Primary Key: Unique item ID.',
  `name` varchar(128) NOT NULL DEFAULT '' COMMENT 'Name of the schedule.',
  `type` varchar(128) NOT NULL DEFAULT '' COMMENT 'Type identifier of the job.',
  `id` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Numeric identifier of the job.',
  `period` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Time period after which job is to be executed.',
  `crontab` varchar(255) NOT NULL DEFAULT '' COMMENT 'Crontab line in *NIX format.',
  `data` longblob COMMENT 'The arbitrary data for the item.',
  `expire` int(11) NOT NULL DEFAULT '0' COMMENT 'Timestamp when job expires.',
  `created` int(11) NOT NULL DEFAULT '0' COMMENT 'Timestamp when the item was created.',
  `last` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Timestamp when a job was last executed.',
  `periodic` smallint(5) unsigned NOT NULL DEFAULT '0' COMMENT 'If true job will be automatically rescheduled.',
  `next` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Timestamp when a job is to be executed (next = last + period), used for fast ordering.',
  `scheduled` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Timestamp when a job was scheduled. 0 if a job is currently not scheduled.',
  PRIMARY KEY (`item_id`),
  KEY `name_type_id` (`name`,`type`,`id`),
  KEY `name_type` (`name`,`type`),
  KEY `next` (`next`),
  KEY `scheduled` (`scheduled`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Schedule of jobs to be executed.' AUTO_INCREMENT=53525 ;
--
-- Dumping data for table `d7_job_schedule`
--
INSERT INTO `d7_job_schedule` (`item_id`, `name`, `type`, `id`, `period`, `crontab`, `data`, `expire`, `created`, `last`, `periodic`, `next`, `scheduled`) VALUES
(53522, 'feeds_source_import', 'xml_files_to_stories', 0, 0, '', NULL, 0, 0, 1355433121, 1, 1355433121, 0),
(53523, 'feeds_source_import', 'xml_files_to_pages', 0, 0, '', NULL, 0, 0, 1355433121, 1, 1355433121, 0),
(53524, 'feeds_source_import', 'xml_files_to_editions', 0, 0, '', NULL, 0, 0, 1355433121, 1, 1355433121, 0);
#18
-- Table structure for table `job_schedule`
--
CREATE TABLE IF NOT EXISTS `job_schedule` (
  `item_id` int(10) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Primary Key: Unique item ID.',
  `name` varchar(128) NOT NULL DEFAULT '' COMMENT 'Name of the schedule.',
  `type` varchar(128) NOT NULL DEFAULT '' COMMENT 'Type identifier of the job.',
  `id` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Numeric identifier of the job.',
  `period` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Time period after which job is to be executed.',
  `crontab` varchar(255) NOT NULL DEFAULT '' COMMENT 'Crontab line in *NIX format.',
  `data` longblob COMMENT 'The arbitrary data for the item.',
  `expire` int(11) NOT NULL DEFAULT '0' COMMENT 'Timestamp when job expires.',
  `created` int(11) NOT NULL DEFAULT '0' COMMENT 'Timestamp when the item was created.',
  `last` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Timestamp when a job was last executed.',
  `periodic` smallint(5) unsigned NOT NULL DEFAULT '0' COMMENT 'If true job will be automatically rescheduled.',
  `next` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Timestamp when a job is to be executed (next = last + period), used for fast ordering.',
  `scheduled` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Timestamp when a job was scheduled. 0 if a job is currently not scheduled.',
  PRIMARY KEY (`item_id`),
  KEY `name_type_id` (`name`,`type`,`id`),
  KEY `name_type` (`name`,`type`),
  KEY `next` (`next`),
  KEY `scheduled` (`scheduled`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Schedule of jobs to be executed.' AUTO_INCREMENT=1533 ;
--
-- Dumping data for table `job_schedule`
--
INSERT INTO `job_schedule` (`item_id`, `name`, `type`, `id`, `period`, `crontab`, `data`, `expire`, `created`, `last`, `periodic`, `next`, `scheduled`) VALUES
(1530, 'feeds_source_import', 'psl_xml_importer', 0, 86400, '', NULL, 0, 0, 1355476741, 1, 1355563141, 0),
(1532, 'feeds_source_import', 'news_xml_importer', 0, 0, '', NULL, 0, 0, 1355476801, 1, 1355476801, 0);
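For anyone reading these rows: the `next` column comment says next = last + period, and the dumped values can be checked against that. A small sketch with three rows copied from the two dumps (only the columns needed). The variable names are just for illustration.

```php
<?php
// Values copied from the INSERT statements in the two dumps.
$rows = array(
  'psl_xml_importer' => array('period' => 86400, 'last' => 1355476741, 'next' => 1355563141),
  'news_xml_importer' => array('period' => 0, 'last' => 1355476801, 'next' => 1355476801),
  'xml_files_to_stories' => array('period' => 0, 'last' => 1355433121, 'next' => 1355433121),
);

// Check the documented relation next = last + period for each row.
$consistent = TRUE;
foreach ($rows as $row) {
  if ($row['next'] !== $row['last'] + $row['period']) {
    $consistent = FALSE;
  }
}

// Rows with period 0 have next equal to last.
$zero_period = array();
foreach ($rows as $name => $row) {
  if ($row['period'] === 0) {
    $zero_period[] = $name;
  }
}
```

So the period-0 importers ("as often as possible") have `next` equal to `last`, which presumably means they are picked up again on every cron run, while the 1 day importer waits 86400 seconds.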
#19
This is related to the hardcoded time parameter in the queue_info callback - changing the default 15 seconds to something higher (240 in my case) solves this problem.
Lines 85-104 of feeds.module:
/**
 * Implements hook_cron_queue_info().
 */
function feeds_cron_queue_info() {
  $queues = array();
  $queues['feeds_source_import'] = array(
    'worker callback' => 'feeds_source_import',
    'time' => 15,
  );
  $queues['feeds_source_clear'] = array(
    'worker callback' => 'feeds_source_clear',
    'time' => 15,
  );
  $queues['feeds_importer_expire'] = array(
    'worker callback' => 'feeds_importer_expire',
    'time' => 15,
  );
  $queues['feeds_push_unsubscribe'] = array(
    'worker callback' => 'feeds_push_unsubscribe',
    'time' => 15,
  );
  return $queues;
}
hook_cron_queue_info() declares the queues that Drupal's cron works through, and cron processes each queue for "up to" the time period specified: 15 seconds by default.
If you have a lot of large feed sources to import on each scheduled run (like I did), this will usually only get through one or two feed sources before hitting that limit.
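A simplified model of that time-boxed processing, to show why only one or two sources get through. This is a sketch, not Drupal's code: sketch_run_queue() is a made-up function, the clock is simulated, and the 10 seconds per source is an assumed figure.

```php
<?php
// Simplified model of a time-boxed queue run. The clock is simulated and
// $cost is the assumed duration of one import, in seconds.
function sketch_run_queue(array &$queue, $time_limit, $cost) {
  $now = 0;
  $end = $now + $time_limit;
  $processed = array();
  // Claim the next item only while the time budget has not run out.
  while ($now < $end && count($queue) > 0) {
    $processed[] = array_shift($queue);
    $now += $cost;
  }
  return $processed;
}

$sources = array('source_1', 'source_2', 'source_3', 'source_4', 'source_5');

// Default budget of 15 seconds: the check passes at t=0 and t=10 only.
$queue = $sources;
$done_default = sketch_run_queue($queue, 15, 10);
$left_default = count($queue);

// Raised budget of 240 seconds: all five sources fit.
$queue = $sources;
$done_raised = sketch_run_queue($queue, 240, 10);
$left_raised = count($queue);
```

Note that the limit is only checked before each item starts, so a single slow import can run past the budget, and then nothing else is claimed in that run.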
You'll end up with a lot of items stuck in the queue table (mine had 70,000+ rows of duplicate entries). That's because on every scheduled run, all your feed sources get added to the queue, but only one or two get processed.
If you increase your time limit to something large and have lots of duplicate rows in your queue table, the next run will process as many of those records as it can, so I'd suggest emptying the queue table first (make a backup of it in case you need to restore something).
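If you'd rather not edit feeds.module directly, Drupal 7 also provides hook_cron_queue_info_alter(), so the same change could probably live in a small custom module. A minimal sketch, assuming a module named "mymodule" (a placeholder); the 240 is just the value that worked for me, not a recommendation.

```php
<?php
/**
 * Implements hook_cron_queue_info_alter().
 *
 * "mymodule" is a placeholder module name.
 */
function mymodule_cron_queue_info_alter(&$queues) {
  // Only touch the import queue, and only if Feeds has declared it.
  if (isset($queues['feeds_source_import'])) {
    $queues['feeds_source_import']['time'] = 240;
  }
}
```

I haven't tested this variant against alpha7, so treat it as a starting point.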