Hello
Thanks for this great module.
I have a question: I created a fetcher that extends FeedsFileFetcher so that I can configure my directory path in the fetcher's form rather than in the standalone form. It's the only solution I found to import my XML files periodically.
When I run the import manually (standalone form), everything is fine: my 40 XML files are fetched, parsed, and converted to nodes.
But when I set "Periodic import" in "Basic settings" to "as often as possible", only ONE file is imported each time cron fires. I would like all the XML files in my folder to be parsed on every run.
Why is only one XML file imported? How can I change this behavior?
Thanks
Comment | File | Size | Author |
---|---|---|---|
#28 | feeds-queue-1231332-28.patch | 3.51 KB | twistor |
#27 | feeds-queue-1231332-27.patch | 3.52 KB | klausi |
Comments
Comment #1
nyl_auster CreditAttribution: nyl_auster commented
PS: my fetcher's code is exactly the same as FeedsFileFetcher for now.
Comment #2
nyl_auster CreditAttribution: nyl_auster commented
Nobody has a clue for me?
Comment #3
Cajun CreditAttribution: Cajun commented
Hey, did you figure this out? I've got the exact same problem.
Comment #4
juhaniemi CreditAttribution: juhaniemi commented
Confirming this issue and setting it as a bug.
Comment #5
janfang CreditAttribution: janfang commented
I have the same problem. Have you found a solution?
Comment #6
valderama CreditAttribution: valderama commented
It seems like we are having the same problem here. Any clues, someone?
Thanks,
walter
Comment #7
surf12 CreditAttribution: surf12 commented
The same problem. Help us please!
Thanks...
Comment #8
siva.thanush CreditAttribution: siva.thanush commented
The same problem persists.
Or is this post a duplicate?
For me this happens in the next version as well.
Comment #9
siva.thanush CreditAttribution: siva.thanush commented
It's working in the latest version, 7.x-2.0-alpha5.
I'm not sure about the lower ones.
Comment #10
franz
Comment #11
cmarcera CreditAttribution: cmarcera commented
I'm using alpha7 and this bug persists for me. I have cron running every 10 minutes and my periodic import is set to run as often as possible. I'm using the file upload fetcher to parse XML files with the "Supply path to file or directory directly" option checked.
Every 10 minutes, my Feed Importer imports 1 XML file from the directory. After that, the feed is locked and must be unlocked if I want to run it manually. If I run it manually, it imports all of the XML files as expected.
Comment #12
gurrmag CreditAttribution: gurrmag commented
I'm having this issue too...
I'm using Feeds to take new files uploaded to the server and import them periodically as nodes of a specific content type, with "update existing nodes" selected. However, when Job Scheduler fires, only one node is processed at a time, and when the import is complete, it reports that hundreds of nodes have been imported, i.e. the total of all nodes ever imported, rather than just the ten or so new files that are available daily. None of these files are particularly long; many of them are just one paragraph of text.
I have minimised this issue by running Job Scheduler every five minutes; fortunately Feeds is the only thing using it.
Comment #13
cmarcera CreditAttribution: cmarcera commented
My feed will import 1 item, then lock itself saying it's XX% done. After the next cron run, it imports another item and increases the percentage done. It's baffling, because the percentage clearly knows how many files are in the directory to process; it's just stopping after one.
Comment #14
cmarcera CreditAttribution: cmarcera commented
I've now tried various settings in my Feeds importer and none seem to import more than a single item.
gurrmag, what settings are you using? Trying to find a common denominator.
Comment #15
gurrmag CreditAttribution: gurrmag commented
My settings are:
Basic settings
• Attached to: [none]
• Periodic import: 1 day
• Import on submission: Checked
• Process in background: Unchecked
Fetcher
• File upload: Upload content from a local file.
Parser
• XPath XML parser: Parse XML using XPath.
Processor
• Node processor: Create and update nodes.
I've tried a number of different combinations, but haven't found a culprit for this yet either...
Comment #16
dgtlmoon CreditAttribution: dgtlmoon commented
What does the content of your job_schedule table look like?
Comment #17
cmarcera CreditAttribution: cmarcera commented
Comment #18
gurrmag CreditAttribution: gurrmag commented
-- Table structure for table `job_schedule`
--
CREATE TABLE IF NOT EXISTS `job_schedule` (
`item_id` int(10) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Primary Key: Unique item ID.',
`name` varchar(128) NOT NULL DEFAULT '' COMMENT 'Name of the schedule.',
`type` varchar(128) NOT NULL DEFAULT '' COMMENT 'Type identifier of the job.',
`id` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Numeric identifier of the job.',
`period` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Time period after which job is to be executed.',
`crontab` varchar(255) NOT NULL DEFAULT '' COMMENT 'Crontab line in *NIX format.',
`data` longblob COMMENT 'The arbitrary data for the item.',
`expire` int(11) NOT NULL DEFAULT '0' COMMENT 'Timestamp when job expires.',
`created` int(11) NOT NULL DEFAULT '0' COMMENT 'Timestamp when the item was created.',
`last` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Timestamp when a job was last executed.',
`periodic` smallint(5) unsigned NOT NULL DEFAULT '0' COMMENT 'If true job will be automatically rescheduled.',
`next` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Timestamp when a job is to be executed (next = last + period), used for fast ordering.',
`scheduled` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Timestamp when a job was scheduled. 0 if a job is currently not scheduled.',
PRIMARY KEY (`item_id`),
KEY `name_type_id` (`name`,`type`,`id`),
KEY `name_type` (`name`,`type`),
KEY `next` (`next`),
KEY `scheduled` (`scheduled`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Schedule of jobs to be executed.' AUTO_INCREMENT=1533 ;
--
-- Dumping data for table `job_schedule`
--
INSERT INTO `job_schedule` (`item_id`, `name`, `type`, `id`, `period`, `crontab`, `data`, `expire`, `created`, `last`, `periodic`, `next`, `scheduled`) VALUES
(1530, 'feeds_source_import', 'psl_xml_importer', 0, 86400, '', NULL, 0, 0, 1355476741, 1, 1355563141, 0),
(1532, 'feeds_source_import', 'news_xml_importer', 0, 0, '', NULL, 0, 0, 1355476801, 1, 1355476801, 0);
Comment #19
DannyPfeiffer CreditAttribution: DannyPfeiffer commented
This is related to the hardcoded time parameter in the queue_info callback. Changing the default 15 seconds to something higher (240 in my case) solves this problem.
Lines 85-104 of feeds.module:
The hook_cron_queue_info() implementation adds entries to Drupal's queue table and processes the queue for "up to" the time period specified, 15 seconds by default.
If you have a lot of large feed sources to import on each scheduled run (like I did), this will usually only get through one or two feed sources before hitting that limit.
You'll end up with a lot of items stuck in the queue table (mine had 70,000+ rows of duplicate entries). That's because on every scheduled run, all your feed sources get added to the queue, but only one or two get processed.
If you increase your time limit to something large and have lots of duplicate rows in your queue table, the next run will process as many of those records as it can, so I'd suggest blanking out the queue table first (make a backup of it in case you need to restore something).
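For reference, the queue declaration in question looks roughly like this. This is a sketch based on the description above, not the literal feeds.module code, and the exact keys present may differ between Feeds versions; the point is the 'time' value:

```php
/**
 * Implements hook_cron_queue_info().
 *
 * Sketch of the Feeds import queue declaration. The 'time' key is the
 * hardcoded limit discussed above: cron processes queue items for up to
 * this many seconds per run, then stops.
 */
function feeds_cron_queue_info() {
  $queues = array();
  $queues['feeds_source_import'] = array(
    'worker callback' => 'feeds_source_import',
    // Default is 15; raising it (e.g. to 240) lets a single cron run
    // work through more than one or two feed sources.
    'time' => 240,
  );
  return $queues;
}
```

Note that raising 'time' also lengthens the worst-case cron run, so pick a value your PHP max_execution_time and cron cadence can tolerate.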
Comment #20
queenvictoria CreditAttribution: queenvictoria commented
I also had the issue in #19. After chasing around all over the place setting timeouts in nginx, PHP, and http_request_timeout in the settings file, I've settled on drush calling the feeds import. I've added some work over here to aid in this task:
http://drupal.org/node/608408
My table had 600k feeds imports queued. Nice tip for clearing the table; good idea to back up first. This op took 5 minutes.
mysql> delete from queue where name = "feeds_source_import";
Comment #21
klausi
Here is a patch that increases the default run time for the feeds import queue to 60 seconds, which is the same as core's Aggregator module uses.
I also modified feeds_source_import() to re-queue itself immediately if importing of a feed has not finished. That allows us to process more items of one particular feed during a cron run for example.
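The re-queuing idea can be sketched as follows. This is a simplified illustration of the approach described here, not the patch itself (per #27, the actual patch moves the queuing into FeedsSource::scheduleImport()), and the exception handling is reduced to the essentials:

```php
/**
 * Queue worker callback for the feeds_source_import queue (sketch).
 */
function feeds_source_import($job) {
  $source = feeds_source($job['type'], $job['id']);
  try {
    // import() returns FEEDS_BATCH_COMPLETE when the whole batch is done,
    // or a progress fraction when there is more to do.
    $result = $source->existing()->import();
    if ($result != FEEDS_BATCH_COMPLETE) {
      // Not finished: put the job straight back on the queue so the
      // import continues on this cron run (time permitting) or the next,
      // instead of waiting out a full scheduling period.
      DrupalQueue::get('feeds_source_import')->createItem($job);
    }
  }
  catch (FeedsNotExistingException $e) {
    // The source was deleted; drop the job silently.
  }
}
```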
Comment #23
klausi
Fixed the test case, since it is now not possible to determine how many items have been processed in one cron run.
Comment #24
twistor CreditAttribution: twistor commented
Nifty. Assigning to myself so I can review it at a normal hour.
Overall, I like the idea.
Making cron non-deterministic is a bit scary. We already have a bunch of problems with it. That said, this would solve a lot of problems with people's expectations. I don't think this will affect sites with a large number of feeds. Well, the re-queuing part won't, but increasing the time limit obviously will.
I kind of like the idea to use the Queue directly, in this case, rather than JobScheduler. There are a couple more places we could do this as well: clearing and expiring.
Could we move the logic back into FeedsSource::scheduleImport()?
Comment #25
lwalley CreditAttribution: lwalley commented
I've been running into the same issue described in #19, with 70,000+ queue entries, and I'm wondering if Job Scheduler might be able to help prevent these duplicate jobs. I've added my thoughts to this ticket: #2061647: Rescheduling 'stuck' periodic jobs results in duplicate queue entries?
Comment #26
lalit774 CreditAttribution: lalit774 commented
I have done it by the following method, so we don't need to hack the Feeds module.
Comment #27
klausi
Patch does not apply anymore, rerolled. I moved the queuing to scheduleImport() as suggested by twistor.
Comment #28
twistor CreditAttribution: twistor commented
Apologies, this fell off my radar. I really like this patch, just trying to flatten out the logic.
Comment #29
twistor CreditAttribution: twistor commented
Thanks everybody, especially klausi for coming up with a clever fix.
If somebody wants to try and backport this, they are more than welcome to. But, queue usage in D6 is optional which complicates this a bit.
http://drupalcode.org/project/feeds.git/commit/83f1a1d