Posted by ivanbueno on February 17, 2010 at 8:30pm
| Project: | Feeds |
| Version: | 6.x-1.0-alpha11 |
| Component: | Miscellaneous |
| Category: | support request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | closed (fixed) |
Issue Summary
I'm looking to refresh my feeds faster than cron could, around every 10 seconds. With feedapi, this is possible with drush and a shell script: http://developmentseed.org/blog/2009/jun/24/feedapi-and-drush-refresh-yo... .
What's the best way to make the Feeds module refresh faster? Is it a setup similar to the one used with feedapi? (If so, I'll probably end up implementing a drush hook for Feeds that I can use. I just want to make sure this is the correct way, or whether there's an alternative.)
Thanks!
Comments
#1
Use http://drupal.org/project/drupal_queue
Closed?
#2
Yes, thanks!
#3
Do I need to write a new FeedsScheduler that will add the feed to the queue table faster than cron() could?
Correct me if I'm wrong, but here's how I understand Feeds + Drupal Queue to work:
* cron() adds feeds_queue to Drupal Queue.
* Drupal Queue removes feeds_queue once "drush queue cron" is run.
If I want to refresh a particular Feed Importer every 10 seconds, how do I add it to Drupal Queue faster than the cron() hook could? Do I need to, or is there another way?
#4
I created some drush commands for the feeds module. (The file is coded for Drush 3.)
The commands available are:
drush feeds-config
* Displays all active importers or displays the config of a given importer (passed as arg).
drush feeds-refresh
* Refreshes a feed based on its schedule.
drush feeds-queue
* Adds a scheduled feed to the drupal_queue. (Needs to be run in conjunction with "drush queue cron".)
I will work on the shell script that will execute these drush commands.
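A minimal sketch of what such a shell script could look like, assuming the feeds-queue command from the file attached in #4 is installed (it is not part of stock Feeds) and that "drush queue cron" works the queue off. DRUSH and refresh_loop are hypothetical names chosen for illustration:

```shell
#!/bin/sh
# Sketch only. DRUSH and refresh_loop are illustrative names, and
# feeds-queue is the custom command from #4, not part of stock Feeds.
DRUSH="${DRUSH:-drush}"

# refresh_loop INTERVAL_SECONDS [MAX_CYCLES]; MAX_CYCLES=0 means run forever.
refresh_loop() {
  interval="$1"
  max_cycles="${2:-0}"
  cycles=0
  while [ "$max_cycles" -eq 0 ] || [ "$cycles" -lt "$max_cycles" ]; do
    # Put the scheduled feeds on the queue, then work the queue off.
    $DRUSH feeds-queue || echo "feeds-queue failed" >&2
    $DRUSH queue cron || echo "queue cron failed" >&2
    cycles=$((cycles + 1))
    sleep "$interval"
  done
}
```

Calling "refresh_loop 10" would repeat the cycle indefinitely. Note that the sleep starts after the work finishes, so the real period is the import time plus 10 seconds.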
#5
Exactly.
10 seconds! Using dedicated drush commands (like you posted in #4) is probably the way to go. Out of curiosity, what are you importing 6 times in a minute?
Could you post #4 on #608408: Drush integration?
#6
I'm fetching NewsML (an XML standard for multimedia news) files from a server via SSH. If a NewsML file pops up on the server, the Feeds module has to fetch, parse, and process it RIGHT AWAY in the Drupal site (in less than 60 seconds).
For this, I had to create these plugins:
* SSHFetcher
* NewsMLParser
* NewsMLProcessor
As a side note, NewsML has a very different structure from RSS/syndication XML. That's why the FetcherResult object in Alpha9 suited me really well; it has no specified structure. With FeedsImportBatch, the only variable I'm using is $items. Its getRaw() does not quite work for me because I'm fetching via SSH, not HTTP.
#7
"Its getRaw() does not quite work for me because I'm fetching it via SSH not http."
Did you try to override it in your own fetcher then? Compare http://drupalcode.org/viewvc/drupal/contributions/modules/feeds/plugins/... to http://drupalcode.org/viewvc/drupal/contributions/modules/feeds/plugins/... ...
#8
The logic was inside fetch()... creating an SSHBatchImport is much better, and I can still ensure that all SSH connections only happen in the fetch phase.
Thanks for the tip!!
#9
I just had the time to review the drush support in #4. While at first I asked you to post it over on the drush integration issue, I now realize that there may be some fundamental misunderstanding of how Feeds and Drupal Queue work together. I'd like to explore your use case a little better; that's why I am posting back here.
I see a lot of duplicated code and direct calls to feeds_scheduler_work() - which should be a strict callback invoked from drupal_queue_cron_run().
I wonder what made you duplicate specifically this functionality. What problems did you face that made you break out queuing and refreshing in this way?
#10
My use case calls for high availability, and the news feeds need to be parsed instantly. I had to create a feeds-refresh command outside of drupal_queue to avoid cases where there are a lot of other jobs in the queue, which might slow down the news feed parsing for this particular importer.
If the queue has a mechanism that will push my high-priority job to the top, then I would use that. Right now, it's easier to create a dedicated process to handle the importing.
Additional question: when "drush queue cron" is run, does it fire-off the callbacks sequentially or in parallel? How does it determine which to run first?
#11
Right, but do you actually need that availability for many feeds or just for very few? Because if you don't have many feeds you don't need the queue.
drush queue cron fires off one process that pulls one item after another from the queue. To run multiple processes, you have to dispatch multiple drush queue cron commands.
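Dispatching several queue workers at once could be sketched like this, assuming each "drush queue cron" invocation is one worker process as described above. DRUSH and run_workers are hypothetical names used only for illustration:

```shell
#!/bin/sh
# Sketch only. DRUSH and run_workers are illustrative names.
DRUSH="${DRUSH:-drush}"

# run_workers COUNT: start COUNT queue workers in parallel and wait for all.
run_workers() {
  workers="$1"
  i=0
  while [ "$i" -lt "$workers" ]; do
    # Each background invocation pulls items from the queue one by one.
    $DRUSH queue cron &
    i=$((i + 1))
  done
  # Block until every background worker has exited.
  wait
}
```

For example, "run_workers 3" would start three workers and return once all of them have finished.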
#12
It's not high volume. On average, about 20 items every 10 seconds. Plus, I'm capping the fetcher to only get 100 items per cycle.
What's the benchmark for how many feed items the Feeds module can fetch, parse, and process without timing out or breaking down? In my testing, the Feeds module was able to process 250 feed items (24 KB each) in under 40 seconds. So I'm assuming 100 items per cycle is still safe.
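As a rough check of those numbers, the measured rate can be scaled to a single cycle. This is only a back-of-the-envelope sketch: it assumes processing time grows linearly with the item count, and since the test finished in "under 40 seconds" the results are upper bounds. estimate_seconds is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch only. estimate_seconds is an illustrative helper.
# Rate taken from the test above: 250 items in (at most) 40 seconds.
estimate_seconds() {
  # $1 = number of items in one cycle
  awk -v n="$1" 'BEGIN { printf "%.1f\n", n * 40 / 250 }'
}

estimate_seconds 20    # average cycle: prints 3.2
estimate_seconds 100   # capped cycle:  prints 16.0
```

Under these assumptions an average cycle of 20 items fits comfortably within 10 seconds, while a full 100-item cycle could take up to about 16 seconds and so run past the 10-second interval.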
Thanks for the drush queue explanation. So it is possible to run parallel executions.
#13
Closing this support request in favor of #608408: Drush integration.
See #754626: Performance - max number of feeds and documentation for more info on performance.
#14
Automatically closed -- issue fixed for 2 weeks with no activity.