Project: Feeds
Version: 7.x-2.x-dev
Component: Code
Category: feature request
Priority: normal
Assigned: Unassigned
Status: needs review

Issue Summary

The user should be able to create a feed from the command line and refresh it.

Comments

#1

Version: <none> » 6.x-1.0-alpha11
Status: active » needs review

I created some drush commands for the feeds module. (The file is coded for Drush 3.)

The commands available are:

drush feeds-config
* Displays all active importers or displays the config of a given importer (passed as arg).

drush feeds-refresh
* Refreshes a feed based on its schedule.

drush feeds-queue
* Adds a scheduled feed to the drupal_queue. (Needs to be run in conjunction with "drush queue cron".)

However, the feature needed above is not satisfied yet.

Attachment: feeds.drush_.inc_.txt (6.57 KB)

#2

Status: needs review » needs work

I gave this a quick test; it looks like the schema has changed (last_scheduled_time -> last_executed_time).

Also, running drush feeds-refresh gave me the following error:

Exception: Empty configuration identifier. in sites/all/modules/feeds/includes/FeedsConfigurable.inc on line 56

Marking this as needs work until I can test and work on it further.

#3

Version: 6.x-1.0-alpha11 » 6.x-1.0-alpha12
Status: needs work » needs review

Here's an updated version for alpha12, where the schema change happened. I haven't tested this yet for alpha14. Let me know how it goes.

Attachment: feeds.drush_.inc_.txt (6.67 KB)

#4

Version: 6.x-1.0-alpha12 » 6.x-1.x-dev
Status: needs review » needs work

I do not understand the intention of the patch in #3. Let me write out what it does to make sure we're on the same page:

1. drush feeds-refresh [importer_id] imports all feed subscriptions of a specific importer id IF they are up for a scheduled import.
2. drush feeds-queue [importer_id] queues all feed subscriptions of a specific importer id for import IF they are up for a scheduled import.

Right?

What I do not understand is:

Why can't you get away with writing a very simple wrapper that invokes feeds_scheduler()->cron()? This is essentially what's invoked in feeds_cron(). The logic that decides whether something is queued or worked off immediately already lives in FeedsScheduler::cron().

#5

ivanbueno: I knew we had already started this discussion in #717390-12: What's the best way to refresh feeds faster with the Feeds module? Still, my questions in #4 remain the same.

#6

alex_b: I did not use FeedsScheduler::cron() because it will act upon scheduled imports of all importer id's. I don't see a way to target a specific importer id. Let me know of the alternative or if I'm missing something.

#7

"because it will act upon scheduled imports of all importer id's"

Why is this a problem?

#8

What's executing the import for a particular importer is a daemon script that runs every 15 seconds. (We're not using cron to import the feeds.) When that importer runs, we don't want any other importers to run that might slow the process. The shell script has more control on which importer to run based on the importance, load, and nature of the importer.

On a side note, pubsubhubbub would have solved our problems, except we're not dealing with rss/atom feeds. It's where we want to go though.

#9

"When that importer runs, we don't want any other importers to run that might slow the process."

I see. I'll assume for now that the other importers' impact is not negligible. Here is how your use case needs to be addressed:

1) Add optional parameter importer_id to FeedsScheduler::cron(). If given, only work off subscriptions of feeds using importer_id.
2) Write a drush feeds-cron command that accepts an optional importer_id string that is passed on to FeedsScheduler::cron().
3) Optionally, introduce a Drupal variable that controls whether feeds_cron() invokes FeedsScheduler::cron() or not. This could allow you to shut off cron scheduling on cron.php altogether and move it to a separate process triggered by drush.

1) and 3) are a different issue than this one, 2) could be part of this issue.
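
A rough sketch of the filtering behind 1), in plain PHP rather than actual Feeds code. The function name due_subscriptions() and the array layout are hypothetical stand-ins for whatever FeedsScheduler::cron() queries internally; the snippet only shows how an optional importer_id narrows the set of subscriptions that get worked off.

```php
<?php
/**
 * Returns the subscriptions that are due, optionally limited to one importer.
 *
 * Hypothetical helper for illustration only; not part of the Feeds API.
 *
 * @param array $subscriptions
 *   Rows with 'importer_id', 'feed_nid' and 'next_run' (Unix timestamp).
 * @param int $now
 *   The current Unix timestamp.
 * @param string|null $importer_id
 *   If given, only subscriptions of this importer are considered.
 */
function due_subscriptions(array $subscriptions, $now, $importer_id = NULL) {
  $due = array();
  foreach ($subscriptions as $row) {
    // Skip subscriptions of other importers when an importer id was passed.
    if ($importer_id !== NULL && $row['importer_id'] !== $importer_id) {
      continue;
    }
    // Keep only subscriptions whose scheduled time has been reached.
    if ($row['next_run'] <= $now) {
      $due[] = $row;
    }
  }
  return $due;
}

// Made-up sample data: two importers, three feed subscriptions.
$subscriptions = array(
  array('importer_id' => 'news', 'feed_nid' => 1, 'next_run' => 100),
  array('importer_id' => 'blogs', 'feed_nid' => 2, 'next_run' => 100),
  array('importer_id' => 'news', 'feed_nid' => 3, 'next_run' => 900),
);

// Without an importer id, every due subscription is selected.
$all = due_subscriptions($subscriptions, 500);
// With an importer id, only that importer's due subscriptions remain.
$news = due_subscriptions($subscriptions, 500, 'news');
```

A drush feeds-cron command as in 2) would then only have to pass its optional argument through to this parameter.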

"On a side note, pubsubhubbub would have solved our problems, except we're not dealing with rss/atom feeds. It's where we want to go though."

What feeds are you dealing with? Pubsubhubbub is moving towards a content-type independent specification.

#10

OK, 1, 2, and 3 work for my setup. I'll write a patch for 1 and 2. For 3, I'm not sure yet how best to handle that. (Additionally, I think the drush import process should use its own semaphore instead of the feeds_scheduler_cron variable, for cases where people have a drush script and cron running simultaneously.)

"What feeds are you dealing with? Pubsubhubbub is moving towards a content-type independent specification."

We're working with NewsML files (an XML standard for news press releases), where each post entry is an XML file. A content-type independent Pubsubhubbub would be great!

#11

Status: needs work » needs review

Please review this patch to FeedsScheduler::cron(). Its signature is changed to cron($feed_name = NULL, $semaphore_type = NULL, $bypass_queue = FALSE).

$feed_name is added so cron can run a specific importer.
$semaphore_type is added so drush or any other caller can run their own cycles.
$bypass_queue is added so drush has an option to bypass the cron queue even if drupal_queue is enabled.

The drush file has been updated and is much cleaner:

% drush feeds-cron
Refreshes a feed right away. Uses FeedsScheduler::cron().

% drush feeds-queue
Adds a feed to the queue. "drush queue-cron" needs to be run afterwards. Also uses FeedsScheduler::cron().

% drush feeds-config
Gets all active importers for a site, or gets details about an importer.

TODO:
Add $bypass_drupal_cron for setting an importer to be ignored by Drupal cron.

TESTED ON:
* Ran cron.php without drupal_queue
* Ran cron.php with drupal_queue, then drush queue-cron
* Ran drush feeds-cron
* Ran drush feeds-queue, then drush queue-cron

Thanks. Let me know if I'm missing anything.

Attachments:
feeds.drush_.inc_.txt (3.88 KB)
feeds-cron-alpha14.patch (3.16 KB)

#12

Status: needs review » needs work

- Why have an option to bypass the queue? If queue should not be used, drupal_queue module can be shut off.
- $feed_name should be $importer_id
- $semaphore_type can break the scheduler if callers use different semaphore types but work on the same importers. Let's not add different types of semaphores at all. This is what the queue is for.

In general, I'd love to keep it simple. Blowing up options on this sort of functionality is asking for trouble...

#13

I'm ok with importer_id and removal of semaphore_type. I need bypass_queue because we have other modules dependent on the drupal_queue module. We can't just shut it off to disable the feeds-queue.

#14

Status: needs work » needs review

Here are the updated patches:

* $feed_name renamed to $importer_id
* $semaphore_type is removed.

I hope $bypass_queue stays. I need it.

Let me know if this works or not.

Attachments:
feeds-cron-alpha14.patch (2.6 KB)
feeds.drush_.inc_.txt (3.91 KB)

#15

"I need bypass_queue because we have other modules dependent on the drupal_queue module. We can't just shut it off to disable the feeds-queue."

Hm. I guess what I'd like to know is why do you need to *not* use it? drush feeds-cron followed by drush queue-cron should do the job, doesn't it?

BTW, FeedsScheduler can be extended and the extending class can be injected by setting the feeds_scheduler_class variable (see FeedsScheduler::instance()).

It almost sounds like you have such a special use case that overriding FeedsScheduler to accommodate it makes a lot of sense...

#16

"Hm. I guess what I'd like to know is why do you need to *not* use it? drush feeds-cron followed by drush queue-cron should do the job, doesn't it?"

We need "drush feeds-cron" to be as atomic as possible, without any other processes that might slow it down. "drush queue-cron" might come with other scheduled jobs that could potentially slow down the refresh.

If there's no $bypass_queue, "drush feeds-cron" is not needed. "drush feeds-queue" would suffice because "drush queue-cron" needs to be run anyway.

I don't think this is a special use case. It applies to anyone who uses the drush commands for performance or tight-scheduling reasons. Having both "drush feeds-cron" and "drush feeds-queue" gives the shell script more options and control.

#17

Component: User interface » Code
Status: needs review » needs work

I don't plan to commit an option to bypass the queue, at least not in the suggested implementation. We should, however, add a drush command that updates a single feed from the command line, not by going through FeedsScheduler but by calling FeedsSource::import() directly (see the import menu callbacks).

#18

Status: needs work » needs review

Here's a simplified drush file for refreshing feeds. Please review.

Attachment: feeds.drush_.inc_.txt (4.06 KB)

#19

I needed something like this.

This is a bit different than the other patches on this thread.

This has three drush commands: feeds-list, feeds-import, and feeds-clear.

feeds-list displays a table of all the feeds and their properties, such as name, description, attached to, status, and state.

feeds-import takes the feed_name and runs an import, ignoring the scheduler. It also has a nid option if needed.

feeds-clear is the same as feeds-import, except that it clears.

Sorry for being lazy and not dealing with fake cvsadd :-(

Attachment: feeds.drush_.inc_.txt (3.46 KB)

#20

subscribe

#21

Here's an update to my previous patch.

This time around it provides output to the screen of the batch process.

Attachment: feeds.drush_.inc_.txt (3.74 KB)

#22

subscribe

#23

I've tested the file and it works fine for me. Drush is Drupal-version agnostic, so this patch could be applied to both the D6 and D7 branches.

Attached is the patch in git format. I've also added support for importing a file from drush into feeds (the file option).

I don't like the way this patch imports feeds by calling the import() method directly; I think we should code it using the batch API, see http://drupal.org/node/873132. I'll give it a try in a while, but in the meantime this is functional enough.

Attachment: 608408-feeds_drush_integration-23.patch (4.45 KB)

#24

@pcambra, feeds already does the batch processing. Also see alex_b's comment in #17.

#25

@ericduran, but why not use the batch API, with drush calling feeds_batch? I'll give it a try so that it is clearer what I mean.

#26

@pcambra, I believe that the import will call feeds_batch itself. I might be wrong but that was my understanding.

#27

I don't think the import() method uses batch at all; that's why you need to loop over it. Here is a patch using drush and batch, which also adds commands for deleting, reverting, enabling and disabling feeds. Once #1186480: Make the batch api benefit of backend show progress automatically gets into drush, users will get better messages when doing batch operations.

Attachment: 608408-feeds_drush_integration-27.patch (9.36 KB)

#28

Status: needs review » needs work

Wow, perfect timing. Thank you very much for this.

Bug: while the --nid option works great, I can't get the --file option to work.

I tried:
drush feeds-import res_core_user --file=sites/default/files/feeds/department.csv

resulting in the error message:
File sites/default/files/feeds/dept-commas-newemail_2.csv is not accessible.

Feeds is remembering a file name I uploaded previously, and doesn't seem to be accepting the new --file option. I also tried an absolute path. (and tried putting it in quotes...)

The workaround for me is to manually create an import node, which remembers the correct filename, and pass --nid=## to this drush command.

#29

Status: needs work » needs review

OK, I see why now. Here is a patch that fixes the error from #28. Since we don't really upload a file, we need to fake it: if the file already exists in public://feeds, feeds-import will respect the older file, and if it doesn't, it will create it.
It would probably make sense to check the file extension against the feed config.
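
The fallback described here can be sketched in plain PHP. This is not the code from the patch: stage_feed_file() is a hypothetical helper, and an ordinary directory path stands in for public://feeds so that the snippet runs outside Drupal.

```php
<?php
/**
 * Makes sure a copy of $source exists in $feed_dir and returns its path.
 *
 * Hypothetical helper for illustration only. A file of the same name that
 * is already present in $feed_dir is left untouched.
 */
function stage_feed_file($source, $feed_dir) {
  if (!is_readable($source)) {
    throw new InvalidArgumentException("File $source is not accessible.");
  }
  // Create the target directory on first use.
  if (!is_dir($feed_dir) && !mkdir($feed_dir, 0775, TRUE)) {
    throw new RuntimeException("Could not create directory $feed_dir.");
  }
  $destination = rtrim($feed_dir, '/') . '/' . basename($source);
  // Respect a file that is already there; copy only when it is missing.
  if (!file_exists($destination) && !copy($source, $destination)) {
    throw new RuntimeException("Could not copy $source to $destination.");
  }
  return $destination;
}
```

Note that with this behavior a changed source file of the same name would not replace the older copy, which may or may not be what the caller wants.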

Attachment: 608408-feeds_drush_integration-29.patch (11.02 KB)

#30

Status: needs review » needs work

Discussing the subject with jonhattan, it seems that we'd need to improve the argument and options validation with a validate function to be more robust.

It also appears that these two lines might not be required for drush to handle the batch:

<?php
$batch =& batch_get();
$batch['progressive'] = TRUE;
?>

#31

Status: needs work » needs review

Another version of the patch that checks the file extension and removes unnecessary lines for drush to process batch.

Attachment: 608408-feeds_drush_integration-31.patch (11.35 KB)

#32

sub!

#33

sub

#34

+1

#35

#31 works, but there's no way to see the error log of the import; drush finished silently.

EDIT: errors are printed to stdout. Tested on D7!

#36

Downloaded the patch from #31 and used it with an installation of Feeds 6.x-1.0-beta11. Works great!

We use the standalone form for importing. I added a "url" option to feeds-import and added this code to process it:

  elseif ($url = drush_get_option('url')) {
    $config = $feedsSource->getConfig();
    $config['FeedsHTTPFetcher']['source'] = $url;
    $feedsSource->setConfig($config);
    $feedsSource->save();
  }

Seems to work OK.

I expect that it needs some kind of validation as well, and I wasn't sure about the naming conventions (whether the option should be "uri" rather than "url"), but I thought this might be of use to someone.
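
One possible form of that validation, as a plain-PHP sketch. is_valid_feed_url() is a hypothetical name, and restricting the scheme to http and https is an assumption; it relies only on PHP's built-in filter_var() and parse_url().

```php
<?php
/**
 * Checks that $url is a well-formed http or https URL.
 *
 * Hypothetical helper for illustration only; not part of Feeds or drush.
 */
function is_valid_feed_url($url) {
  // Reject anything that is not a syntactically valid URL.
  if (filter_var($url, FILTER_VALIDATE_URL) === FALSE) {
    return FALSE;
  }
  // Assumption: only http and https sources make sense for an HTTP fetcher.
  $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
  return in_array($scheme, array('http', 'https'), TRUE);
}
```

The drush command could run such a check on the option value before saving it to the source config, and bail out with an error otherwise.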

#37

When trying #31 I received the output:

Only files with the following extensions are allowed: .

When putting print_r($fetcher_config); on line 149 of feeds.drush.inc the output is:
Array
(
[direct] => 1
)
Only files with the following extensions are allowed: .

The extension is looking for $fetcher_config['allowed_extensions'], which doesn't exist in the $fetcher_config array, and which I can't find on the feeds config UI (outside of setting the parser to CSV file).

In any case, the --nid= option continues to work, but things would be a little more slick if I could get rid of these content types that exist only for importing, and instead use the --file option. Thanks for this improvement to feeds, as it makes a really important part of my project possible (regular data import triggered by cron, from a file).

#38

#37: you configure the allowed extensions in your feeds importer configuration, in the file upload settings, but maybe the patch can be improved to allow all extensions if none are set.
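
A sketch of that fallback in plain PHP. extension_allowed() is a hypothetical helper, and the space-separated extension list is an assumption about how the setting is stored.

```php
<?php
/**
 * Tells whether $filename may be imported given a list of extensions.
 *
 * Hypothetical helper for illustration only.
 *
 * @param string $filename
 *   Name or path of the file to import.
 * @param string $allowed_extensions
 *   Space-separated extensions, e.g. 'csv tsv'. Empty means no restriction.
 */
function extension_allowed($filename, $allowed_extensions = '') {
  $allowed = preg_split('/\s+/', strtolower(trim($allowed_extensions)), -1, PREG_SPLIT_NO_EMPTY);
  // Nothing configured: accept every extension.
  if (empty($allowed)) {
    return TRUE;
  }
  $extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
  return in_array($extension, $allowed, TRUE);
}
```

With an empty setting, as in the output shown in #37, every file would pass instead of being rejected with an empty list of allowed extensions.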

#39

Hi pcambra, thanks a lot for getting back to me. If I should open a new issue, please let me know, but perhaps the settings you're referring to are in D7? I'm on D6 (and this issue is tagged D6), and I've attached my file upload settings, using the latest feeds 6.x-1.0-beta11. There doesn't appear to be any place to configure allowed file extensions, or am I missing something?

Attachment: feeds-profile-settings.png (48.06 KB)

#40

Version: 6.x-1.x-dev » 7.x-2.x-dev

The patch in #31 looks like D7 code to me, for example:

<?php
$feed_dir = 'public://feeds';
file_prepare_directory($feed_dir, FILE_CREATE_DIRECTORY | FILE_MODIFY_PERMISSIONS);
?>

file_prepare_directory() is a Drupal 7 function.

#41

Here's a version of #31 that works for feeds 6.x, and bails with an error if the user attempts to import a file which doesn't exist.

Attachment: 608408-feeds_drush_integration-41.patch (10.09 KB)

#42

Hello,

I have added the functionality to import by http, and not only by a local file.

Attachment: feeds.drush_.inc_.txt (10.04 KB)

#43

The added option to import via --http= parameter in #42 works for me on Drupal 6. Attached is the file from #42 in patch form.

Attachment: feeds_drush_integration-608408-42.patch (10.52 KB)

#44

I've done some testing with this on Drupal 6 and have some questions. Here is my scenario:

---

* Nodes initially created via Import button on feed node.

Adding context and fields to the XPath XML parser settings on a feed importer and both detaching it from and leaving it attached to the content type that also contains the context and fields seems to create duplicate nodes upon first import via drush with the --http= option. Subsequent drush imports do not create duplicate nodes, but update the drush-created nodes. Something triggers new node creation when the importer changes that significantly.

* Users initially created via Import button on feed node.

Adding context and fields to the XPath XML parser settings on a feed importer and both detaching it from and leaving it attached to the content type that also contains the context and fields does not duplicate users upon first import via drush with --http= option.

---

Can anyone explain why drush feeds-import will not update existing nodes that were created with the import button on a feed node? I don't think it has to do with using Feeds XPath Parser, but could it? It's not the end of the world if I have to delete my already created nodes and start over. It is strange though that users are not affected in the same way.
