OPML Import for Aggregator

arnabdotorg - January 27, 2005 - 06:13
Project:Drupal
Version:7.x-dev
Component:aggregator.module
Category:feature request
Priority:normal
Assigned:Arancaytar
Status:closed
Description

The aggregator supports OPML export for all feeds, but not import. This patch enables import of OPML lists, allowing people to import lists of RSS feeds they've already assembled in another RSS app.

Demo: http://arnab.drupaldevs.org/aggregator

Patch: http://arnab.drupaldevs.org/tmp/import_opml.patch

AttachmentSizeStatusTest resultOperations
import_opml.patch3.96 KBIgnoredNoneNone

#1

Dries - January 27, 2005 - 21:22

1. This patch does not conform Drupal's coding style at all.

2. The error handling of aggregator_import() is bare bones (not helpful).

3. Use drupal_xml_parser_create() instead of xml_parser_create().

4. Drop the "sorry"'s.

5. form_set_error() is used incorrectly.

Please refine and resubmit.

#2

Dries - March 3, 2005 - 21:20

#3

Matt_T_hat - May 6, 2005 - 19:40
Category:feature request» bug report
Priority:normal» critical

No doubt I've just irritated some one but after spending the day finding out how to run diff/patch on win32 and getting that done - guess what!

That's right the form for OPML is tantalisingly there and yet it is beyond me to get it to anything (like submit). The browse button gets the page to hang for a few moments but that is all.

#4

robertDouglass - May 11, 2005 - 19:05

Would be nice if someone could revive this code.

#5

Matt_T_hat - May 11, 2005 - 22:16

A-Ha! I got it to work.

The OPML list must have been downloaded and can then be uploaded (via browse) and then it will work. Very well.

#6

arnabdotorg - May 25, 2005 - 10:04
Assigned to:arnabdotorg» Anonymous

Too lazy|busy to handle this, it would be nice to see this come through.

#7

Robin Monks - May 26, 2005 - 11:47
Assigned to:Anonymous» Robin Monks

I'm cleaning this up now, here is what I've got so far:
- Now Uses drupal_xml_phraser_create
- Applies Aganist HEAD
- Some Sorry's removed.

To be done:
- Conform to coding style
- Don't give error on existing entry
- Fix form error code

Robin

AttachmentSizeStatusTest resultOperations
OPML.import.patch3.25 KBIgnoredNoneNone

#8

Robin Monks - June 8, 2005 - 23:02
Category:bug report» feature request
Priority:critical» normal

I haven't given up, it's just taking a while.

Robin

#9

Uwe Hermann - August 20, 2005 - 02:27
Status:active» needs review

I have rerolled the patch, it applies more cleanly to HEAD now. Also, I fixed a bug (I think), $feed['TITLE'] should have been $feed['TEXT'], right? The patch as is works for me. Please lets try to get this into 4.7.

#10

Uwe Hermann - August 20, 2005 - 02:28

Forgot the patch.

AttachmentSizeStatusTest resultOperations
OPML.import_0.patch3.35 KBIgnoredNoneNone

#11

Robin Monks - August 22, 2005 - 22:45
Assigned to:Robin Monks» Anonymous

Please make sure this patch checks that the site is not already listed in the .

Also, please reassign this patch to yourself.

Robin

#12

Uwe Hermann - August 23, 2005 - 21:45
Assigned to:Anonymous» Uwe Hermann

"not listed in the ......"? What is missing?

#13

Prometheus6 - September 10, 2005 - 19:06

I think he's talking about this:

+        //Adding '@' to bypass the problem of duplicate INSERTS
+        @aggregator_save_feed($edit);

#14

drumm - July 11, 2006 - 08:35
Status:needs review» needs work

No longer applies.

#15

dayre - September 27, 2006 - 01:56

Not a lot of activity on this lately. I'm looking for OPML import functionality for a project i'm working on. The state was changed to "patch (code needs work)" with the last statement by drumm being "No longer applies.".

What no longer applies ? What work still needs to be done with this ? If Uwe doesn't have time, i might as well pick this up, a small TODO list would be helpful too.

#16

Boris Mann - September 27, 2006 - 03:03

David -- this means the patch no longer applies to HEAD. Generally, we prepare patches for that, and then "backport" to the current release. This is a feature, so it may no longer get into 5.0, but shouldn't be too many variants.

The "To Do's" seem to be:
* Conform to coding style
* Don't give error on existing entry
* Fix form error code

#17

Egon Bianchet - October 3, 2006 - 07:39

Subscribing ...

#18

Egon Bianchet - January 17, 2007 - 19:35
Version:x.y.z» 6.x-dev

Maybe in Drupal 6? ;-)

#19

budda - January 27, 2007 - 19:50

Thanks to the original patch XML processing code, I've just worked OPML import/export functionality in to FeedParser for 4.7.

#20

Pancho - February 4, 2008 - 19:45

Move this to the D7 queue.

#21

Arancaytar - February 5, 2008 - 12:21

Ugh. This is 4.6.x code - form_file(), form_submit(), etc. It will be fun to get it up to date.

#22

Arancaytar - February 5, 2008 - 13:36
Version:6.x-dev» 7.x-dev
Assigned to:Uwe Hermann» Arancaytar
Status:needs work» needs review

... and by fun, I apparently mean easy. Seriously, that was all?

I'm pretty sure that the XML parsing is not up to Drupal standards, however.

The attached prototype patch allows both a local file upload and a remote URL (to be downloaded only once). It does not properly validate, however, and it also does not protect against duplicates.

It works like a charm, though - I imported ~400 feeds from a single OPML file to test. It uses drupal_execute to programmatically submit the feed addition form for each feed in the file.

TODO:
- Protect against those duplicate key inserts by checking for duplicate items.
- Validate the file, make sure that at least one of the two (upload or remote URL) are entered
- Make sure that the XML parsing does not go against ordinary Drupal style

AttachmentSizeStatusTest resultOperations
aggregator-opml-import-16282-22.patch5.27 KBIgnoredNoneNone

#23

Arancaytar - February 8, 2008 - 11:34

The new patch checks for duplicates before adding.

AttachmentSizeStatusTest resultOperations
aggregator-opml-import-16282-23.patch6.03 KBIgnoredNoneNone

#24

Robin Monks - March 4, 2008 - 00:16

Aside from the dupe checking, this worked great!

Robin

#25

Robin Monks - March 4, 2008 - 00:17

Please note, in the previous post I tested Patch #22.

Robin

#26

Arancaytar - March 4, 2008 - 15:13

Mh... could you also test #23, then? :)

#27

Arancaytar - June 23, 2008 - 21:53

Here is a new patch.

AttachmentSizeStatusTest resultOperations
aggregator-opml-import-16282-27.patch6.34 KBIgnoredNoneNone

#28

Arancaytar - June 23, 2008 - 21:56

Forgot to add: The patch renames some variables, validates that the user either enters a URL or uploads a file, and improves error handling. An OPML outline can be valid without containing anx XML feeds, which will be reflected in the messages.

I have been able to add 208 feeds in one go with this patch in HEAD.

#29

Alex UA - June 24, 2008 - 20:44

I tried this patch on my local machine (XP runnign XAMP) and I received the following error:

* notice: Undefined index: aggregator_form_opml in C:\xampp\htdocs\drupal7\includes\form.inc on line 345.
* warning: call_user_func_array() [function.call-user-func-array]: First argument is expected to be a valid callback, 'aggregator_form_opml' was given in C:\xampp\htdocs\drupal7\includes\form.inc on line 360.

#30

Arancaytar - June 24, 2008 - 21:05

In Drupal 7, you need to rebuild the code registry if a new form is added. Disable and re-enable the aggregator module.

#31

Alex UA - June 24, 2008 - 21:13

That worked, thanks! This is a really great feature- I hope it makes it into this release...

#32

Dries - June 25, 2008 - 07:58
Status:needs review» needs work

I think this looks good. I've tested the patch and managed to import ~200 feeds in one swoop. Very nice.

A couple of requests though:

- There are a couple of coding style violations but nothing I can't fix when committing this patch. I'd recommend that we run it through the coder module.

- It would be useful if you could add a one paragraph context-sensitive help text on ?q=admin/content/aggregator/add/opml. I bet you that most people are not familiar with OPML so I think it is important to clue them in. The help text can be short and to the point.

- I'd really like to see us ship some tests with this. Do you think you can write a couple of SimpleTests? It would make this patch fly straight in. :-)

Thanks!

#33

Arancaytar - June 27, 2008 - 12:37
Status:needs work» needs review

I think I've managed to get all the code-style issues. The D6 coder is unfortunately a bit hit-and-miss there - it has the old concatenation policy.

Also, there's some help text:

OPML is an XML format used to to exchange multiple newsfeeds between aggregators. A single OPML document may contain a collection of many feeds. Drupal can parse such a file and import all feeds at once, saving you the effort of adding them manually. You may either upload a local file from your computer or enter a URL where Drupal can download it.

There are some other changes:
- renamed _aggregator_parse_opml to _aggregator_parse_opml_feeds to clarify that the function specifically extracts newsfeeds (OPML can contain much more data than that)
- doxygen on _aggregator_parse_opml_feeds
- added an "or" between the upload and URL fields to mark them as alternative
- if no categories are created yet, note that all imported feeds could be added to the same category (it's annoying to find this out later, because afterward they have to be classified manually)
- got rid of the fallback method of prepending <?xml version="1.0" ?>. Seems like the parser works without this line now (perhaps PHP5 only? But D7 is PHP5+ anyway).

The only thing remaining are the simpletests. I've never worked with unit tests before, so I'll need to learn a bit.

I assume these tests will be a single OPML test case. I'll open the form, submit it empty, with an uploaded file and with a URL, check that all feeds get added, and make sure that duplicates are removed. Would that be all?

Here's the patch with everything but the tests.

AttachmentSizeStatusTest resultOperations
aggregator-opml-import-16282-33.patch7.99 KBIgnoredNoneNone

#34

keith.smith - June 27, 2008 - 20:24
Status:needs review» needs work

+  $form['upload'] = array(
+    '#type' => 'file',
+    '#title' => t('OPML File'),
+    '#description' => t('Upload an OPML file containing a list of newsfeeds to be imported.'),
+  );

Everywhere else (AFAIK) we just refer to feeds (rather than newsfeeds).

+  $form['remote'] = array(
+    '#type' => 'textfield',
+    '#title' => t('OPML Remote URL'),
+    '#description' => t('Enter the URL of an OPML file on the web. This file will be downloaded only once.'),
+  );

If its a URL it kinda' implies that it exists on the web. Is the second part true or is is downloaded every time this form is submitted?

+
+  $form['refresh'] = array(
+    '#type' => 'select',
+    '#title' => t('Update interval'),
+    '#default_value' => 3600,
+    '#options' => $period,
+    '#description' => t('The length of time between feed updates. (Requires a correctly configured <a href="@cron">cron maintenance task</a>.)', array('@cron' => url('admin/reports/status'))),
+  );

Thanks for following the D6 precedent on the cron maintenance task stuff!

+    $form['category'] = array(
+      '#type' => 'checkboxes',
+      '#title' => t('Categorize news items'),
+      '#options' => $options,
+      '#description' => t('New feed items are automatically filed in the checked categories.'),
+    );

This is the other example of "news items" in one place and "feed items" in another.

+    $form['category'] = array(
+      '#type' => 'item',
+      '#value' => t('If you had created any <a href="@category">categories</a>, you could choose to assign all imported feeds to one of them.', array('@category' => url('admin/content/aggregator/add/category'))),
+    );

I'm uncertain on the tense of this one but it seems like it would be "If you have created any..." or just "Imported feeds can be assigned to categories."

+ * Parses an OPML file

Needs a period at the end.

+ * @return
+ *   An array of feeds, each an associative array with a title and
+ *   a url element, or NULL if the OPML document failed to be parsed.

url -> URL

+      return '<p>' . t('<acronym title="Outline Processor Markup Language">OPML</acronym> is an XML format used to to exchange multiple newsfeeds between aggregators. A single OPML document may contain a collection of many feeds. Drupal can parse such a file and import all feeds at once, saving you the effort of adding them manually. You may either upload a local file from your computer or enter a URL where Drupal can download it.') . '</p>';

I'm not sure we use acronym anywhere else but I like it. There is a "to to" in here. Again, it talks about news feeds rather than just feeds.

Regardless of these minor style issues, I like OPML a lot!

#35

Arancaytar - June 28, 2008 - 06:43
Status:needs work» needs review

I've fixed "newsfeeds" in 1) and the redundancy/ambiguity in 2).

4) merely repeats text from the normal feed addition form, so I suggest either fixing that now, or both after this is added, but in a separate patch either way. That would be a pretty large-scale issue, incidentally:

$ grep 'feed items' -R *|wc -l
23
$ grep 'news items' -R *|wc -l
7

Clarified the description for 5) (it's specifically advising the user to create categories before importing the feeds, because this is the one time that >200 feeds can be mass-categorized) and also added some more detail to the doxygen description besides a period in 6).

7) I'm not sure about 'URL' vs 'url' in the doxygen. Note that we're talking about a literal key in an array, ie $item['url'], not $item['URL']. I've emphasized these literals to make that more clear.

8) "to to" and "newsfeeds" fixed.

AttachmentSizeStatusTest resultOperations
aggregator-opml-import-16282-35.patch8.19 KBIgnoredNoneNone

#36

Dries - June 28, 2008 - 12:49

The only thing remaining are the simpletests. I've never worked with unit tests before, so I'll need to learn a bit. I assume these tests will be a single OPML test case. I'll open the form, submit it empty, with an uploaded file and with a URL, check that all feeds get added, and make sure that duplicates are removed. Would that be all?

That's correct. The goal of the test cases is to make sure that all of the functionality works, and that all of the functionality keeps working going forward. You've written enough tests for it, when they give you that level of confidence. Writing tests is fun -- I really encourage you to give it a try.

#37

keith.smith - June 28, 2008 - 15:01
Status:needs review» needs work

$ grep 'feed items' -R *|wc -l
23
$ grep 'news items' -R *|wc -l
7

Good note, which led me to do the same grep (the last one) without the wc -l

'news items' occurs

  1. once in the CHANGELOG.txt (old notes about changes to historical versions)
  2. in a title in aggregator.admin.inc
  3. in a code comment in aggregator.module
  4. in a title in aggregator.module
  5. in a drupal_set_message in aggregator.module
  6. in a message in aggregator.test
  7. in a message in what was an outdated file on my system (.#aggregator.test.1.3)

I'm happy to roll a patch for 2, 3, 4, 5, and 6, but there really aren't that many instances of this in user-facing text now, once the instances are fixed in this patch. There is also the permission which is now arguably misnamed but that certainly is another issue.

+      '#value' => t('No <a href="@category">feed categories</a> exit. You may want to create one now, because this is the only time all new feeds can be categorized at once.', array('@category' => url('admin/content/aggregator/add/category'))),

Regarding the current patch, exit should be exist. Other than that, everything else looks good on a quick read-through.

#38

Arancaytar - July 1, 2008 - 08:57

I'm not quite up to the kind of empirical coding required to make a new test case, I'm afraid. I've made the tests and managed to get some of them to work. But for some reason, the tests started consistently crashing, and all HTTP requests started returning 0-length responses after the first few succeeded. Here's my test class; perhaps a few errors can be pointed out.

[test class edited out as it is now in the patch below]

#39

Arancaytar - July 1, 2008 - 08:56
Status:needs work» needs review

After hours of work, I'm beginning to get convinced that the fatal error is a fault of simpletest. Here's the patch, and I'm done debugging that test case. :(

AttachmentSizeStatusTest resultOperations
aggregator-opml-import-16282-39.patch15.94 KBIgnoredNoneNone

#40

catch - July 1, 2008 - 21:45

I don't get the fatal error when runing simpletest.

Looks to me like the example OPML files should be created by the patch in /modules/aggregator somewhere instead of in the patch itself (save those /tmp errors).

#41

Arancaytar - July 1, 2008 - 22:40

The content of the sample OPML files are generated dynamically, including random names. Besides, the only reason the file_save_data fails is that I forgot the TRUE in the file_check_directory above, which means the folder is not created.

I still can't get the tests to run, but the file saving errors are gone.

AttachmentSizeStatusTest resultOperations
aggregator-opml-import-16282-41.patch16.01 KBIgnoredNoneNone

#42

catch - July 1, 2008 - 23:03

This takes me down to 8 fails and 1 exception - tests run fine for me still (and no file_save_data errors this time).

First error is

POST to admin/content/aggregator/add/opml, response is 0 bytes.

Likely that's the cause of the others. No immediate ideas why it's failing though.

#43

Arancaytar - July 1, 2008 - 23:10

Interestingly, that is the same part that is failing for me - only it's more than one request. I can only speculate whether the curl timeout is set too low.

#44

Dries - July 2, 2008 - 19:49

For me, the tests come to halt when running them. Is that the behavior you are seeing?

#45

catch - July 2, 2008 - 20:06

Tests run all the way though for me (with above failures) - this on ubuntu.

#46

Arancaytar - July 3, 2008 - 10:04

For me, the tests come to halt when running them. Is that the behavior you are seeing?

Yes, the batch operation fails after about five minutes with a HTTP 500 error. The test does not run through.

If this is a timeout issue, perhaps I can fix it by dividing the test case up into two classes. After all, every class runs through in one batch step.

#47

Dries - July 3, 2008 - 11:59

Arancayter, I don't think a single test should run for 5 minutes. I means something is broken. Time-permitting, I'll try to install xdebug (on MAMP) to see where the time is spent or where we are being blocked. If someone beats me to it, great. I'd really like to see this patch land.

Thanks for your persistence and your willingness to learn SimpleTest.

#48

Arancaytar - July 3, 2008 - 12:33

Well, I started splitting he test case up anyway to narrow down the problem, and found the request that causes the test to crash:

<?php
    $badfields
= t('You must <em>either</em> upload a file or enter a URL.');
   
$badurl    = t('This URL is not valid.');

   
$form = array(
     
'files[upload]' => '',
     
'remote'        => '',
    );

   
$this->drupalPost('admin/content/aggregator/add/opml', $form, t('Import'));
   
$this->assertRaw($badfields, t('Error if no fields are filled.'));
   
$form['files[upload]'] = $this->path . '/goodopml.xml';
?>

Commenting out the POST will let this function finish normally. There are a lot of failures, but the test runs.

I'm wondering if an empty post array may be the culprit. I could submit another non-required field to make sure something is being sent...

Edit: Even after removing this, I'm still getting 0-length curl responses. However, I tend to get those on other tests in core, so it's possible that that's an issue on my server only.

I would really love it if drupal_http_request could take over the request handling in SimpleTest. Perhaps it's because I use PHP as CGI, but curl is pretty unusable on my site.

#49

catch - July 17, 2008 - 09:16

I'm sure there's plenty of tests with empty post arrays already, so it shouldn't be that. I get weird issues with drupalPost sometimes, but they're hard to pin down, and I don't seem to get this specific one.

#50

mustafau - July 28, 2008 - 13:37

Updated the patch.

Simplified some of the code in aggregator.admin.inc.
Refactored tests. There are two test cases and both are successful now.

AttachmentSizeStatusTest resultOperations
aggregator-opml-import-16282-50.patch15.24 KBIgnoredNoneNone

#51

mustafau - July 30, 2008 - 10:57

Small improvements over the last patch.

Test cases merged.
XML parser now calls xml_parser_free().

AttachmentSizeStatusTest resultOperations
aggregator-opml-import-16282-51.patch15.12 KBIgnoredNoneNone

#52

Dries - August 2, 2008 - 20:46
Status:needs review» needs work

With the latest patch, the tests no longer come to halt. However, there is a notice: "Undefined variable: cat_title".

Note that we don't abbreviate words in Drupal code -- 'cat' should probably be 'category' unless you mean a 'cat'? ;-) Same with '$res'.

Some messages are too cryptic (i.e. ''Category field if categories exist.'). It would be great if we could make these a bit more human-friendly.

Is 'NATURAL JOIN' ANSI SQL? Will it work on PostgreSQL?

url='%s' should be url = '%s' (spaces).

We don't capitalize each word in a sentence (i.e. ""Test Category" should be "Test category").

#53

mustafau - August 2, 2008 - 22:56
Status:needs work» needs review

Updated.

"Category field if categories exist." is now "Looking for category field."
I am using PostgreSQL and NATURAL JOIN did work. Anyway I have changed it to "LEFT JOIN".

AttachmentSizeStatusTest resultOperations
aggregator-opml-import-16282-53.patch15.19 KBIgnoredNoneNone

#54

Dries - August 3, 2008 - 05:47
Status:needs review» fixed

Looks good and the tests pass. Committed to CVS HEAD. Thanks.

#55

mustafau - August 10, 2008 - 21:50
Status:fixed» needs review

I have added a redirect destination to the OPML import form and made small improvements to the code.

AttachmentSizeStatusTest resultOperations
opml_redirect.patch1.56 KBIgnoredNoneNone

#56

mustafau - August 10, 2008 - 22:30

We should validate URLs too.

AttachmentSizeStatusTest resultOperations
opml_redirect.patch1.83 KBIgnoredNoneNone

#57

Dries - August 11, 2008 - 07:05
Status:needs review» fixed

Committed to CVS HEAD.

#58

Anonymous (not verified) - August 25, 2008 - 07:14
Status:fixed» closed

Automatically closed -- issue fixed for two weeks with no activity.

 
 

Drupal is a registered trademark of Dries Buytaert.