Move feed parser to includes/feed.inc

mfredrickson - July 9, 2006 - 20:27
Project:Drupal
Version:7.x-dev
Component:base system
Category:task
Priority:normal
Assigned:Unassigned
Status:patch (code needs work)
Description

As per

http://lists.drupal.org/pipermail/development/2006-July/017414.html

I've moved the feed parsing code to it's own .inc file.

At this point is pretty much a straight cut and paste job.

In classic drupal style, the new API function drupal_retrieve_feed returns a nested array of channel information and items.

I've reworked aggregator.module to make use of this functionality.

This patch probably sits somewhere between "needs work" and "needs review." But it is a first attempt to get the ball rolling.

There are some todo's in the code and this doesn't have any of the feed generation stuff I talked about in the mailing list thread, but one thing at a time.

Ciao,
-Mark

AttachmentSize
aggregator.inc.patch.025.18 KB

#1

budda - July 10, 2006 - 10:48

I've done this in the feedparser module (currently work in progress in CVS).

http://cvs.drupal.org/viewcvs/drupal/contributions/modules/feedparser/

I've not added a refactored aggregator.module just yet, but it's on the list of things to do.

Eventually it would be good to move the parsing inc file into Core for use by everything. The plan is to expand the .inc file to handle ATOM feeds in addition to RSS.

#2

Dries - July 10, 2006 - 19:11

I'm going to put this patch on hold for 1-2 days. I'm going to create a subdirectory for each of the modules first. Then the aggregator.module and aggregator.inc can live in the same directory.

#3

Eaton - July 10, 2006 - 19:33

Dries: do you think that it's a good idea to keep the aggregator.inc and aggregator.module tightly coupled? It seems that one of the key advantages of separating them is allowing other modules to completely replace aggregator.module without reimplimenting the parser. Moving this file into /includes means that it can be considered a 'given' for other modules to build on without requiring that the existing aggregation UI, etc be present.

#4

walkah - July 10, 2006 - 19:34

Dries - I think feed parsing shouldn't be dependent on aggregator... general feed (RSS & Atom) parsing (as well as feed writing!) have applications in contrib outside of aggregator...

my $0.02CAD - but excited to see this moving forward.

#5

Dries - July 10, 2006 - 20:42

Sure, moving it to includes makes sense too. :)

#6

mfredrickson - July 11, 2006 - 15:56

@budda: I did look at your module before starting this refactor. I decided to concentrate on the existing aggregator code as that has the benefit of core level scrutiny. Of course, it could use improvement! In the interests of combining effort, would you be interested in working on this .inc file when it hits? Let's get it into core first, and then we can start working on improvements. (Hint: I'm hoping for a patch review. :-)

#7

budda - July 11, 2006 - 16:06

I've not actually touched much/if any of the parsing code i moved out of the aggregator.module.

I got the feeling such activity was a bit too radical for Core so soon so went about putting it into a contrib module first. I'm also going to be working with Drewish on this - he has nominated himself for ATOM support upgrades to the parser.inc file.

#8

Eaton - July 11, 2006 - 16:08

I agree with that approach: if we split out the inc file, efforts to improve the pure parsing side of things can focus on the core include file The need to include separate parsing libraries in other projects is exactly why this patch helps.

#9

drumm - July 14, 2006 - 01:27
Status:patch (code needs review)» patch (code needs work)

Looks okay. I'm fine with feed.inc being put in the includes directory. But we should only include it as necessary (when feeds are parsed). And there are a couple lines comment out for posterity's sake. We don't care about posterity.

#10

mfredrickson - July 14, 2006 - 20:25

what's the best way to include it then? generally, we don't have modules include /includes/*.inc files.

Should I put the parsing logic in feed.inc and the API call in common.inc?

Also, I was anticipating adding feed generation functions eventually. But one patch at a time...

#11

LAsan - April 7, 2008 - 11:17
Version:x.y.z» 7.x-dev

How's the work around this patch code?

Moving to cvs.

#12

mustafau - July 25, 2008 - 20:34
Status:patch (code needs work)» patch (code needs review)

I have updated the patch for CVS HEAD.

Changes are:

- Moved drupal_retrieve_feed() into common.inc.
- Implemented tests for feed.inc.
- Change _feed_parse_feed() to feed_parse().
- Rename aggregator_parse_feed() to aggregator_save_items().

Two of the tests fail because parser can not parse "<link href="http://example.org/"/>".

AttachmentSize
feed.inc_.patch29.81 KB

#13

moshe weitzman - July 25, 2008 - 22:54

There is a new aggregator module in patch review. The current one is not long for this world. Please review the other if possible. Thanks.

#14

mustafau - July 25, 2008 - 23:10

I am aware of the new aggregator patch. I have done my part of the review for it.

This is another approach to improving the current aggregator. If somehow new aggregator can not make it's way into the core this will be here as a resort.

#15

mustafau - August 1, 2008 - 17:38

Updated the patch.

Improvements over #12:
Better error handling.
Support for alternate feed parsing libraries.
Test cases merged into one.

AttachmentSize
feed_inc-72915-15.patch29.28 KB

#16

Dries - August 3, 2008 - 15:15
Status:patch (code needs review)» fixed

This one was committed to CVS HEAD earlier today. Thanks all.

#17

mustafau - August 3, 2008 - 15:35
Status:fixed» patch (code needs review)

@Dries: This patch is not committed. The one that was committed is: #16282: OPML Import for Aggregator

#18

Dries - August 3, 2008 - 16:03

Oops, this wasn't committed yet. Updated the wrong issue.

#19

mustafau - August 13, 2008 - 01:17
Title:Move feed parsing to own .inc file, refactor aggregator.module» Move feed parser to includes/feed.inc
Category:task» feature request

Re-roll.

AttachmentSize
feed_inc-72915-19.patch28.79 KB

#20

mustafau - August 13, 2008 - 21:33

Latest patch is broken. Re-rolling.

AttachmentSize
feed_inc-72915-20.patch28.82 KB

#21

mustafau - August 30, 2008 - 11:55

Added following functions to feed.inc

function feed_set_data($data = NULL) {}
function feed_get_data() {}
function feed_get_hash() {}
function feed_set_headers($headers = array()) {}
function feed_get_headers() {}

AttachmentSize
feed_inc-72915-21.patch29.65 KB

#22

Robin Monks - August 30, 2008 - 18:30
Status:patch (code needs review)» patch (code needs work)

When trying to update a feed in aggregator after adding it:

* There is no new syndicated content from gmplanet.
* The feed from gmplanet seems to be broken, because of error " Invalid document end".

* notice: Undefined property: stdClass::$code in /s**s/development.**.ca/htdocs/d1/modules/aggregator/aggregator.module on line 503.
* notice: Undefined property: stdClass::$code in /s**s/development.**.ca/htdocs/d1/modules/aggregator/aggregator.module on line 504.

Needs work.

Robin

#23

mustafau - August 31, 2008 - 01:25

Updated.

@Robin Monks: Can you try this one with another feed?

AttachmentSize
feed_inc-72915-23.patch29.66 KB

#24

dmitrig01 - August 31, 2008 - 03:37

Why aren't you using simpleXML?

#25

Robin Monks - August 31, 2008 - 12:26
Status:patch (code needs work)» patch (code needs review)

True, since we're requiring PHP 5.2 or higher this version, SimpleXML is available for this. I agree that would be a good step, but, I don't see why it couldn't be done after this patch hits HEAD. I think adding any more complexity to this patch will make it miss D7.

I'll be able to test later this afternoon, thanks mustafau!

Robin

#26

alex_b - August 31, 2008 - 13:00

Aron just posted a patch that replaces the existing parser with SimpleXML: http://drupal.org/node/302245 . It also moves parsing to an include file, but does not move it to includes/ - yet.

#27

Robin Monks - August 31, 2008 - 23:09
Status:patch (code needs review)» patch (reviewed & tested by the community)

#26: That patch is back to code needs work right now, they probably need a hand over there ;-)

Anyways:

This patch has been tested and WORKS! http://robinmonks.com/feed worked perfectly. Great work mustafau! This patch is now RTBC. I'd personally like to see this go in first, then get ported to SimpleXML, but, the powers that be will decide. This patch also survived the "aggregator" simpletest, 207 passes, 0 fails, 0 exceptions.

Robin

#28

Robin Monks - August 31, 2008 - 23:09

Also note, this is a task, not a feature request AFAIK.

#29

Robin Monks - August 31, 2008 - 23:10
Category:feature request» task

#30

mustafau - August 31, 2008 - 23:45
Component:aggregator.module» base system

With more documentation.

AttachmentSize
feed_inc-72915-30.patch30.39 KB

#31

Robin Monks - August 31, 2008 - 23:58
Status:patch (reviewed & tested by the community)» patch (code needs work)

Sorry, just found a new bug, if I update the same feed twice in a row I get this:

* notice: Undefined property: stdClass::$items in /srv/w*a/htdocs/d1/modules/aggregator/aggregator.module on line 472.
* notice: Undefined property: stdClass::$code in /srv/w*a/htdocs/d1/modules/aggregator/aggregator.module on line 504.
* notice: Undefined property: stdClass::$code in /srv/w*a/htdocs/d1/modules/aggregator/aggregator.module on line 505.

* There is no new syndicated content from Robin Monks' Blog.
* The feed from Robin Monks' Blog seems to be broken, because of error " Invalid document end".

This is a fresh head install, no previous data to mess anything up.

Robin

#32

alex_b - September 1, 2008 - 08:38

#27 - Indeed it would be good to move first on this patch and then on SimpleXML - don't know how compatible the two approaches are - curious to see Aron's response here.

#33

Aron Novak - September 3, 2008 - 15:40

I unified the two patches (#30 and SimpleXML for agg). aggregator.module tests are ok for me. Only one parser feed fails (no DC ns support currently for channel elements). Please test it.

AttachmentSize
feed_inc_simplexml.patch32.07 KB
 
 

Drupal is a registered trademark of Dries Buytaert.