As per
http://lists.drupal.org/pipermail/development/2006-July/017414.html
I've moved the feed parsing code to it's own .inc file.
At this point is pretty much a straight cut and paste job.
In classic drupal style, the new API function drupal_retrieve_feed returns a nested array of channel information and items.
I've reworked aggregator.module to make use of this functionality.
This patch probably sits somewhere between "needs work" and "needs review." But it is a first attempt to get the ball rolling.
There are some todo's in the code and this doesn't have any of the feed generation stuff I talked about in the mailing list thread, but one thing at a time.
Ciao,
-Mark
Comments
Comment #1
buddaI've done this in the feedparser module (currently work in progress in CVS).
http://cvs.drupal.org/viewcvs/drupal/contributions/modules/feedparser/
I've not added a refactored aggregator.module just yet, but it's on the list of things to do.
Eventually it would be good to move the parsing inc file into Core for use by everything. The plan is to expand the .inc file to handle ATOM feeds in addition to RSS.
Comment #2
dries commentedI'm going to put this patch on hold for 1-2 days. I'm going to create a subdirectory for each of the modules first. Then the aggregator.module and aggregator.inc can live in the same directory.
Comment #3
eaton commentedDries: do you think that it's a good idea to keep the aggregator.inc and aggregator.module tightly coupled? It seems that one of the key advantages of separating them is allowing other modules to completely replace aggregator.module without reimplimenting the parser. Moving this file into /includes means that it can be considered a 'given' for other modules to build on without requiring that the existing aggregation UI, etc be present.
Comment #4
walkah commentedDries - I think feed parsing shouldn't be dependent on aggregator... general feed (RSS & Atom) parsing (as well as feed writing!) have applications in contrib outside of aggregator...
my $0.02CAD - but excited to see this moving forward.
Comment #5
dries commentedSure, moving it to includes makes sense too. :)
Comment #6
mfredrickson commented@budda: I did look at your module before starting this refactor. I decided to concentrate on the existing aggregator code as that has the benefit of core level scrutiny. Of course, it could use improvement! In the interests of combining effort, would you be interested in working on this .inc file when it hits? Let's get it into core first, and then we can start working on improvements. (Hint: I'm hoping for a patch review. :-)
Comment #7
buddaI've not actually touched much/if any of the parsing code i moved out of the aggregator.module.
I got the feeling such activity was a bit too radical for Core so soon so went about putting it into a contrib module first. I'm also going to be working with Drewish on this - he has nominated himself for ATOM support upgrades to the parser.inc file.
Comment #8
eaton commentedI agree with that approach: if we split out the inc file, efforts to improve the pure parsing side of things can focus on the core include file The need to include separate parsing libraries in other projects is exactly why this patch helps.
Comment #9
drummLooks okay. I'm fine with feed.inc being put in the includes directory. But we should only include it as necessary (when feeds are parsed). And there are a couple lines comment out for posterity's sake. We don't care about posterity.
Comment #10
mfredrickson commentedwhat's the best way to include it then? generally, we don't have modules include /includes/*.inc files.
Should I put the parsing logic in feed.inc and the API call in common.inc?
Also, I was anticipating adding feed generation functions eventually. But one patch at a time...
Comment #11
LAsan commentedHow's the work around this patch code?
Moving to cvs.
Comment #12
mustafau commentedI have updated the patch for CVS HEAD.
Changes are:
- Moved drupal_retrieve_feed() into common.inc.
- Implemented tests for feed.inc.
- Change _feed_parse_feed() to feed_parse().
- Rename aggregator_parse_feed() to aggregator_save_items().
Two of the tests fail because parser can not parse "
<link href="http://example.org/"/>".Comment #13
moshe weitzman commentedThere is a new aggregator module in patch review. The current one is not long for this world. Please review the other if possible. Thanks.
Comment #14
mustafau commentedI am aware of the new aggregator patch. I have done my part of the review for it.
This is another approach to improving the current aggregator. If somehow new aggregator can not make it's way into the core this will be here as a resort.
Comment #15
mustafau commentedUpdated the patch.
Improvements over #12:
Better error handling.
Support for alternate feed parsing libraries.
Test cases merged into one.
Comment #16
dries commentedThis one was committed to CVS HEAD earlier today. Thanks all.
Comment #17
mustafau commented@Dries: This patch is not committed. The one that was committed is: #16282: OPML Import for Aggregator
Comment #18
dries commentedOops, this wasn't committed yet. Updated the wrong issue.
Comment #19
mustafau commentedRe-roll.
Comment #20
mustafau commentedLatest patch is broken. Re-rolling.
Comment #21
mustafau commentedAdded following functions to feed.inc
Comment #22
robin monks commentedWhen trying to update a feed in aggregator after adding it:
Needs work.
Robin
Comment #23
mustafau commentedUpdated.
@Robin Monks: Can you try this one with another feed?
Comment #24
dmitrig01 commentedWhy aren't you using simpleXML?
Comment #25
robin monks commentedTrue, since we're requiring PHP 5.2 or higher this version, SimpleXML is available for this. I agree that would be a good step, but, I don't see why it couldn't be done after this patch hits HEAD. I think adding any more complexity to this patch will make it miss D7.
I'll be able to test later this afternoon, thanks mustafau!
Robin
Comment #26
alex_b commentedAron just posted a patch that replaces the existing parser with SimpleXML: http://drupal.org/node/302245 . It also moves parsing to an include file, but does not move it to includes/ - yet.
Comment #27
robin monks commented#26: That patch is back to code needs work right now, they probably need a hand over there ;-)
Anyways:
This patch has been tested and WORKS! http://robinmonks.com/feed worked perfectly. Great work mustafau! This patch is now RTBC. I'd personally like to see this go in first, then get ported to SimpleXML, but, the powers that be will decide. This patch also survived the "aggregator" simpletest, 207 passes, 0 fails, 0 exceptions.
Robin
Comment #28
robin monks commentedAlso note, this is a task, not a feature request AFAIK.
Comment #29
robin monks commentedComment #30
mustafau commentedWith more documentation.
Comment #31
robin monks commentedSorry, just found a new bug, if I update the same feed twice in a row I get this:
This is a fresh head install, no previous data to mess anything up.
Robin
Comment #32
alex_b commented#27 - Indeed it would be good to move first on this patch and then on SimpleXML - don't know how compatible the two approaches are - curious to see Aron's response here.
Comment #33
aron novakI unified the two patches (#30 and SimpleXML for agg). aggregator.module tests are ok for me. Only one parser feed fails (no DC ns support currently for channel elements). Please test it.
Comment #34
SeanBannister commentedsub
Comment #35
mustafau commentedI needed to kill some time. This is the result.
Very simplistic. Just move some functions from aggregator.parser.inc to include/feed.inc. Now we have a general feed parser and a way to override default parser.
Comment #37
lilou commentedHEAD is broken.
Comment #38
mustafau commentedComment #39
klausi#35: feed_inc-72915-35.patch queued for re-testing.
Comment #41
fabianx commentedIs this still an issue?
Comment #45
klausiThis is now all in classes and probably no longer relevant.