Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
I created a proposal at http://groups.drupal.org/node/9857 . Please share your ideas or any thoughts there.
Comment | File | Size | Author |
---|---|---|---|
#38 | aggregator.patch | 158.82 KB | Aron Novak |
#33 | aggregator_new_parser.patch | 19.52 KB | Aron Novak |
#33 | aggregator_pluggable_and_new_parser.patch | 40.07 KB | Aron Novak |
#32 | aggregator_pluggable.patch | 27.99 KB | Aron Novak |
#23 | aggregator.patch | 135.28 KB | Aron Novak |
Comments
Comment #1
alex_b CreditAttribution: alex_b commentedsubscribe
Comment #2
Aron NovakFYI: This is under construction, by 18th of July, there will be a drop-in replacement for the core aggregator.
You can follow the development here: http://groups.drupal.org/node/11378
Comment #3
Aron NovakHere is the first version of the patch which is against the existing aggregator, to be applied to HEAD. Ideally, patch before you install the site.
If you'd like to apply to an existing site, be aware that existing aggregator-related data will be lost; follow this procedure: enable syndication_parser module then using the devel module reinstall the aggregator module.
This will break aggregator 's tests, because of the little bit different internals of the two modules.
What is missing from this patch:
A node processor is also available from here: http://www.drupal.org/project/new_aggregator (in fact the patch is from this project), you can copy the aggregator_node module if you'd like to see this stuff with nodes.
Comment #4
Morbus IffSubscribing.
Comment #5
Morbus IffHrm. I noticed that you stopped GUID support (for "newness" checks) entirely - this is wrong.
See http://drupal.org/node/51644.
Besides the weather example over there, there's another use case for producing feeds that set the [link] not to the permalink of the RSS item, but to the *link being discussed* in the RSS description (or title). Off the top of my head, delicious feeds do this, though I've seen others. If Person A talks about Link A, and then Person B talks about Link A (and both set the [link] to Link A), I'll never see Person B's discourse (as a new item).
Comment #6
mfer CreditAttribution: mfer commentedSubscribing.
Comment #7
Aron NovakI'm aware of that this is a problematic area.
Fortunately replacing the deduping logic is not a hard thing at all. I'll follow the discussion at http://drupal.org/node/51644 about the topic.
If there will be consensus about this, I'm ready to adopt the new way of deduping.
Comment #8
Morbus IffIncidentally, you've currently got that particular function as "private" - but I don't think it should be. Knowing that an item is "new" seems like a very useful thing in a programmatic environment - it should be accessible to modules (ie., don't make it a "I trust you won't use this" private function).
Comment #9
cburschkaWhat does this rewrite imply for #16282: OPML Import for Aggregator?
Does the new aggregator support OPML import already, or should the OPML patch be updated to fit into the rewritten aggregator?
Comment #10
alex_b CreditAttribution: alex_b commentedI'd say #16282 will need to be ported once this patch is tight. The parts of the patch that I see effected are where feeds are being created
and the tests - haven't looked at them though.
Comment #11
Aron NovakI updated the patch. This reflects all the changes at new_aggregator alpha2 and alpha3 releases.
The most important changes:
Added feed image support.
Full test suite for Aggregator.
Tests for syndication parser, testing the feed parsing capabilities.
Minor UI improvements.
Separated all Aggregator Light processor related code to aggregator.ligt.inc.
Handling node types where the name contains underscore.
unified Aggregator and Aggregator Light modules
all the features of the existing Aggregator is available (categorizing is replaced by taxonomy)
A content-type is automatically created when you enable Aggregator module.
There is a node processor.
The parser supports namespaces and enclosures.
Meaningful error and informational messages.
More per-content-type setting options (because of killed hardcoded values)
Comment #12
Morbus IffAron - one of the things that has always plagued aggregator was the use of an aggregator-specific input filter. I'm pretty sure there's an issue about this somewhere. Was that replicated in this module, or does it create a default input filter for use with aggregated items? For example, it's currently annoyingly obtuse to disallow images in comments, but to allow images in feed items (you have to modify settings inside settings.php - there's no UI). What's the new approach's take on that?
Comment #13
catchHere's the aggregator/filter module issue: #14104: Aggregator should support filter.module
Comment #14
Aron NovakBy default the aggregator uses the site-wide default input filter. In the top of that you can alter the input filter per-content-type.
Comment #15
mustafau CreditAttribution: mustafau commentedThis patch is huge in size and is implementing multiple new functionality. In order to get real attention you need to separate this patch into multiple issues. For example; the concepts: "Feeds as nodes", "Feed processors" and "Feed parsers" are worth three separate issues.
I believe there is a consensus on the concept of "Feeds as nodes". Drupal core is also moving towards this direction (e.g. polls in core are nodes). If you submit a separate "Feeds as nodes" patch it will get committed in no time.
The patch at #16282: OPML Import for Aggregator is going to be committed. In order to re-implement OPML import for this new aggregator you need to look at the patch at #273050: Use Batch API for OPML import
Comment #16
Morbus Iff@mustafau: unfortunately, you're woefully misinformed. There is no consensus to get feeds as nodes in core, and it won't be committed in "no time". Polls have always been nodes in core, all the way back to at least 4.6. In fact, the exact opposite of what you state has been leaned toward: feeds as non-nodes definitely, feeds as nodes optional. Aron's patch offers the best of both worlds (ie., both at the same time, or one or the other). It's about the only way that feeds as nodes would ever get into core.
Comment #17
mustafau CreditAttribution: mustafau commented@Morbus Iff: When you refer to feeds as nodes you actually mean feed items as nodes right? Because Aron's patch only offers feeds as nodes. The optional part is feed items as nodes.
Comment #18
Morbus IffYes, you're correct. My bad.
Comment #19
alex_b CreditAttribution: alex_b commented#15 - it's hard to see how this patch can be split up. It's in part a complete rewrite of aggregator's core in order to enable functionalities like interchangeable parsers, optional feed item processors, etc. Let me know if you've got any concrete ideas how to break it apart.
I think what's missing atm is a good itemized list of what this patch changes - not only a link to the outline on g. d. o.
Thanks for heads up on #273050
#17 - correct, feeds are nodes, feed items are pretty much the same as in existing aggregator
Comment #20
alex_b CreditAttribution: alex_b commentedI did a review of the latest patch. Some UI issues:
* Per content type settings need a clean up. I'd suggest to convert the check box into an enable/disable button, add field sets for better structure and only show advanced settings for activated parsers and processors. I rolled a patch against the project here: http://drupal.org/node/290788
* aggregator/terms help text: this text doesn't work, it only makes sense for site administrators.
* For the sake of more continuity: can we keep as many files as possible on their place in the directory? E. g. aggregator.module in aggregator/ etc.
* Help text on admin/build/node-type/*/aggregator would be great.
* I'm still not sold on using "Terms" instead of "Categories" on /aggregator
... more to come.
Comment #21
alex_b CreditAttribution: alex_b commentedThis is an attempt to give this patch a more specific name in response to #15.
Comment #22
mustafau CreditAttribution: mustafau commented#19: I know it is hard to separate the functionality of this patch into several patches however it is not easy to get accepted into Drupal core either. Here are some ideas on how to make it happen:
- Submit a minimalistic feeds as nodes patch (from #15). Since core aggregator treats every feed equally we only need a built-in "Feed" content type. I believe in the future we can skip per content type settings and just implement per feed settings.
- Separate current parser from aggregator and introduce SimpleXML parser as a replacement. First of all we need to resolve this issue: #72915: Move feed parser to includes/feed.inc. Once this one is fixed then we can submit a patch introducing SimpleXML parser as a replacement.
- This patch stores hash of the feed data. I submitted a patch for aggregator implementing this functionality (#291064: Store hash of feed data).
Comment #23
Aron NovakHere is a newer version of the patch. It's only a quick update to allow folks to apply the patch to the latest CVS code. The OPML import part of the aggregator is currently not included in the patch (it is planned of course)
Comment #24
mustafau CreditAttribution: mustafau commentedCan we get rid of CVS Id's like following please?
Comment #25
Morbus IffGet rid of them? They're required in .tpl.php files. They'll be rewritten to the right value when/if the patch gets committed.
Comment #26
mustafau CreditAttribution: mustafau commentedThose lines should not be visible in patches.
Comment #27
recidive CreditAttribution: recidive commentedThe functions aggregator_parse_w3cdtf() and _syndication_parser_parse_w3cdtf() apparently do the same thing. Can we get rid of one of them?
// $I:$
on syndication_parser.test should be// $Id$
and you don't need to manually change the version number on the others.There are evidences of some rejected lines on aggregator.admin.inc.
Also there are places that needs cleanup to be more inline with Drupal coding standards.
Comment #28
mustafau CreditAttribution: mustafau commentedLine #185 of the patch at #23 is broken.
Comment #29
Dries CreditAttribution: Dries commentedMmm, it is not the best idea to implement 20 changes in one patch. History has learned us that it is better to submit smaller patches. I'd really recommend that we break this up in smaller chunks and that we consider one chunk at the time.
Comment #30
alex_b CreditAttribution: alex_b commented#15, #29: I will start thinking about how we can break this patch into smaller chunks... Aron's offline for the next week, he'll be able to look at this again once he's back.
I just wrote about the patch here: http://www.developmentseed.org/blog/2008/aug/06/improved-aggregator-drup...
I hope it's helpful for understanding the proposed changes.
Comment #31
Jose Reyero CreditAttribution: Jose Reyero commentedThe overall idea looks very good to me. The implementation looks overly complex though.
I also agree this patch is way too big and attempts to get too many things at once. If we want, as it seems to be, to have a generic aggregator engine that supports multiple parsers and multiple storage forms, only that, as simple as possible would be great as a first patch (and complex enough).
I.e. the aggregator_cron() logic is way too complex, and I don't think that queries will really scale for a big number of feeds. As opossed to the old one which was a simple query -> retrieve feed -> refresh feed...
Also arrays of feeds are handled all the time through the api, which makes it look much more complex that it should be.
So, in summary I think: Focus, Simplify. Get us a simple patch that supports different parsers and node storage as a first step. Then we'll see about cron load balancing, feeds to be skpped, and more complex features later.
Btw, I've tried the patch with latest HEAD and it has some issues, couldn't really get it to work:
- Main aggregator page
user warning: Unknown column 'i.nid' in 'on clause' query: SELECT COUNT(*) FROM aggregator_item i INNER JOIN aggregator_feed f ON i.nid = f.nid INNER JOIN node n ON f.nid = n.nid in /home/workspace/drupal/modules/aggregator/aggregator.pages.inc on line 62.
- Content type settings
Parse error: syntax error, unexpected T_SL, expecting ')' in /home/workspace/drupal/modules/aggregator/aggregator.admin.inc
Comment #32
Aron NovakThis week I started to explode that huge patch.
Here is the first one which contains:
- time-based cron
- pluggable architecture (you can replace the processor and the parser if you'd like)
That's all. I even don't have to touch the aggregator tests because the following are unaltered: UI, data structure, db structure.
What's missing: more abstraction of items to make possible to write really different processors.
Comment #33
Aron NovakSecond round of exploded patches.
aggregator_new_parser.patch only replaces a new parser to the aggregator module without the pluggable architecture.
aggregator_pluggable_and_new_parser.patch replaces a new parser and also contains the pluggable architecture changes.
Comment #34
catchAron - it'd probably be easier to review each set of patches in a single issue and link from here. Also are any of these ready to review, or is it all still needs work?
Comment #35
Aron NovakThe real question is here: where to go?
The above patches (the exploded ones, not the huge patch) pass the automated tests and do not break important functionality. Some minor features are not perfect at the moment.
At the moment I upgrade the previous huge patch to apply without problem to the CVS HEAD and fix the problems what the reviewers mentioned above.
I try to explain below the possibilities.
At the beginning of the Summer of Code, it seemed that mostly we agree that feeds should be nodes (http://groups.drupal.org/node/11413#comment-37089 read from this comment)
If feeds = nodes, the following changes are neccessary / obvious:
- pluggable architecture (that's the point where the aggregator fails)
- complete rework of the admin interface - lot more easier
- huge changes in categorization (taxonomy is the sane choice for categorization for nodes)
- content-type based configuration possibility (you may have several types of feeds)
I assume we need pluggable architecture for aggregator, this is obvious for me. (but correct me if the situation is different :) )
Questions:
- Do we need the power of nodeapi and node CRUD for feeds too?
- Do we want to replace the parser with a SimpleXML based parser? Now it's a good time to do, because Drupal 7 requires PHP5.
- Do we want feed URL autodetection?
The most of the changes that I did at the huge patch can be made totally independently from each other. The only change that affects tons of things is feed = node.
Comment #36
mustafau CreditAttribution: mustafau commentedA built-in "Feed" content type patch for Aggregator: #293318: Convert Aggregator feeds into entities
Comment #37
Aron NovakIt's awesome that you made already an important patch! I really hope that this patch will be accepted.
Comment #38
Aron NovakI also re-rolled the big patch. Maybe it's also good to see the whole concept in one big patch.
Unfortunately I haven't added OPML import, this patch simply removes it. This is not good at all.
But let's focus on the above small patches and especially the questions above.
Comment #39
keith.smith CreditAttribution: keith.smith commentedI have some issues with numerous strings in this patch, with this one as an example:
Aron, I really applaud your efforts here, and want to help. ; 'tis just that a number of the strings need tweaking or follow existing Drupal guidelines, some of which, are admittedly unwritten.
I'm setting this to CNW because I intend to either reroll this or go through this string by string with suggestions for potential changes.
Comment #40
Aron Novakkeith.smith: I totally agree. There are some reasons that this patch needs work,
First, maybe there will be huge changes in the DB layer.
Second, opml import is missing.
Third, the strings, yes. I really appreciate if someone helps me to make these strings well-written.
Comment #41
alex_b CreditAttribution: alex_b commentedWe're starting to break out issues now, SimpleXML parser for aggregator is the first: http://drupal.org/node/302245.
There should follow a separate issue for extensible architecture. Once this separate issue is up, let's close this thread.
Comment #42
Aron NovakThe separate patches are out.
Comment #43
jpetso CreditAttribution: jpetso commentedThis is the set of (pending) patches originating from this one, as far as I could find those:
#72915: Move feed parser to includes/feed.inc
#303930: Pluggable architecture for aggregator.module
#302936: DB TNG: Convert aggregator.module queries
If I forgot any, feel free to add them here.