Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
Here is a patch for adding two hooks (one for the parser, one for the processor) to the aggregator to make it extensible.
The patch currently does not apply to the HEAD because of the changes. But now there are so many pending patches for the aggregator, so i'll wait with updating this patch. The basic idea can be spotted at this status too.
Comment | File | Size | Author |
---|---|---|---|
#41 | aggregator_pluggable_10.patch | 59.81 KB | Aron Novak |
#38 | aggregator_pluggable_9.patch | 59.46 KB | alex_b |
#37 | aggregator_pluggable_8.patch | 57.9 KB | alex_b |
#32 | aggregator_pluggable_7_1.patch | 51.71 KB | Aron Novak |
#31 | aggregator_pluggable_7.patch | 51.16 KB | alex_b |
Comments
Comment #1
alex_b CreditAttribution: alex_b commentedGreat. Can you describe better what this patch proposes to introduce?
Comment #2
Aron NovakPluggable architecture at this patch contains the following:
Parser / processor concept
Parser is for transforming downloaded raw data into a data structure.
Processor is for reacting to add / update a feed item.
This patch makes aggregator a parser and a processor too, this means that the built-in capabilities (old parser and feed items as non-nodes) are available immediately.
From developer viewpoint, in a parser, you need to implement aggregator_parse() hook and in a processor, you need to implement aggregator_process(). There are some operations of these. (saving/unique, etc)
The patch does not alter the internals of aggregator, only makes possible to write a drop-in replacement for parsing or saving the feed item. (for example as node or whatever you can imagine, send out an email, etc)
Also, for configuring these, aggregator has a form, where you can overview your parsers and processors and aggregator offers easy-to-use settings handling for parsers and processors.
Basically parser and processor are such a module, where the important hooks are implemented.
Parsing:
$channel = module_invoke($parser, 'aggregator_parse', 'parse', $feed);
- one step processProcessing:
Two step, two operation.
Comment #3
jpetso CreditAttribution: jpetso commentedShort, very incomplete look over the patch while attempting to confirm my understanding of how it generally works:
Shouldn't this be made unconditionally and only be introduced once that function exists? Or maybe there's some other reason to do it this way...
Other than that, very cool. I think a later patch might split parsers in two to separate the transport (currently in aggregator_aggregator_parse()) and the actual parser (currently in aggregator_parse_feed()) so that Aggregation's functionality of supporting feeds requiring HTTP or FTP authorization can be done without implementing separate parsers. (If there are feed parsers other than aggregator_parse_feed(), you'd need to write a hook_aggregator_parser() wrapper for each of those, which results in parser hooks = (transport methods * actual parsers) if you want the complete set of options.) But that's just an idea for a follow-up, and doesn't need to be handled in this patch.
Comment #4
Aron NovakI rerolled the patch against the latest changes.
jpetso: Thank you for the review, i applied most of your suggestions.
aggregator_is_enabled() - yeah, it's questionable if it's a good choice or not, i have not altered it yet.
aggregator_cron(), as i saw, drupal core does this in a similar way really often.
Tests: 100% successful
Not tested, maybe broken (that's why needs more work): cron part and image support
Comment #5
alex_b CreditAttribution: alex_b commentedRerolled after patch #323751 caused conflict.
Comment #6
alex_b CreditAttribution: alex_b commentedFixed:
- typo wrong: variable_get('aggregator_processor', array); correct: variable_get('aggregator_processors', array);
- aggregator_is_enabled() ambiguous if a module implements parser and processor. Replaced with aggregator_get_enabled_parser() and aggregator_get_enabled_processors().
Comment #7
catchWe've gone to quite a lot of effort to remove $op from core - hook_user and hook_nodeapi, others like block and taxonomy to follow. It'd be a shame to add a new hoook with an $op in it at this stage.
Comment #8
alex_b CreditAttribution: alex_b commentedMore fixes and adjustments:
- only call processors if $channel['items'] is an array
- explicitly configure aggregator parsers and processors in _install() (before admin/content/aggregator/settings did not look correct before first time saving the form)
- remove break; after return; in switch statements
- move 'unique' above 'save' in aggregator_aggregator_process() - corresponds with the order of call now
Changes don't break any tests.
Deduplication / 'unique' stage:
For those who're not familiar with the intentions behind a separate 'unique' stage in the aggregation process: The idea is to find out which items on the currently processed feed are unique before the items are being sent to the save stage. This separation of deduplication and saving allows other processors to modify the deduplication behavior of processors simply by implementing a hook_aggregator_process('unique') operation.
I had a better look at how unique handling works. I think it needs to be made more straightforward.
I see that on the 'unique' stage aggregator module flags whether an item exists already or not. If it exists, it doesn't just say 'FALSE': it stores the item's ID - which is smart, because it saves us going to the database again on the 'save' stage - but it looks a little funky as we're using a single slot as boolean or integer:
Further down the line it leads to a test whether the id of the current item is TRUE in order to find out whether we're looking at a unique item or not.
At this stage I would rather expect that iid is not set or 0 if not available.
I need to think better about this, but at first glance, it would seem to me more intuitive to call the flag not 'unique' but 'unique_id' which would a) make storing the internal aggregator id more appropriate and b) make unique_id = 0 the common sense flag for 'this item doesn't exist yet'.
Thoughts?
Comment #9
Aron NovakPatch is rerolled.
Changes:
- image support was broken
- cron part was broken
Comment #10
Aron NovakI propose the following for unique handling:
Make a hook_aggregator_process_id() (exploded function instead of one huge aggregator_process with $op -s)
Add $feed['items'][$k]['id']['aggregator'] to the array if the item is already exists
At save stage, isset($feed['items'][$k]['id']['aggregator']) will determine if it's a dupe or not.
#7: makes sense. It will be refactored.
Comment #11
alex_b CreditAttribution: alex_b commentedThis sounds good to me.
Comment #12
alex_b CreditAttribution: alex_b commentedRerolled after #279516: Remove workarounds for PHP versions less than 5.2.x and #233407: Aggregator module hard coded references to blog module disturbed the peace.
Comment #13
alex_b CreditAttribution: alex_b commentedI'm working on a revised version after #7 and #10. Expect results soon.
Comment #14
alex_b CreditAttribution: alex_b commentedRework of this patch.
#7: $op is gone.
#10: let's leave out deduplication overriding for now.
CHANGES
To summarize, these are the changes to Drupal Core in this patch:
* new hooks for parsers and processors of feeds:
** hook_aggregator_parse()
** hook_aggregator_parse_info()
** hook_aggregator_process()
** hook_aggregator_process_info()
** No hook for unique ID's at the moment.
* Refactor aggregator so that parsing and storage happen on parse and process hook callbacks.
* Break out sections that are only needed for aggregating
** Parsing functions are broken out into aggregator.parser.inc
** Processing functions are broken out into aggregator.processor.inc
** aggregator.module down to ~670 lines from ~1000
Consequential changes:
* $feed is object for passing through parsing and processing stages.
* aggregator_admin_settings() contains form for configuration of parsers and processors
Details:
* fixed a broken query db_merge('aggregator_category')->key(array('cid' => $id))
* fixed a notice in theme_aggregator_block_item()
Patch passes all tests. Reviewed by Aron Novak so far.
Documentation
* Expect documentation on new hooks soon.
Comment #15
alex_b CreditAttribution: alex_b commentedPluggable architecture patch:
* Fixed an obsolete hook call in aggregator_form_alter()
* Moved aggregator_form_alter() to aggregator.processor.inc -it only contains processor related functionality.
Documentation;
Patch to drupal docs developer/hooks/core.php to document
** hook_aggregator_parse()
** hook_aggregator_parse_info()
** hook_aggregator_process()
** hook_aggregator_process_info()
Comment #16
catchLooks like this is ready for review to me.
Comment #17
alex_b CreditAttribution: alex_b commented#16: Ready to review, yes.
Comment #18
Anonymous (not verified) CreditAttribution: Anonymous commentedThe last submitted patch failed testing.
Comment #19
Aron NovakRerolled.
Notes for manual testing, these are only important if you patch an already enabled/installed aggregator module:
- after install the module, empty the cache (via the devel module), necessary because of new files
- go to admin/content/aggregator/settings and save the defaults
The documentation patch will be supplied separately in the proper queue, it will be linked from here also.
Comment #20
Aron NovakRelevant documentation patch: #334752
Comment #21
chx CreditAttribution: chx commentedThis is a great patch. I like the API , it's a dream coming true! A few nitpicks:
variable_get('aggregator_parser', '')
. Let's instead use the sane default everywhere and do not set the default in install.variable_get('aggregator_parser',
but see the default above.aggregator_get_enabled_processors
, what this code does, isarray_filter
but why would a getter do that, why you dont array_filter on set? Then this function can be removed as the previous one to be replaced by a simple variable_get.Comment #22
chx CreditAttribution: chx commentedFollowing up on a chat with Aron, one more which would solve 2. above -- let's separate out the fetcher as well, there are four (stream wrapper, drupal_http_request, curl, pecl http extension) fetchers at least.
Comment #23
alex_b CreditAttribution: alex_b commented#22 - this would mean an additional hook hook_aggregator_fetch() ? Sounds good. I have been against fetch hooks in the past. Recent experience taught me otherwise: we wrote a CVS, an RDF and a comment parser for FeedAPI and in each occasion we had to reuse fetch functions from other parsers.
#21 - I agree with your points. Some clarifications:
5) These paths need to be removed. There is no admin/content/aggregator/aggregator/parser and no admin/content/aggregator/aggregator/processor page.
8) The format on what's on 'item' is entirely up to the parsers and processors in place. Neither does the patch change anything on that level.
* hook_aggregator_parse() and hook_aggregator_process() need to mention that the format of $feed->items is up to the implementers.
* aggregator_aggregator_parse() needs to be documented on the format of the $form->items it produces and
* aggregator_aggregator_process() documentation should contain a reference to aggregator_aggregator_parse().
9) Optional, needs to be better clarified.
Comment #24
Aron Novak#21,
1)-7): thanks for the notices and suggestions, all of them are fixed somehow.
There is a new file: aggregator.fetcher.inc.
8) is replied by alex_b at #23
9) The documentation patch should be updated also and make this question clear.
Comment #25
alex_b CreditAttribution: alex_b commented#21: 8 and 9 are addressed on #3 of #334752: Documenting new hooks with aggregator pluggable architecture patch
I will review the latest patch on #24 on the weekend.
Comment #26
catchCouple of minor things:
Why would $edit ever be an object? Can we fix that further up the chain?
aggregator_form_aggregator_admin_form_alter
- aggregator module doesn't need to form_alter it's own forms - or if we do for some reason, then there ought to be a comment to say why.This is just nitpicky stuff, will try to do a more substantive review over the next week or so.
Comment #27
chx CreditAttribution: chx commentedthat's some pluggable architecture, then :) well, this means that parser and processor must come from the same place? Or how are you going to provide interoperability...?
Comment #28
alex_b CreditAttribution: alex_b commented#28 - (per IRC conversation) items format needs to be documented.
Comment #29
alex_b CreditAttribution: alex_b commentedFAPI arrays that are affected by this patch should be reformatted so that each property sits in its own line.
Comment #30
alex_b CreditAttribution: alex_b commented#29: fixed
#26:
* I still need to take a better look into this $edit variable.
* The form alter hook is documented - aggregator form alters its own form for keeping the processor part separate from the aggregator API part and for creating a reference for implementing a processor.
Further changes:
* Simplify admin/content/aggregator/settings - hide fetcher/parser/processor configuration if there is only one of each available and therefore nothing to configure.
* Move aggregator_form_aggregator_admin_form_alter() to aggregator.processor.inc - it only implements processing functionality.
* Clean up language in fetcher.
Comment #31
alex_b CreditAttribution: alex_b commented#26: array $edit argument in aggregator_form_feed() is now a stdClass $feed.
Explanation:
Aggregator uses $feed inconsistently. A grep for '$feed[' in the original aggregator files finds 44 occurences, for '$feed->' finds 84.
For the API we decided to go with $feed as an object rather than an associative array. This entails some changes throughout the module. In the case that catch pointed out in #26, the $edit array changes to an object because the menu loader loads feeds as an object rather than an array. After this patch we should roll one that makes the usage of $feed consistent as object in the entire module.
This patch fails in OPML test in "Categories are correct." I post now cause the turkey's waiting and #30 is broken :)
Comment #32
Aron NovakI fixed the bug of the previous patch. Tests are fine now.
Comment #33
alex_b CreditAttribution: alex_b commentedReady for review.
Comment #34
moshe weitzman CreditAttribution: moshe weitzman commentedI took a look at the UI and I am quite happy with it. Code looks sane too. I think we are ready for final discussions. RTBC.
Comment #35
alex_b CreditAttribution: alex_b commentedI did more testing.
I wrote a feed-items-as-node processor (available here http://cvs.drupal.org/viewvc.py/drupal/contributions/sandbox/alex_b/aggr... ) and found two problems:
a) There are sometimes empty items in the $feed->items array on the processing stage. These items are caught in core aggregator in the aggregator_save_item() routine. Maybe we should filter them out before they actually go into the processing stage. I need to take a look at that.
b) Install a new processor (aggregator_node), activate it, aggregate some items, then go to admin/content/aggregator and click on 'remove items' - nothing happens. This makes the need for a remove hook obvious (or we form alter the remove link into the feed form :( ) - hook_aggregator_remove()
Adding this hook, we will stand at 7 new hooks for aggregator:
Comment #36
catchExtra hook won't do any harm. Seems like the cases you mentioned would be a good candidate for a small module in simpletest implementing the new hooks.
edit: also marked #334752: Documenting new hooks with aggregator pluggable architecture patch as duplicate since hooks can be documented directly in core now with module.api.php
Comment #37
alex_b CreditAttribution: alex_b commentedPatch as in #32 containing api.php
Comment #38
alex_b CreditAttribution: alex_b commented* Added hook_aggregator_remove()
* Fixed 3 occurences of array $feed in comments where $feed was an object
There is a detail that might be worth discussing:
Expiry of feed items is not dealt on an API level, it's entirely in the processor's responsibility to implement a mechanism that deletes items after a while. I prefer this because it spares us an additional hook that does something very similar that hook_cron() does. However, the one drawback is that implementing modules can't expire feed items when clicking on 'update items' on admin/content/aggregator . They can only expire on hook_cron().
I would keep it this way, but I'd like to see other people's opinions.
Comment #39
Aron NovakI tested the functionality and it works well.
Also I like the idea of let the processor to decide how to expire and so on.
Comment #41
Aron NovakRerolled the patch again. Small fix added: sometimes drupal_http_request gives 200/202/307 code, but data and headers are not set, a php notice appeared.
Comment #42
Dries CreditAttribution: Dries commentedCommitted to CVS HEAD. Thanks!
Comment #43
alex_b CreditAttribution: alex_b commentedYeay. Thanks everybody. This is really motivating for outstanding work on Aggregator. I posted a list of it here: http://groups.drupal.org/improvements-core
Comment #45
akahn CreditAttribution: akahn commentedFYI, this patch is causing #464792: Notice/warnings on submitting aggregator settings form.