Consider how to handle comments [#947266]

Based on #921774: Support CCK creation using Feeds and Mailhandler MailhandlerNodeProcessor will just handle creating nodes. Somewhere in the stack it needs to be determined what's a node and what's a comment (and maybe, what's an X, since when processing a mailbox, maybe you want some messages turned into nodes, some turned into comments, and some turned into simply stored items). In any case, we need to be able to:

* Allow messages in a single mailbox to become distinct object types in Drupal (node, comment, other)
* A way, probably in MailhandlerParser, to determine what's what.
* Or, move away completely from the idea in point #1 and require separate mailboxes for separate object types.

dagmar mentions in http://drupal.org/node/921774#comment-3495476 about a comment processor in the works which should be considered.

Comment	File	Size	Author
#14	mailhandler-947266-14.patch	5.7 KB	danepowell
#11	mailhandler-947266-11.patch	6.77 KB	danepowell

Comments

Comment #1

danepowell commented 5 January 2011 at 15:55

Subscribing

Comment #2

Ian Ward commented 7 January 2011 at 23:32

I'm considering the following approach:

* Modify mailhandler_retrieve_message so it accepts a callback, where the callback is called with the $header as an argument and which should determine whether or not to "fetch" the message, or leave it in the mailbox, unread.
* Use ctools to let such "message filter" callbacks to be registered
* Add a setting to the default mailhandler parser which lets a "message filter" be selected on a feeds importer basis.
* Use separate processors, for nodes, for comments, for X.

Historically, there was a problem with actually "fetching" the messages using the fetcher, having to do with the use of the batch API. I will re-evaluate the possibility to make the fetcher really fetch, in which case this "message filter" setting would be selected as a setting on the fetcher, which makes more general sense.

What this means to the end-user. They'll create a mailbox, then create an importer. When they configure their importer they'll need to, besides map sources to targets, select the "message filter" to use. The defaults available will be:

* All messages, no filtering
* Messages with in-reply-to (considered comments, and would use the comment processor)
* Messages without in-reply-to (considered nodes, for example, and would use the node processor)

Based on this, if you want to process comments and nodes, you'll setup two importers which read from a single mailbox, but which are configured to use different "message filters" and optionally different processor mappings.

Sound sane? The ability to ship mailhandler with default/example importers already configured will ease a lot of the configuration process.

Comment #3

Ian Ward commented 7 January 2011 at 23:36

"and optionally different processor mappings." <--- should read "and would use different processors, and optionally different processor mappings"

Comment #4

Ian Ward commented 7 January 2011 at 23:41

To clarify ambiguity, "callback" would likely be this:

* MailhandlerParser calls mailhandler_retrieve. mailhandler_retrieve will take an additional argument to its present signature, which will likely be a function literal to be passed to mailhandler_retrieve_message and invoked with call_user_func_array. The key is, the parser (or the fetcher, if I can make the fetcher really fetch) would pass in which callback to use, which lets the message filter strategy differ on a per-importer basis, but still while using a single mailbox.

Comment #5

Jean-Philippe Fleury commented 8 January 2011 at 03:38

subscribing

Comment #6

Ian Ward commented 12 January 2011 at 05:12

The fetcher now fetches and the parser parses, as of http://drupal.org/cvs?commit=479678 and seems to be working fine, including when there are file attachments, with a small fix. This change is mentioned above. A patch can now be made along the lines above.

Comment #7

danepowell commented 12 January 2011 at 21:16

That sounds good- I assume Feeds Comment Processor would then be used to post the comments.

I'll go ahead and try to get a patch in order, since it sounds like this isn't your top priority- please let me know if you're already actively working on this.

Comment #8

danepowell commented 12 January 2011 at 21:47

Hmm- the only thing I don't understand is how this works with the way Mailhandler mailboxes are configured at admin/build/mailhandler. Specifically, ~~(1) I don't see how a new Feed is linked to any given Mailhandler mailbox, and (2)~~ I don't understand how this is going to work with the "Basic mailbox" vs "Node mailbox" option. If I understand you correctly, users will select the "Node mailbox" even if the mailbox accepts comments as well, and then will configure on a per-importer basis how to handle node/comment differentiation by selecting a filter at admin/build/feeds/edit/[feedname]/settings/MailhandlerFetcher. This might be a little confusing to users, but you're right that giving some default importers will help.

Comment #9

Ian Ward commented 12 January 2011 at 22:04

Dane, just a quick post for now. Yes, right, they'll select "Node" mailbox and this mailbox will then have a feeds importer for handling the nodes, and a feeds importer for handling the comments. The usability on these forms is pretty bad right now because there's no form messages or redirection. Basic mailbox is like if someone wanted to write another module or feeds plugins which does something besides create nodes, but it's also the "base" for any mailbox type, containing the basic info needed to handle connections and reading of the mailbox.

For your first question, go to /import once you have an importer setup and you'll see you can choose a mailbox to import from. Or, if you attached the feeds importer to a node type, you'd create a new node of that type, on which you'll have a select box to choose which mailbox it imports from.

The concepts are all a little disconnected, but they're very powerful when combined.

Comment #10

danepowell commented 12 January 2011 at 22:10

Great, thanks! Yeah, I just needed to RTFM to figure out (1). I'm pretty confident that I can get a patch for you to review shortly.

Comment #11

danepowell commented 13 January 2011 at 01:12

Status:

Active

» Needs review

Status	File	Size
new	mailhandler-947266-11.patch	6.77 KB

Okay, here's a patch for review. This allows modules to implement "filter" callbacks, and includes three default filters (all, nodes only, comments only). These filters can be selected on an import node. A few notes:

This is my first time working with CTools- please double check that everything looks correct, and go easy on me if I screwed anything up :)
Currently, if a message type is rejected, the entire import process will fail, because there's no way that I can tell to simply leave a message alone. Take a look at the TODO notes in mailhandler_retrieve_message.
The plugins/functions relating to comments might need to be moved into the mailcomment module
We still need to figure out the rest of the comment-posting process. I think we should be able to write a comment parser (using the current guts of Mailcomment) that will feed into the Feeds Comment Processor.

Comment #12

Ian Ward commented 13 January 2011 at 03:25

Dane, this is great. My specific thoughts on your comments:

#1 I think the usage of functions instead of classes is fine, for the filter callbacks. It's simple.

#2 what if in mailhandler_retrieve, the $messages array checks whether mailhandler_retrieve_message returns a message or false? I'm not positive what actually marks a message as read in the imap functions - imap_delete will mark it for deletion, but, marking it simply as "read" I'm not sure, so will require some investigation. Keep in mind this should be tested for POP and IMAP connections because POP message ids versus imap id's work differently and behave differently between connections. For example, from http://php.net/manual/en/function.imap-delete.php "POP3 mailboxes do not have their message flags saved between connections, so imap_expunge() must be called during the same connection in order for messages marked for deletion to actually be purged. "

#3 I think they could stay in mailhandler.

#4 Yes, mailcomment can probably be converted to work with this pretty easily.

Additionally, what do you think about putting the select box for which filter to use on the configForm instead of the sourceForm, so that the settings is saved per feeds importer as opposed to per-instance of that feeds importer (like on a node which that feeds importer is attached to, which is what happens with sourceForm).

Comment #13

Ian Ward commented 13 January 2011 at 03:27

Another thought on your #2 in my comment #12 - the testing should also test when multiple sessions are used to import (where batch actually makes multiple connections to the mailbox) since this is where different behavior occurs between POP and IMAP.

Comment #14

danepowell commented 13 January 2011 at 17:02

Status	File	Size
new	mailhandler-947266-14.patch	5.7 KB

First of all, I apologize that I accidentally added the filters folder to the repository, and I can't figure out how to delete it. I'm so used to working with SVN, I didn't realize that adding the folder in CVS (to make a patch properly) would add it to the repository.

Anyway- this patch moves the configuration to the per-feed config form.

mailhandler_retrieve now checks if a message or FALSE is returned. I've tested this with every combination of filter selected (nodes only, comments only...), type of emails present (nodes, comments), mailbox set to delete or simply mark as read, and IMAP/POP. I haven't checked with "multiple sessions"- I'm not exactly sure what you mean by that.

However, I don't see why there'd be a problem. I'm not too familiar with IMAP/POP PHP functions, but as I understand it,

mailhandler_retrieve opens the mailbox (shouldn't affect read or unread status of messages)
mailhandler_retrieve_message checks the headers and returns FALSE if ignoring message. If ignored, the read/unread status of the message shouldn't change when the mailbox is closed/expunged later.
mailhandler_get_parts actually fetches the parts of the message. This will mark it as read, assuming the mailbox is closed properly later.
mailhandler_retrieve_message checks for an empty message. It returns FALSE if so, AND marks the message for deletion. The mailbox will still get closed/expunged later.

Finally, like I mentioned, I'm not sure how to add the filters folder to this patch. However, nothing has changed there since the last patch so you can pull the files from that if it's not too much trouble.

Comment #15

danepowell commented 13 January 2011 at 20:20

Just FYI, I'm currently writing a MailcommentParser for Mailcomment 2.x (whose parent is MailhandlerParser). This parser extracts the in_reply_to string from the headers and transforms it into a Parent NID / Parent CID, which can be mapped to the existing Feeds Comment Processor.

In order to handle attachments, I think the best solution is to patch Feeds Comment Processor (if necessary).

Comment #16

danepowell commented 13 January 2011 at 20:31

Actually, I see now that a lot of the code necessary to parse comments is already included in MailcommentAuthenticate... so maybe a separate parser isn't necessary? I think I'm going to stop here and wait for some direction... :)

Comment #17

Ian Ward commented 14 January 2011 at 01:13

Status:

Needs review

» Fixed

Committed! Thanks Dane. I tested multiple sessions. What I meant by that was that multiple batch requests are made. I did this by turning down FEEDS_NODE_BATCH_SIZE to just 2, from 50, so that the multiple batch requests would be made when testing with 5 messages, 3 of which have in-reply-to headers and 2 which don't. Worked fine with both POP and IMAP. I believe the key is what you say above, that the stream is closed an messages read are expunged. There used to be problems with when a new stream was opened and closed for every single message.

About MailComment - see how in MailhandlerNodeProcessor uses mailhandler_node_feeds_node_processor_targets_alter and adds in_reply_to. Perhaps this can be removed, and worked into FeedsCommentProcessor. That said, MailComment has a particular way of checking permission to actually post the comment based on the headers in the signature, as you know.

It may be worth considering whether MailhandlerNodeProcessor could happen in MailhandlerParser, and whether that would let comment processing and node processing importers both use the ::authenticate() plugins. I'm a little distracted to think more about this right now, but will think about it later. Can you open a new issue w/ the path you're seeing?

Comment #18

Ian Ward commented 14 January 2011 at 03:31

Thinking about this a little more, I guess there are three options:

Option 1) Move authentication ::authenticate() into MailhandlerParser. Of all the existing authentication plugins, they all just need header information, and to be able to set uid. Uid could be set as a source, and then mapped in the processor mapping. Authenticating in the parser does feel a little strange, but it should work ok. I mean, you wouldn't prevent any processor from acting on all parsed messages - just turn off authentication in the parser if that's what you want. This would then let MailhandlerNodeProcessor and FeedsCommentParser perhaps use the same parser - MailhandlerParser, when trying to handle comments ala MailComment-style.

Option 2) Create a MailhandlerProcessor from which MailhandlerNodeProcessor and optionally FeedsCommentProcessor inherit. May not save much code or make a ton of sense.

Option 3) Implement an authenticate() into FeedsCommentProcessor, in which case it make sense to just make a separate MailhandlerCommentProcessor (or MailCommentCommentProcessor) since we'd be getting a little more implementation-specific than FeedsCommentProcessor should be.

I'm leaning toward #1, except for the "feels a little strange" part... but then again, I don't think any other feeds "source" type implements any kind of authentication based on parameters from parsed information, so, there's not much precedence.

Comment #19

danepowell commented 17 January 2011 at 02:43

I don't know about you, but just thinking about this makes my brain hurt. I created a new issue for the issues that you're talking about, please lend your wisened opinion: #1029352: Cleaning up authentication, and other high-level restructuring

This doesn't completely cover the topic of how to handle comments (specifically extracting threading information for the Feeds Comment Processor), but I think that will shake out in the process of figuring out authentication, since authentication and extraction of the ID tag/threading info currently take place simultaneously.

Comment #20

danepowell commented 30 January 2011 at 23:10

Issue tags:

+mailflow 2.x

Tagging to keep track of issues related to the mailhandler-2.x/notifications-4.x workflow

Comment #21

13 February 2011 at 23:20

Status:	Fixed	» Closed (fixed)
Issue tags:	-mailflow 2.x

Automatically closed -- issue fixed for 2 weeks with no activity.

Comment #22