Released module?
rcross - October 17, 2007 - 03:31
| Project: | Mailing List Archive |
| Component: | Code |
| Category: | support request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | closed |
Description
Hey,
I don't remember how I stumbled upon this project, but I notice that there doesn't seem to be an official release yet. How stable is the dev code? Also, curious if this depends on the mailhandler module and what kind of performance issues this has for a site since mailing lists can get quite busy at times.
Thanks.

#1
I have not yet branched an official release. I'm in no real hurry to do so, either, sorry.
Stable enough to use on my own website. But I have a lot of plans still, and the code needs a lot of cleanup as well as removing anything that may be KernelTrap-centric.
No, it is standalone.
Actually downloading messages is handled by cron. You can configure how many messages from each list are downloaded each time cron is run. There is overhead, but by running cron frequently and only downloading a handful of messages each time, you hardly notice it. As the lists grow, the overhead grows with them. I have lists with >100,000 messages, and they take more CPU power than lists with, say, <20,000 messages. Performance will also depend on how much RAM your database server has. I'm using a dedicated server, and it does max out at times.
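To illustrate the idea of a per-list cap on each cron run, here is a minimal sketch in Python. It is not the module's code: the function name `run_cron`, the parameter `per_list_limit`, and the in-memory queues are all hypothetical stand-ins for the real cron hook, the configured limit, and the remote mailboxes.

```python
from collections import deque


def run_cron(pending, per_list_limit):
    """Take at most per_list_limit messages from each list in one cron run.

    `pending` maps a list name to a deque of messages that have not been
    downloaded yet (a hypothetical stand-in for a remote mailbox).
    """
    fetched = {}
    for list_name, queue in pending.items():
        batch = []
        while queue and len(batch) < per_list_limit:
            batch.append(queue.popleft())
        fetched[list_name] = batch
    return fetched


pending = {
    "list-a": deque(f"a-{i}" for i in range(5)),
    "list-b": deque(f"b-{i}" for i in range(2)),
}

# Two consecutive cron runs; each one handles only a small batch per list.
first = run_cron(pending, per_list_limit=3)
second = run_cron(pending, per_list_limit=3)
```

Running cron more often with a small limit spreads the same total work over many short runs, which is why the overhead is hard to notice.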
Actually browsing mailing lists is more intensive than adding messages to them. Again, with smaller mailing lists you'll hardly notice it. But as they grow beyond what can fit into your database server's RAM, it can slow things down.
I have a mailarchive_cache module in my development sandbox. It's not fully functional yet, but it should greatly help browsing performance on large mailing lists. I hope to find time to finish it sooner rather than later. (If I start seeing serious performance degradation on my website, it will obviously become a priority for me.)
#2
Sad to hear you won't be making a release branch anytime soon. Just a little clarification: from your comments it sounds like you're doing something special with the messages. I assumed you were just taking each message and converting it into a node, detecting threads via subject headings, and probably using the forum module or something similar for display. Once they are imported into Drupal, I wouldn't expect a difference between browsing a small mailing list and a large one, since the site as a whole would still have the same total number of nodes in the db. Is this incorrect?
Also, do you have any other suggestions for modules or ways of integrating mailing list archives with a Drupal site?
#3
Guess I should've also asked how much work it would take to clean up the code enough to make a decent dev release. I haven't looked at the code yet, but maybe I can help some.
#4
As I noted, the development branch is usable. I try to check in only functional code. I have no incentive to roll an official release before I'm ready; at this point, when so much is happening, it's easier to have only one branch to focus on.
Mail archive subscriptions are nodes. The messages within are stored in their own "mailarchive_messages" table which holds a lot of custom data. Some of the header information is stored in "mailarchive_messages_addresses". Attachment information is stored in "mailarchive_messages_attachments".
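As a rough picture of that layout, the sketch below creates the three tables in an in-memory SQLite database from Python. Only the table names come from the description above. Every column name (`mid`, `nid`, `header`, `filename`, and so on) is an assumption made for illustration, and SQLite stands in for whatever database the site actually runs on; the module's real schema is the authority.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Column names below are hypothetical; only the table names are from the thread.
CREATE TABLE mailarchive_messages (
    mid        INTEGER PRIMARY KEY,
    nid        INTEGER NOT NULL,   -- the subscription node this message belongs to
    message_id TEXT NOT NULL,      -- value of the Message-ID mail header
    subject    TEXT,
    sent       INTEGER             -- Unix timestamp
);
CREATE TABLE mailarchive_messages_addresses (
    mid     INTEGER NOT NULL REFERENCES mailarchive_messages(mid),
    header  TEXT NOT NULL,         -- e.g. 'from', 'to', 'cc'
    address TEXT NOT NULL
);
CREATE TABLE mailarchive_messages_attachments (
    mid      INTEGER NOT NULL REFERENCES mailarchive_messages(mid),
    filename TEXT NOT NULL,
    bytes    INTEGER
);
""")

conn.execute(
    "INSERT INTO mailarchive_messages VALUES (1, 10, '<a@example.org>', 'Hello', 1192591860)"
)
conn.execute(
    "INSERT INTO mailarchive_messages_addresses VALUES (1, 'from', 'alice@example.org')"
)
conn.execute(
    "INSERT INTO mailarchive_messages_attachments VALUES (1, 'patch.diff', 2048)"
)

# All messages for one subscription node, with sender and any attachment.
rows = conn.execute(
    """
    SELECT m.subject, a.address, t.filename
    FROM mailarchive_messages m
    JOIN mailarchive_messages_addresses a ON a.mid = m.mid AND a.header = 'from'
    LEFT JOIN mailarchive_messages_attachments t ON t.mid = m.mid
    WHERE m.nid = ?
    """,
    (10,),
).fetchall()
```

The point of the sketch is only the shape: one node per subscription, with messages, addresses, and attachments kept in dedicated tables keyed to it.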
Threading is done in the same way as your mail client offers threading, utilizing the "message_id" and other mail headers contained in each message. Display of subscriptions is much like forums. Display of actual messages is custom. Follow the earlier link I provided to see what it looks like.
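Header-based threading of this kind can be sketched in a few lines of Python. This is a simplified illustration, not the module's algorithm: it assumes the "other mail headers" are the standard In-Reply-To and References headers, and the helper names `parent_id` and `build_threads` and the sample messages are made up for the example.

```python
from email import message_from_string

# Four hypothetical messages. The last one replies to a message we never received.
RAW = [
    "Message-ID: <1@example.org>\nSubject: Released module?\n\nFirst post.\n",
    "Message-ID: <2@example.org>\nIn-Reply-To: <1@example.org>\n"
    "Subject: Re: Released module?\n\nA reply.\n",
    "Message-ID: <3@example.org>\nReferences: <1@example.org> <2@example.org>\n"
    "Subject: Re: Released module?\n\nA reply to the reply.\n",
    "Message-ID: <4@example.org>\nIn-Reply-To: <missing@example.org>\n"
    "Subject: Re: Something else\n\nOrphaned reply.\n",
]


def parent_id(msg):
    """Return the Message-ID this message replies to, or None."""
    in_reply_to = msg.get("In-Reply-To", "").strip()
    if in_reply_to:
        return in_reply_to
    # Fall back to the last entry of References, which is the direct parent.
    refs = msg.get("References", "").split()
    return refs[-1] if refs else None


def build_threads(raw_messages):
    """Group messages into thread roots plus a parent -> children mapping."""
    msgs = [message_from_string(raw) for raw in raw_messages]
    known = {m["Message-ID"].strip() for m in msgs}
    roots, children = [], {}
    for m in msgs:
        mid = m["Message-ID"].strip()
        parent = parent_id(m)
        if parent in known:
            children.setdefault(parent, []).append(mid)
        else:
            # No parent, or the parent is not in the archive: start a new thread.
            roots.append(mid)
    return roots, children


roots, children = build_threads(RAW)
```

Unlike subject matching, this keeps a thread together even when someone edits the subject line, which is why mail clients rely on these headers.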
I don't follow you. By small mailing list, I mean one with few messages. By a large mailing list, I mean one with lots of messages. Thus, a large mailing list will have a much larger database. The slowdown comes from fancy features such as the ability to navigate through mail archives by thread, date or author, as well as displaying blocks showing the complete thread. As the list grows, this involves increasingly intensive queries.
I suggest you 1) follow the earlier link to my web page so you can see what the mail archive looks like, and 2) download the code and take a look at how things are laid out. At least look at the schema; that will answer some of your more basic questions.
I wrote this module to satisfy my own needs for exactly that.
Patches are always welcome and very much appreciated. As for rolling a release, I am not going to rush into that. The core functionality is there, but I've lots of features I want to add yet. Rolling a release simply means people will start filing bugs for things that more often than not don't affect me -- unfortunately I don't have time for that yet. Once I've finished my personal TODO list, then I'll roll a release. As for how long that will take, it depends on how much time I find to work on this.
#5
Have you considered storing the data (the mails) in specially crafted CCK nodes?
By using nodes, you can use more standard Drupal features (creating views, RSS feeds, getting the mails indexed for the search engine, etc.), and it might have a positive impact on performance too. (Although I see you have 100,000+ messages stored... that's impressive!)
#6
That doesn't seem like a good idea to me. I'm not a follower of the philosophy that everything should be a node. Certainly it makes sense for some things, but not for everything. I do not see any compelling reason for mail messages to be nodes.
Just as comments can be displayed through views, so could mailarchive messages. It wouldn't be that difficult to integrate. You would start with the fact that each mailarchive subscription is a node, then build your views tables from there...
That's what mailarchive_rss provides. The feeds provided by this simple module are already very comprehensive. Feeds of mailing lists, of individual threads, of individual authors, of a given subject...
There's a patch in the issue queue that offers just that, which I fully plan to merge. Content doesn't have to be a node to be indexed for Drupal's search. (My biggest concern here is the inevitable size of the search tables that will be created from large mailing lists)
Why do you think that? The mailarchive tables are designed specifically for storing mail messages. They will perform better than generic tables designed for any data type.
There are currently 350,000+ messages in my mail archive on KernelTrap, with 110,000+ being the largest individual mailing list archived. I've done some profiling, and started working on a mailarchive_cache module which I expect will allow me to scale up by at least a factor of ten on the existing infrastructure. I hope to have time to finish it soon, as once deployed the mailarchive pages should be lightning fast.
#7
Thank you for your extensive answer, and for the pointers to the patch and the module.
I am a strong believer in 'everything should be a node', but you are right: with 350,000+ messages stored, making all these mails nodes would not be a good solution, and it certainly would not be faster.
In my particular case, I only have about 10 mails a day that I want to process (it is a mailing list that receives mails from security programs that keep an eye on our servers: logwatch and ossec notifications), so that is a completely different case.
As a solution to get these mails into nodes, I decided to use a tool to generate an RSS-feed of my mails; and with the leech.module, I get them into nodes. Now I can use taxonomy-access out of the box, add extra fields to the content type and add a workflow, and actions.
These are of course not standard features for a mailing list archive, and that is what your module seems to provide very well.
#8
Automatically closed -- issue fixed for two weeks with no activity.