In mailarchive.module, in _mailarchive_message_save(), when a message has no parent (line 915), three queries are used to determine the previous maximum thread_id.

Maybe I'm missing something, but wouldn't it be easier to do this:

$previous_id = db_result(db_query('SELECT MAX(thread_id) FROM {mailarchive_messages} WHERE sid = %d', $subscription->sid));

Comments

jeremy’s picture

Status: Active » Closed (works as designed)

These three queries are there for performance. When dealing with mail archives with more than 50,000 messages, the query you suggest can be absurdly inefficient. (This was determined empirically, as that's what I did originally.) In the current implementation we only scan messages within a reasonable window of time when looking for the parent.
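To make the "window of time" idea concrete, here is a minimal sketch, not the module's actual code. It uses Python's sqlite3 as a stand-in for Drupal's database layer, and the `created` column, the window sizes, and the function name `previous_thread_id` are all assumptions made for illustration. It also assumes thread_id values grow over time, so the newest messages carry the highest thread_id.

```python
import sqlite3

def previous_thread_id(conn, sid, now, windows=(86400, 30 * 86400)):
    """Return the highest thread_id for subscription `sid`.

    Tries progressively wider time windows (hypothetical sizes, in seconds)
    and only falls back to scanning every message if they are all empty.
    """
    for window in windows:
        row = conn.execute(
            "SELECT MAX(thread_id) FROM mailarchive_messages "
            "WHERE sid = ? AND created >= ?",
            (sid, now - window),
        ).fetchone()
        if row[0] is not None:
            return row[0]
    # Last resort: the unrestricted query from the original post.
    row = conn.execute(
        "SELECT MAX(thread_id) FROM mailarchive_messages WHERE sid = ?",
        (sid,),
    ).fetchone()
    return row[0] or 0

# Toy data: (sid, thread_id, created); `created` is a hypothetical Unix timestamp.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mailarchive_messages ("
    "mid INTEGER PRIMARY KEY, sid INTEGER, thread_id INTEGER, created INTEGER)"
)
conn.executemany(
    "INSERT INTO mailarchive_messages (sid, thread_id, created) VALUES (?, ?, ?)",
    [(1, 1, 1000), (1, 2, 2000000), (2, 9, 2000000)],
)

recent_hit = previous_thread_id(conn, 1, now=2000100)    # found in the first window
fallback_hit = previous_thread_id(conn, 1, now=10000000) # both windows empty
empty_list = previous_thread_id(conn, 3, now=2000100)    # no messages at all
```

On an active list the first, narrowest query usually answers the question, so the expensive full scan is only reached for dormant or empty subscriptions.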

BartHanssens’s picture

Ah, that explains it. I'm currently using it on a very small, low-traffic archive, so I didn't notice the difference in performance...

I guess the simple variant with an additional multi-column index on (sid, thread_id) might also work: if I recall correctly, MySQL should optimize the MAX(thread_id) away, making it a very fast lookup. It might also work on a recent PostgreSQL...

But then again, I don't have any real data to support this claim, so it's just my two cents :-)
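The multi-column index idea can be tried on a toy table. The sketch below uses Python's sqlite3 purely as a stand-in, so it says nothing about how MySQL or PostgreSQL would actually plan the query; the index name `idx_sid_thread` and the table layout are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mailarchive_messages ("
    "mid INTEGER PRIMARY KEY, sid INTEGER NOT NULL, thread_id INTEGER NOT NULL)"
)
# Two subscriptions with 1,000 threads each.
conn.executemany(
    "INSERT INTO mailarchive_messages (sid, thread_id) VALUES (?, ?)",
    [(sid, t) for sid in (1, 2) for t in range(1, 1001)],
)
# The suggested multi-column index (the name is hypothetical).
conn.execute(
    "CREATE INDEX idx_sid_thread ON mailarchive_messages (sid, thread_id)"
)

query = "SELECT MAX(thread_id) FROM mailarchive_messages WHERE sid = ?"
max_thread_id = conn.execute(query, (1,)).fetchone()[0]

# The last column of each EXPLAIN QUERY PLAN row is a human-readable detail string.
plan = " ".join(
    str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + query, (1,))
)
print(max_thread_id)
print(plan)
```

If the plan mentions the index, the engine can answer the query from the index alone without touching the message rows. Whether that holds up on a 100,000-message archive under memory pressure would still need to be measured on the real database.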

jeremy’s picture

The three queries are a result of profiling. Originally I had a single query, and on a mailing list with 100,000 messages it became a bottleneck, showing up in 'SHOW FULL PROCESSLIST'. The single query was fast until there were enough messages that the server couldn't hold everything in memory anymore; at that point temp tables slowed everything down. There are some other bottlenecks I still have to optimize, but as they're much rarer, they're not quite as much of a priority.

How's the archive working for you? Is it a public site? I'd be curious to see it in action...

BartHanssens’s picture

Title: mailarchive.module, shorter way to get threadid when message has no parent » mailarchive in action

Sure, it's on http://www.opengov.be/en/mailarchive

As you can see, the archive is still very small :-) It isn't the most recent version, so I had to make a few changes (mail obfuscating and html-tag removing), but it was very easy to set up and it works like a charm.

The most time-consuming part (well... maybe 15 minutes) was getting the mails from the existing Majordomo archive into the mailarchive. Copying a mailbox and installing the Dovecot POP3 server on my home machine did the trick.

jeremy’s picture

"Sure, it's on http://www.opengov.be/en/mailarchive

"As you can see, the archive is still very small :-)"

Excellent! I'm glad to see that you were able to get it up and running. A few years ago I wrote a version that was all but impossible to install... ;)

Did you ever get the issue with automatically downloading messages solved? (I looked through your archives.)

"It isn't the most recent version, so I had to make a few changes (mail obfuscating and html-tag removing), but it was very easy to set up and it works like a charm."

Any reason you're running an older version? Upgrades should be trivial, and the latest version is highly recommended over older versions.

Removing HTML tags should be easy to do with standard Drupal filters -- no coding required.

If you implemented mail obfuscation as a standard Drupal filter, it would be great to attach it as a patch in the appropriate issue. I intend to write such a filter myself one of these days, but simply haven't gotten around to it yet. I've been more focused on improving performance.

Thanks for the link.

BartHanssens’s picture

"Did you ever get the issue with automatically downloading messages solved? (I looked through your archives.)"

It turned out to be a non-issue, since you had already implemented a cron hook, and one of the webmasters (the site is run by a few volunteers) pointed me to poormanscron.

"Any reason you're running an older version? Upgrades should be trivial, and the latest version is highly recommended over older versions."

No reason in particular: the old version worked just fine, so why upgrade? :-)
Anyway, I installed the latest CVS version today and noticed that RSS and statistics have been added. Nice work :-)

Not being a Drupal expert, I've implemented hook_search and hook_update_index; you can see the results on http://www.opengov.be/en/search/mailarchive (search for 'alfresco').

It still needs some work, but I'll be happy to contribute the code.

jeremy’s picture

"Not being a drupal expert, I've implemented hook_search and hook_update_index, you can see the results on http://www.opengov.be/en/search/mailarchive (search for 'alfresco')

"It still needs some work, but I'll be happy to contribute the code.

Very nice! Yes, please, open a new issue and attach your patch. Contributions are very much appreciated! Search has been on my todo list, but I haven't had the time. (And I worry about how it's going to swell my search indexes, perhaps to the point of being unusable.)