Implement Solr search integration for mailing lists

Project: Drupal.org customizations
Version: 6.x-3.x-dev
Component: Code
Category: support request
Priority: normal
Assigned: Unassigned
Status: active

Issue Summary

Implement search indexing and hopefully facets (by mailing list, author, date?) for mailing lists.

Comments

#1

Our current implementation is based on swish-e, which isn't built to index Mailman archives as such but can index text files. Mailman stores its list archives in /usr/local/mailman/archives/public on our servers, and they are basically just text files, one per month. The messages are stored one after another in each file, in this format:

From drupal-devel at drupal.org Sun May 1 00:32:55 2005
From: drupal-devel at drupal.org (RobinMonks)
Date: Sat, 30 Apr 2005 22:32:55 -0000
Subject: [drupal-devel] [bug] /admin/node/book gives no output
In-Reply-To:
References:

Message-ID:

Message Body

OR

From sean at crushyou.net Wed May 2 04:22:27 2007
From: sean at crushyou.net (Sean)
Date: Wed, 02 May 2007 00:22:27 -0400
Subject: [support] Drupaltherapy: July 21 San Francisco
Message-ID:

Message Body

swish-e simply indexes these files and does actually facet on a couple of the headers. It runs every 15 minutes.
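To make the archive format above concrete, here is a minimal Python sketch that splits one monthly archive file into messages and pulls out the headers worth faceting on. It assumes every message starts with a "From address at host date" separator line like the ones shown above; the function names and the returned dictionary keys are made up for illustration and are not part of swish-e or Mailman.

```python
import re
from email import message_from_string

# Separator line as it appears in the archive text files, e.g.
# "From sean at crushyou.net Wed May 2 04:22:27 2007".
# Assumption: the address is obfuscated as "user at host".
SEPARATOR = re.compile(
    r"^From \S+ at \S+\s+\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$"
)


def split_messages(archive_text):
    """Split the text of one monthly archive into raw message chunks."""
    chunks, current = [], None
    for line in archive_text.splitlines():
        if SEPARATOR.match(line):
            # A separator closes the previous message and opens a new one.
            if current is not None:
                chunks.append("\n".join(current))
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:
        chunks.append("\n".join(current))
    return chunks


def parse_message(chunk):
    """Turn one raw chunk into a dict of the fields we might index."""
    msg = message_from_string(chunk)
    return {
        "from": msg.get("From", ""),
        "date": msg.get("Date", ""),
        "subject": msg.get("Subject", ""),
        "body": msg.get_payload().strip(),
    }
```

Calling `[parse_message(c) for c in split_messages(text)]` on the contents of one monthly file would give a list of dictionaries, one per message. A body line that happens to look exactly like a separator would be misread as a new message, so this is only a starting point.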

I plan on looking into Solr a little more in the next couple of days to see how to do this with that search engine, but it seems like it would be fairly trivial.

#2

Assigned to: Anonymous » Damien Tournoud

There are basically two approaches:

  • Only index the content and point to the ugly archive interface for the result. This is easier because we don't have to store the archives ourselves, we just need to index them. I have some proof of concept code that takes the HTML archives generated by Mailman and indexes them into Solr. The downside is of course that this is not integrated, and breaks the navigation path of the user.
  • Index the content and store it inside our database, either as nodes (listhandler does that) or separately (mailarchive does that). This means storing every message in our database, which we may or may not want to do.
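Either way, each message ends up as a document posted to Solr. The following Python sketch shows roughly what that could look like for the first approach, where each document only carries a link back to the archive page. It is not the proof of concept code mentioned above. The field names (`list`, `author`, `date`, `subject`, `body`, `url`) are hypothetical and would have to match the Solr schema, and it assumes a Solr version whose update handler accepts JSON; older versions would need the XML update format instead.

```python
import hashlib
import json
import urllib.request
from datetime import timezone
from email.utils import parsedate_to_datetime


def to_solr_date(rfc2822_date):
    """Convert a mail Date header to the UTC form Solr date fields expect."""
    parsed = parsedate_to_datetime(rfc2822_date)
    if parsed.tzinfo is None:
        # A "-0000" offset parses as a naive datetime; treat it as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def to_solr_doc(list_name, message, archive_url):
    """Build one Solr document from a parsed message (hypothetical fields)."""
    # Derive a stable id so that re-indexing the same message overwrites it.
    key = "|".join(
        [list_name, message["date"], message["from"], message["subject"]]
    )
    return {
        "id": hashlib.sha1(key.encode("utf-8")).hexdigest(),
        "list": list_name,                      # facet: mailing list
        "author": message["from"],              # facet: author
        "date": to_solr_date(message["date"]),  # facet: date
        "subject": message["subject"],
        "body": message["body"],
        "url": archive_url,                     # link to the archive page
    }


def build_update_request(update_url, docs):
    """Prepare (but do not send) a JSON update request for a list of docs."""
    payload = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        update_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the prepared request to `urllib.request.urlopen()` would actually send it. The `list`, `author` and `date` fields are the ones the issue summary proposes faceting on.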

Anyway, assigning myself.

#3

I would go with pointing to the ugly pages, and see what people think. IMHO.

#4

Subscribing and available for consultation.

#5

Linking this from the Redesign project's #661550: Meta issue for Drupal.org customizations, because this issue was tagged 'drupal.org redesign'.

#6

Would be nice, but I think we can live without this in the general redesign launch.

#7

Version: <none> » 6.x-3.x-dev
Category: task » support request

Does anybody have any experience making Mailman mailing list archives searchable through Apache Solr? How would you get the Mailman archives into the Solr index?

#8

Assigned to: Damien Tournoud » Anonymous

Unassigning myself.