Implement search indexing and hopefully facets (by mailing list, author, date?) for mailing lists.

Comments

nnewton’s picture

Our current implementation is based on swish-e, which isn't built to index Mailman specifically but can index plain text files. Mailman stores its list archives in /usr/local/mailman/archives/public on our servers, and they are basically just text files, one per month. Messages are stored one after another in each file, in one of these formats:

From drupal-devel at drupal.org Sun May 1 00:32:55 2005
From: drupal-devel at drupal.org (RobinMonks)
Date: Sat, 30 Apr 2005 22:32:55 -0000
Subject: [drupal-devel] [bug] /admin/node/book gives no output
In-Reply-To:
References:


Message-ID:

Message Body

OR

From sean at crushyou.net Wed May 2 04:22:27 2007
From: sean at crushyou.net (Sean)
Date: Wed, 02 May 2007 00:22:27 -0400
Subject: [support] Drupaltherapy: July 21 San Francisco
Message-ID:

Message Body

swish-e simply indexes these files and does actually facet on a couple of the headers. It runs every 15 minutes.
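
For anyone who wants to experiment with these files outside of swish-e, here is a minimal Python sketch of splitting a monthly archive into messages and pulling out the values we would probably facet on (list, author, date). It is not our swish-e setup. The function names and the returned field names are made up for illustration, and it assumes that every message starts with a "From user at host <date>" separator line like the ones above and that the first [tag] in the subject is the list name.

```python
import re
from email import message_from_string
from email.utils import parsedate_to_datetime

# Separator line, e.g. "From sean at crushyou.net Wed May 2 04:22:27 2007".
# Assumption: a line of exactly this shape always starts a new message.
SEPARATOR = re.compile(
    r"^From \S+ at \S+ +\w{3} +\w{3} +\d{1,2} +\d{2}:\d{2}:\d{2} +\d{4}$"
)
# Obfuscated From header, e.g. "sean at crushyou.net (Sean)".
FROM_HEADER = re.compile(r"^(?P<user>\S+) at (?P<host>\S+)(?:\s+\((?P<name>.*)\))?$")
# Leading list tag in the subject, e.g. "[support] ...".
LIST_TAG = re.compile(r"^\[([^\]]+)\]")


def split_messages(archive_text):
    """Yield the raw text of each message, without its separator line."""
    current = None
    for line in archive_text.splitlines():
        if SEPARATOR.match(line):
            if current is not None:
                yield "\n".join(current)
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:
        yield "\n".join(current)


def parse_message(raw):
    """Return a dict of facet candidates plus subject and body for one message."""
    msg = message_from_string(raw)
    sender = FROM_HEADER.match((msg["From"] or "").strip())
    subject = (msg["Subject"] or "").strip()
    tag = LIST_TAG.match(subject)
    try:
        date = parsedate_to_datetime(msg["Date"]).isoformat()
    except (TypeError, ValueError):
        date = None  # missing or unparseable Date header
    return {
        "list": tag.group(1) if tag else None,
        "author": sender.group("name") if sender else None,
        "email": f"{sender.group('user')}@{sender.group('host')}" if sender else None,
        "date": date,
        "subject": subject,
        "body": msg.get_payload().strip(),
    }


def parse_archive(archive_text):
    """Parse a whole monthly archive into a list of message dicts."""
    return [parse_message(raw) for raw in split_messages(archive_text)]
```

Calling parse_archive() on the contents of one monthly file gives one dict per message. The separator pattern is deliberately strict so that a body line that merely begins with "From " is not mistaken for the start of a new message.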

I plan on looking into Solr a little more in the next couple of days to see how to do this with that search engine, but it seems like it would be fairly trivial.

Damien Tournoud’s picture

Assigned: Unassigned » Damien Tournoud

There are basically two approaches:

  • Only index the content and point to the ugly archive interface for the result. This is easier because we don't have to store the archives ourselves, we just need to index them. I have some proof-of-concept code that takes the HTML archives generated by Mailman and indexes them into Solr. The downside is of course that this is not integrated and breaks the user's navigation path.
  • Index the content and store it inside our database, either as nodes (listhandler does that) or separately (mailarchive does that). This means storing every message in our database, which we may or may not want to do.
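
To make the first approach a bit more concrete, here is a rough Python sketch (not the proof-of-concept code mentioned above) that turns already-extracted messages into a Solr XML update message, where each document carries a url field pointing back to the archive page. The field names (id, list, author, date, subject, body, url) are assumptions and would have to match whatever the Solr schema defines. Dates are converted to UTC with a trailing Z, which is the format Solr date fields expect, and naive dates are assumed to already be in UTC.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def solr_date(dt):
    """Format a datetime as UTC with a trailing Z, as Solr date fields expect."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_add_xml(messages):
    """Build an <add> update message with one <doc> per message dict.

    Each dict maps hypothetical field names to values. None values are
    skipped and datetime values are converted with solr_date().
    """
    add = ET.Element("add")
    for message in messages:
        doc = ET.SubElement(add, "doc")
        for name, value in message.items():
            if value is None:
                continue
            if isinstance(value, datetime):
                value = solr_date(value)
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")
```

The resulting string would be posted to the Solr update handler and followed by a commit. Because only the url is needed to send the user to the archive page, nothing has to be stored in our own database, which is the whole point of the first approach.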

Anyway, assigning myself.

Gábor Hojtsy’s picture

IMHO, I would go with pointing to the ugly archive pages and see what people think.

robertDouglass’s picture

Subscribing and available for consultation.

lisarex’s picture

Linking this from the Redesign project issue #661550: Meta issue for Drupal.org customizations, because this issue was tagged 'drupal.org redesign'.

drumm’s picture

Issue tags: -drupal.org redesign

Would be nice, but I think we can live without this in the general redesign launch.

bbaumann’s picture

Version: » 6.x-3.x-dev
Category: task » support

Does anybody have any experience making Mailman mailing list archives searchable through Apache Solr? How would you get the Mailman archives into the Solr index?

Damien Tournoud’s picture

Assigned: Damien Tournoud » Unassigned

Unassigning myself.

mgifford’s picture

Version: 6.x-3.x-dev » 7.x-3.x-dev
Issue summary: View changes

This still seems to be missing from https://drupal.org/search/site/

drumm’s picture

Status: Active » Closed (won't fix)

I don't think the mailing lists have great content today.