I'm looking at the possibility of using Drupal for a company Intranet (please note - Google is no good to me). I have some knowledge of Drupal already, but I am somewhat concerned about the search capabilities. Here is what I need to be able to do:

  • The site is basically a DMS intended to give about 1,000-2,000 users easy online access to 14,000 documents (Word, Excel, PowerPoint, PDF, possibly moving to OpenOffice support later), rising to 50,000 documents in the next year or two. Can Drupal handle this from the performance viewpoint?
  • Each document is accompanied by a substantial amount of metadata. At the moment, it looks to me like I would have to configure the metadata in Flexinode or CCK.
  • The documents will be classified, and users must be able to find them by browsing categories or by searching
  • It must be possible to search via metadata fields or by fulltext search in the document file itself
  • I want to be able to export a complete "tree" of data (e.g. select a category and export all of its documents, both metadata and document files) to one or more offline web pages
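
To make the export requirement concrete, here is a minimal Python sketch of the kind of output I have in mind. It is not Drupal code: the `category` dictionary layout and the `export_category` and `slug` helpers are hypothetical, made up purely for illustration, and it assumes file names are unique within a category.

```python
import html
import re
import shutil
from pathlib import Path


def slug(name):
    """Turn a category name into a safe folder name (hypothetical helper)."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def export_category(category, out_dir):
    """Write one category and its subcategories as static HTML pages.

    `category` is an assumed structure:
    {"name": str,
     "documents": [{"title": str, "file": path, "metadata": {...}}],
     "children": [category, ...]}
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    rows = []
    for doc in category.get("documents", []):
        src = Path(doc["file"])
        shutil.copy2(src, out_dir / src.name)  # copy the document file itself
        meta = "; ".join(
            f"{html.escape(str(k))}: {html.escape(str(v))}"
            for k, v in doc.get("metadata", {}).items()
        )
        rows.append(
            f'<li><a href="{html.escape(src.name)}">{html.escape(doc["title"])}</a>'
            f" ({meta})</li>"
        )
    for child in category.get("children", []):
        sub = slug(child["name"])
        export_category(child, out_dir / sub)  # recurse into the subcategory
        rows.append(
            f'<li><a href="{sub}/index.html">{html.escape(child["name"])}</a></li>'
        )
    page = (
        f"<html><body><h1>{html.escape(category['name'])}</h1>"
        f"<ul>{''.join(rows)}</ul></body></html>"
    )
    (out_dir / "index.html").write_text(page, encoding="utf-8")
```

Each category becomes a folder with its own `index.html`, so the exported tree can be browsed from a CD or a network share without a web server.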

So here are my questions:

  • Can Drupal handle the volumes from the performance viewpoint?
  • Can the search engine or a module search the metadata (i.e. does it support searching on particular CCK fields)?
  • Can the search engine or a module do fulltext search on the document files themselves?
  • Does Drupal support Dublin Core and, if so, how?
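
For context, by Dublin Core support I mean roughly that each document's metadata can be exposed under the standard element names, for example as HTML `meta` tags. A minimal Python sketch of that idea follows; it is not Drupal code, and the field names on the left of `FIELD_TO_DC` and the `dublin_core_meta` function are assumptions for illustration only.

```python
import html

# Hypothetical mapping from intranet metadata fields to Dublin Core elements.
FIELD_TO_DC = {
    "title": "DC.title",
    "author": "DC.creator",
    "published": "DC.date",
    "category": "DC.subject",
    "summary": "DC.description",
}


def dublin_core_meta(metadata):
    """Render the mapped fields of one document as Dublin Core meta tags."""
    # The link element declares which schema the "DC." prefix refers to.
    tags = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for field, dc_name in FIELD_TO_DC.items():
        value = metadata.get(field)
        if value:  # skip fields that are missing or empty
            tags.append(
                f'<meta name="{dc_name}" content="{html.escape(str(value))}">'
            )
    return "\n".join(tags)
```

Fields with no Dublin Core equivalent are simply left out of the output, so site-specific metadata would need to be handled separately.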

My thanks to the Drupallers for your thoughts on this.

Comments

sin

http://drupal.org/node/73486 (indexing attachments; see also the swish-e module)
http://drupal.org/project/relationship (I've seen some Dublin Core support in it)

markj

I've been working on a similar set of requirements for using Drupal as a framework for a digital library management system -- see http://drupalib.interoperating.info/node/29 for more info. I've already got a prototype "advanced" search module for CCK content types, which I'll make public in a few weeks.

mike stewart

I'd be interested to know what you decided, and why. Does anyone else have experience with this?

robmilne

WebFM can help you, and it is already being used largely as you suggest. Currently the metadata support isn't great; however, the addition of Views could expose the potential of WebFM's metadata. Search inside documents would still have to be handled outside of WebFM (Drupal search doesn't do that AFAIK; my old workplace used mnogo for that purpose). Conformance to metadata standards will improve at a later date, when I get around to making separate metadata tables that are scalable and configurable. Today the fields are fixed and live inside the files table.
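
The difference between fixed fields and a separate metadata table can be sketched roughly as follows. This is a Python/SQLite illustration only, not WebFM's actual schema: the `files` and `file_metadata` tables, their columns, and the `add_file` and `find_by_field` helpers are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE files (fid INTEGER PRIMARY KEY, path TEXT NOT NULL);
    -- One row per (file, field) pair, so a new metadata field needs no schema change.
    CREATE TABLE file_metadata (
        fid   INTEGER NOT NULL REFERENCES files(fid),
        name  TEXT NOT NULL,
        value TEXT NOT NULL,
        PRIMARY KEY (fid, name)
    );
    CREATE INDEX idx_metadata_name_value ON file_metadata (name, value);
    """
)


def add_file(path, metadata):
    """Store a file record plus any number of metadata fields."""
    fid = conn.execute("INSERT INTO files (path) VALUES (?)", (path,)).lastrowid
    conn.executemany(
        "INSERT INTO file_metadata (fid, name, value) VALUES (?, ?, ?)",
        [(fid, name, value) for name, value in metadata.items()],
    )
    return fid


def find_by_field(name, value):
    """Return the paths of files whose metadata field `name` equals `value`."""
    rows = conn.execute(
        "SELECT f.path FROM files f "
        "JOIN file_metadata m ON m.fid = f.fid "
        "WHERE m.name = ? AND m.value = ? ORDER BY f.path",
        (name, value),
    )
    return [row[0] for row in rows]
```

With this layout, searching on a particular metadata field is a single indexed lookup, and mapping fields to a standard such as Dublin Core becomes a matter of choosing the `name` values rather than altering the files table.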