I've written a module that lets you search the text of PDF, MS Word, and plain text files attached to nodes. The module uses helper apps such as pdftotext and catdoc to extract the text, which is then appended to the parent node's record in Drupal 4.7's search_dataset table. See the background discussion at http://drupal.org/node/71215.
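
For readers who want a feel for the approach, here is a rough sketch of the extraction step in Python. It is not the module's code (the module itself runs inside Drupal 4.7), and the names `HELPERS`, `build_command`, `extract_text`, and `append_to_index_text` are made up for illustration. It assumes pdftotext and catdoc are on the PATH and that both write the extracted text to standard output; it only builds the combined text and does not touch the search_dataset table.

```python
import subprocess
from pathlib import Path

# Hypothetical mapping from file extension to a helper command line.
# The "{path}" placeholder is swapped for the attachment's path.
HELPERS = {
    ".pdf": ["pdftotext", "{path}", "-"],  # "-" asks pdftotext to write to stdout
    ".doc": ["catdoc", "{path}"],          # catdoc writes to stdout by default
}


def build_command(path):
    """Return the helper argv for this file, or None if no helper is mapped."""
    template = HELPERS.get(Path(path).suffix.lower())
    if template is None:
        return None
    return [str(path) if arg == "{path}" else arg for arg in template]


def extract_text(path):
    """Return the text of an attachment, or "" if it cannot be extracted."""
    path = Path(path)
    if path.suffix.lower() == ".txt":
        # Plain text needs no helper app.
        return path.read_text(encoding="utf-8", errors="replace")
    command = build_command(path)
    if command is None:
        return ""
    try:
        result = subprocess.run(command, capture_output=True, check=False)
    except FileNotFoundError:
        # The helper app is not installed.
        return ""
    if result.returncode != 0:
        return ""
    return result.stdout.decode("utf-8", errors="replace")


def append_to_index_text(node_text, attachment_paths):
    """Join the node's own text with the text of each attachment."""
    parts = [node_text]
    for attachment in attachment_paths:
        text = extract_text(attachment).strip()
        if text:
            parts.append(text)
    return " ".join(parts)
```

The combined string returned by `append_to_index_text` stands in for what would be stored against the parent node, so a search that matches text inside an attachment returns the node it is attached to.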

Comments

ethanzhong’s picture

How do I download it? I went to your website but didn't see the attachment, and I couldn't register either. Please help.

markj’s picture

OK, the file is attached to my page now. You can't register, only leave comments.

B.GARBE’s picture

I can't find the module on your site.

kbahey’s picture

The Swish-E Indexer module already does that: http://drupal.org/node/16428
--
Drupal development and customization: 2bits.com
Personal: Baheyeldin.com

markj’s picture

But you also have to install swish-e plus the helper apps. My module is a bit more lightweight, FWIW.

digirave2’s picture

i was just searching for tips on how to develop the EXACT same thing myself.

yeah, you can use an external search engine, but for one thing i need multibyte character search, which swish doesn't support. drupal sucks at multibyte character search too, but it does work somewhat.

also, this is MUCH more lightweight, and easier to modify to my liking.

thanks to markj!

i'm going to try this module out~