By ryyz on
Will I be able to search my website with the search function for keywords that are part of attachments such as .doc and .pdf? Thank you.
Comments
Potentially useful feature
I notice someone asked about this back in April but there were no responses. I don't see any indication that Drupal can do this, but there are Linux utilities like pdftotext (for PDF) and catdoc (for MS Word) that can extract the text from these binary files. I'm not sure whether they are available for Windows or whether there are equivalents. A simple solution to your problem would be to find all the attachments on a node, extract the text of the attachments, and add it to the node's entry in the search_dataset table. A search would then hit the node whenever a keyword appears in the node or in any of its attachments, which is crude but at least it would be a place to start.
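The extraction step described above might look roughly like this outside of Drupal. This is only a sketch in Python, not the module's code: the EXTRACTORS table and the function names are made up for illustration, and it assumes pdftotext and catdoc are on the PATH and write to standard output (pdftotext does so when the output file is given as "-").

```python
import subprocess
from pathlib import Path

# Hypothetical mapping from file extension to a helper command.
# "{path}" marks where the attachment's path goes.
EXTRACTORS = {
    ".pdf": ["pdftotext", "{path}", "-"],  # "-" sends the text to stdout
    ".doc": ["catdoc", "{path}"],
    ".txt": ["cat", "{path}"],
}


def build_command(path):
    """Return the helper command for this attachment, or None if unsupported."""
    template = EXTRACTORS.get(Path(path).suffix.lower())
    if template is None:
        return None
    return [str(path) if part == "{path}" else part for part in template]


def extract_text(path):
    """Run the helper and return its output, or "" if nothing could be extracted."""
    command = build_command(path)
    if command is None:
        return ""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, errors="replace", check=False
        )
    except FileNotFoundError:
        # The helper application is not installed.
        return ""
    return result.stdout if result.returncode == 0 else ""
```

Unsupported file types and missing helpers both come back as an empty string here, so one bad attachment would not stop the rest of a node from being indexed.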
I'm not that familiar with the search module but I'll take a look to see if this is possible.
thanks for the things to think about...
Thanks markj... I'll try a search with a .doc attachment... curious. Thank you again.
OK, this looks promising
I've got a skeletal module that attaches some test text to each node's search data as part of the normal indexing process:
The next step is to identify the attachments for each node, extract the text from each, and add the text to the parent node's search data. I'll work on this over the next week or so... if all goes right, the result will be a module that will index .doc and .pdf files and merge their text with that of the parent node, so that search hits in the attached text will return the parent node. We'll go from there. The module will only work if you have the extractor utilities (catdoc, pdftotext) installed, however.
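To illustrate the idea of merging attachment text into the parent node's search data, here is a toy in-memory index in Python. It is a sketch under assumptions, not how Drupal's search module stores things: the node dictionaries and the build_index and search names are hypothetical, and the attachment text is assumed to have been extracted already.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into a set of lowercase alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(nodes):
    """Map each word to the IDs of the nodes whose body or attachments contain it.

    nodes: {node_id: {"body": str, "attachments": [extracted text, ...]}}
    """
    index = defaultdict(set)
    for node_id, node in nodes.items():
        # Attachment text is merged with the body, so a hit in an
        # attachment is recorded against the parent node.
        combined = " ".join([node["body"], *node["attachments"]])
        for word in tokenize(combined):
            index[word].add(node_id)
    return index


def search(index, keyword):
    """Return the sorted IDs of nodes matching a single keyword."""
    return sorted(index.get(keyword.lower(), set()))


nodes = {
    1: {"body": "Annual meeting notes", "attachments": ["budget forecast"]},
    2: {"body": "Hello world", "attachments": []},
}
index = build_index(nodes)
hits = search(index, "budget")  # node 1 matches through its attachment
```

The trade-off is the one already mentioned in the thread: a hit tells you which node matched, but not whether the keyword was in the node itself or in one of its attachments.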
Module almost done
I've got this working on PDF, Word, and text files. I'm still in the code cleanup/documentation stage, and I'd also like to add some basic form validation, like telling you if the module can't find the helper apps you specify. Stay tuned...
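A check for missing helper apps, like the validation mentioned above, could be as simple as the following Python sketch. The missing_helpers function is a hypothetical name for illustration; it relies on the standard library's shutil.which to look each helper up on the PATH.

```python
import shutil


def missing_helpers(helpers):
    """Return the helper names that cannot be found on the PATH."""
    return [name for name in helpers if shutil.which(name) is None]


# Example: report any extractor that is not installed.
for name in missing_helpers(["pdftotext", "catdoc"]):
    print(f"Helper application not found: {name}")
```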
great stuff!
markj... this is great. I am still new to Drupal and have no experience with PHP either (I'm an old legacy guy). Can you include a 'how to' covering what has to be added to Drupal (any external or additional modules)? Many thanks.
very interesting
This sounds like a very interesting module. I'd be happy to help test it.
OK, I'll get it to you tonight
I think it's ready to test. To use the search_attachments.module you'll have to install pdftotext (http://www.bluem.net/downloads/pdftotext_en/ for the OS X package; I can't seem to find the Linux version at the moment) and catdoc (http://www.45.free.net/~vitus/software/catdoc/ for source). If you are on a system that comes with 'cat' installed, you could instead just test it with .txt attachments without installing any additional helper applications.
Email me through my Drupal contact form and I'll send you the module later tonight. It would be nice to have a couple of people test it prior to making a general announcement.
maybe swish-e will work?
Maybe the swish-e module might work...
Module now available
I've packaged search_attachments.module up and made it available at http://interoperating.info/mark/search_attachments
Thanks to all who assisted.
This works for me! Awesome
This works for me! Awesome and thank you, this saves me SO much time.
---------------------------------------------------------------
http://xamox.NET
New module to search attachments.
http://drupal.org/project/search_attachments
Nice! This was the last
Nice! This was the last thing I needed to make my site perfect
Thanks
Edit: Nooo =( Please make your release available from drupal.org