No documentation on how to get this up and running, some help would be good.
Comment | File | Size | Author |
---|---|---|---|
#12 | Screenshot.png | 142.93 KB | selvaraj123 |
#12 | Screenshot-1.png | 134.5 KB | selvaraj123 |
#12 | Screenshot-2.png | 130.48 KB | selvaraj123 |
Comments
Comment #1
curagea commented:
Seconded. My helpers are installed, but I can't seem to get Search Files working with them. Some documentation would be greatly appreciated.
Comment #2
mgifford commented:
Yes, I'd really like a README.txt file too. However, this is as much as I've hammered out:
ON SERVER WITH THE COMMAND LINE
To Install from Debian/Ubuntu:
# apt-get install xpdf
# apt-get install catdoc
# apt-get install unrtf
Help Options available:
$ /usr/bin/env pdftotext
pdftotext version 3.01
Copyright 1996-2005 Glyph & Cog, LLC
Usage: pdftotext [options] <PDF-file> [<text-file>]
-f <int> : first page to convert
-l <int> : last page to convert
-layout : maintain original physical layout
-raw : keep strings in content stream order
-htmlmeta : generate a simple HTML file, including the meta information
-enc <string> : output text encoding name
-eol <string> : output end-of-line convention (unix, dos, or mac)
-nopgbrk : don't insert page breaks between pages
-opw <string> : owner password (for encrypted files)
-upw <string> : user password (for encrypted files)
-q : don't print any messages or errors
-cfg <string> : configuration file to use in place of .xpdfrc
-v : print copyright and version info
-h : print usage information
-help : print usage information
--help : print usage information
-? : print usage information
$ catdoc
Usage:
catdoc [-vu8btawxlV] [-m number] [-s charset] [-d charset] [ -f format] files
$ unrtf
Usage: unrtf [--version] [--help] [--nopict|-n] [--html] [--text] [--vt] [--latex] [--ps] [--wpml] [-t html|text|vt|latex|ps|wpml]
IN DRUPAL
Set the Helper Files & extensions - admin/settings/search_files/helpers/
Word & Excel Files
HELPER NAME: Microsoft Word
EXTENSION:
HELPER PATH: /usr/bin/env catdoc %file%
HELPER NAME: Microsoft Excel
EXTENSION: xls
HELPER PATH: /usr/bin/env catdoc %file%
HELPER NAME: RTF Files
EXTENSION: rtf
HELPER PATH: /usr/bin/env unrtf %file%
Set the Valid Directories -- admin/settings/search_files/directories
sites/example.com/files
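Before configuring anything in Drupal, it may be worth confirming that the helpers from the steps above are installed and where they live. The following is only a sketch: helper_status is a made-up function name, and the three binaries are the ones installed earlier in this comment.

```sh
# Report where each helper binary lives, or flag it as missing.
# helper_status is a hypothetical name used only for this sketch.
helper_status() {
  for name in "$@"; do
    if path=$(command -v "$name"); then
      printf '%s: %s\n' "$name" "$path"
    else
      printf '%s: not found\n' "$name"
    fi
  done
}

helper_status pdftotext catdoc unrtf
```

Any helper reported as "not found" needs to be installed (or added to the web server user's PATH) before the settings under admin/settings/search_files/helpers can work.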
Comment #3
mmirza commented:
Hi, I am having real problems setting up the Search Files module. I've followed all the steps from the post and still get nothing. Can someone please help?
Comment #4
--David-- commented:
Me too... please help?
Comment #5
zaarkov commented:
I had trouble too.
Now I'm using 6.x-2.0-beta4, which does the job at a very basic level.
Comment #6
airliner commented:
Just download and extract the helper, then change into its directory.
Run ./configure -C "your_path", then make, then make install, and let 6.x-2.0-beta4 auto-detect it.
On Debian everything works.
It only searches attachments, not directories, but that's OK IMHO.
Comment #7
SocialNicheGuru commented:
This should go in the README.
Comment #8
apatrinos commented:
On MacOS pdftotext requires a '-' as the last argument in order to output its results to the terminal and consequently to a php variable via the shell_exec call in function search_files_attachments_get_file_contents of file search-files_attachments.module. If this is general it should probably be incorporated in the documentation. Unfortunately this is not mentioned in pdftotext's help output, but it is the usual behavior for unix tools.
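To see what the trailing '-' changes, it can help to expand the %file% placeholder by hand and run the result in a terminal. The sketch below assumes the module does a plain text substitution of %file%. expand_helper is a hypothetical name, and the module's real quoting of the file name may differ.

```sh
# Expand the %file% placeholder in a helper template (sketch only).
# expand_helper is a hypothetical name; it is not part of the module.
expand_helper() {
  template=$1
  file=$2
  # Escape characters that are special in a sed replacement: \, & and the | delimiter.
  escaped=$(printf '%s\n' "$file" | sed 's/[\\&|]/\\&/g')
  printf '%s\n' "$template" | sed "s|%file%|$escaped|g"
}

expand_helper '/usr/bin/pdftotext %file% -' '/tmp/sample.pdf'
# prints: /usr/bin/pdftotext /tmp/sample.pdf -
```

Running the printed command by hand should dump the PDF text to the terminal. If nothing appears there, shell_exec has nothing to hand back to the module either.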
Comment #9
terryallan commented:
Thanks for all the comments above, but the instructions are not clear enough for me yet.
I have installed the extracted catdoc app in a directory called helpers in the search_files module directory, i.e. search_files/helpers/catdoc
In Admin/Site Configuration/Search Files no helper apps are listed and so no configuration is possible.
Can anyone advise me please?
Thanks
Comment #10
stodge commented:
I'm having the same problem. I have it all configured and the helpers installed. I attached a .txt file and a .pdf file to new content. I re-indexed the search, but I don't get any hits when searching. Any suggestions appreciated. Thanks
Comment #11
mdallmeyer commented:
Hi, I just wanted to stick my head in and say that I got my search files module to work great using .jar's I wrote using the Apache POI Project. Here is a link to the jar file I wrote which will extract text from .doc, .ppt, .xls. Alternatively, here is a wrapper .exe file, although I could not get this one to work, it had trouble finding the JRE.
Seems Apache Tika released a jar which does this for all MS Office files, including docx, xlsx, pptx, etc. (More links in case mirror dies)
I had to copy the JRE from the JDK to a directory on the server and then for the helper app line I wrote
"E://folder/folder/folder/jre7/bin/java -jar E://folder/folder/folder/MSOfficeToText.jar %file%"
Comment #12
selvaraj123 commented:
Re #2: I have followed all the steps, but no search file results are showing.
Comment #13
ge commented:
In /admin/settings/search_files/helpers/edit/1 your screenshot shows the 'Helper path' setting as:
/usr/bin/env pdftotext %file% -
This would not be a valid path. It has a space in the path.
If pdftotext is in /usr/bin (like it is on my server), the setting would be:
/usr/bin/pdftotext %file% -
If pdftotext on your server is in /usr/bin/env (doubtful), then the setting would be:
/usr/bin/env/pdftotext %file% -
Genny
Comment #14
prabakaran commented:
The module is working well, but newer Microsoft Office files (for example .docx, .pptx and .xlsx) are not being indexed.
Please help me.
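One possible workaround for .docx, not something the module is confirmed to support: a .docx file is a zip archive whose main text lives in word/document.xml, so unzip -p piped through a crude tag stripper can stand in as a helper. strip_xml_tags is a hypothetical name, and this sketch ignores XML entities, headers, footnotes, and tags that span several lines. Apache Tika, mentioned in comment #11, is the more complete route.

```sh
# Crude XML tag stripper for use after `unzip -p` (sketch only).
# strip_xml_tags is a hypothetical name; it reads XML on stdin and
# replaces every <...> tag with a space so adjacent words do not merge.
strip_xml_tags() {
  sed 's/<[^>]*>/ /g'
}

# Example use against a real file (path is a placeholder):
#   unzip -p /path/to/file.docx word/document.xml | strip_xml_tags
```

Wrapped in a small script, the pipeline above could be entered as the helper for the docx extension, with %file% passed as the script's argument.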