No documentation on how to get this up and running, some help would be good.
Comment | File | Size | Author |
---|---|---|---|
#12 | Screenshot.png | 142.93 KB | selvaraj123 |
#12 | Screenshot-1.png | 134.5 KB | selvaraj123 |
#12 | Screenshot-2.png | 130.48 KB | selvaraj123 |
Comments
Comment #1
curagea commented: Seconded. My helpers are installed, but I can't seem to get Search Files working with them. Some documentation would be greatly appreciated.
Comment #2
mgifford commented: Yes, I'd really like a README.txt file too. However, this is as much as I've hammered out:
ON SERVER WITH THE COMMAND LINE
To Install from Debian/Ubuntu:
# apt-get install xpdf
# apt-get install catdoc
# apt-get install unrtf
Help Options available:
$ /usr/bin/env pdftotext
pdftotext version 3.01
Copyright 1996-2005 Glyph & Cog, LLC
Usage: pdftotext [options] <PDF-file> [<text-file>]
-f <int> : first page to convert
-l <int> : last page to convert
-layout : maintain original physical layout
-raw : keep strings in content stream order
-htmlmeta : generate a simple HTML file, including the meta information
-enc <string> : output text encoding name
-eol <string> : output end-of-line convention (unix, dos, or mac)
-nopgbrk : don't insert page breaks between pages
-opw <string> : owner password (for encrypted files)
-upw <string> : user password (for encrypted files)
-q : don't print any messages or errors
-cfg <string> : configuration file to use in place of .xpdfrc
-v : print copyright and version info
-h : print usage information
-help : print usage information
--help : print usage information
-? : print usage information
$ catdoc
Usage:
catdoc [-vu8btawxlV] [-m number] [-s charset] [-d charset] [ -f format] files
$ unrtf
Usage: unrtf [--version] [--help] [--nopict|-n] [--html] [--text] [--vt] [--latex] [--ps] [--wpml] [-t html|text|vt|latex|ps|wpml]
IN DRUPAL
Set the Helper Files & extensions - admin/settings/search_files/helpers/
Word & Excel Files
HELPER NAME: Microsoft Word
EXTENSION:
HELPER PATH: /usr/bin/env catdoc %file%
HELPER NAME: Microsoft Excel
EXTENSION: xls
HELPER PATH: /usr/bin/env catdoc %file%
HELPER NAME: RTF Files
EXTENSION: rtf
HELPER PATH: /usr/bin/env unrtf %file%
Set the Valid Directories -- admin/settings/search_files/directories
sites/example.com/files
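As a rough illustration of what the helper settings above mean, the sketch below shows how a helper path containing the `%file%` placeholder could be expanded into a command and run so that whatever the helper prints becomes the text to index. It is written in Python rather than the module's PHP, and the function names `build_helper_command` and `extract_text` are hypothetical; the module's real implementation may differ.

```python
import shlex
import subprocess


def build_helper_command(helper_path, filename):
    """Split the helper setting into arguments and substitute %file%."""
    # Splitting first and substituting afterwards keeps a file name that
    # contains spaces as a single argument.
    return [arg.replace("%file%", filename) for arg in shlex.split(helper_path)]


def extract_text(helper_path, filename):
    """Run the helper and return whatever it prints on standard output."""
    result = subprocess.run(
        build_helper_command(helper_path, filename),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    # Only shows the expanded command; nothing is executed here.
    print(build_helper_command("/usr/bin/env catdoc %file%",
                               "sites/example.com/files/report.doc"))
```

Running a helper by hand like this against one known file is a quick way to confirm that it produces text at all before re-indexing.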
Comment #3
mmirza commented: Hi, I am having real problems setting up the Search Files module. I've followed all the steps from the post, and yet nothing. Can someone please help?
Comment #4
--David-- commented: Me too... please help?
Comment #5
zaarkov commented: Had trouble too.
Now I'm using 6.x-2.0-beta4, which does the job in a very basic way.
Comment #6
airliner commented: Just download and extract the helper, then change into its directory.
Run ./configure -C "your_path", then make, then make install, and let 6.x-2.0-beta4 auto-detect it.
With Debian everything is OK.
It only searches attachments, not directories, but that's OK IMHO.
Comment #7
SocialNicheGuru commented: This should go in the README.
Comment #8
apatrinos commented: On macOS, pdftotext requires a '-' as the last argument in order to write its results to standard output, and consequently to a PHP variable via the shell_exec call in the function search_files_attachments_get_file_contents in the file search_files_attachments.module. If this is general, it should probably be incorporated in the documentation. Unfortunately this is not mentioned in pdftotext's help output, but it is the usual behavior for Unix tools.
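To make the point about the trailing '-' concrete, here is a minimal sketch of calling pdftotext so that the extracted text arrives on standard output instead of in a .txt file next to the PDF. It is in Python, not the module's PHP, and the function names `pdftotext_command` and `pdf_to_text` are made up for this illustration.

```python
import subprocess


def pdftotext_command(pdf_path, binary="pdftotext"):
    """Build a pdftotext call whose final '-' sends the text to stdout."""
    return [binary, pdf_path, "-"]


def pdf_to_text(pdf_path, binary="pdftotext"):
    """Return the extracted text, or an empty string if the helper fails."""
    try:
        result = subprocess.run(
            pdftotext_command(pdf_path, binary),
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # Helper missing, not executable, or it exited with an error.
        return ""
    return result.stdout
```

In the helper settings, the equivalent would be a helper path that ends in `%file% -`.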
Comment #9
terryallan commented: Thanks for all the comments above, but the instructions are not clear enough for me yet.
I have installed the extracted catdoc app in a directory called helpers in the search_files module directory, i.e. search_files/helpers/catdoc
In Admin/Site Configuration/Search Files no helper apps are listed and so no configuration is possible.
Can anyone advise me please?
Thanks
Comment #10
stodge commented: I'm having the same problem. I have it all configured and the helpers installed. I attached a .txt file and a .pdf file to new content. I re-indexed the search, but I don't get any hits when searching. Any suggestions appreciated. Thanks
Comment #11
mdallmeyer commented: Hi, I just wanted to stick my head in and say that I got my Search Files module to work great using .jar files I wrote with the Apache POI project. Here is a link to the jar file I wrote, which will extract text from .doc, .ppt, .xls. Alternatively, here is a wrapper .exe file, although I could not get this one to work; it had trouble finding the JRE.
Seems Apache Tika released a jar which does this for all MS Office files, including docx, xlsx, pptx, etc. (More links in case mirror dies)
I had to copy the JRE from the JDK to a directory on the server and then for the helper app line I wrote
"E://folder/folder/folder/jre7/bin/java -jar E://folder/folder/folder/MSOfficeToText.jar %file%"
Comment #12
selvaraj123 commented: Re #2: I have followed all the steps, but no search file results are showing.
Comment #13
ge commented: In /admin/settings/search_files/helpers/edit/1 your screenshot shows the 'Helper path' setting as:
/usr/bin/env pdftotext %file% -
This would not be a valid path. It has a space in the path.
If pdftotext is in /usr/bin (like it is on my server), the setting would be:
/usr/bin/pdftotext %file% -
If pdftotext on your server is in /usr/bin/env (doubtful), then the setting would be:
/usr/bin/env/pdftotext %file% -
Genny
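If it is unclear where a helper actually lives on the server, a small check like the one below can report the absolute path for each helper name, which can then be used in the 'Helper path' setting. This is only a sketch in Python; the function name `locate_helpers` is hypothetical, and the result reflects the PATH of the user running the script, which may differ from the web server's.

```python
import shutil


def locate_helpers(names):
    """Map each helper name to its absolute path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in locate_helpers(["pdftotext", "catdoc", "unrtf"]).items():
        print(f"{name}: {path or 'not found on PATH'}")
```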
Comment #14
prabakaran commented: The module is working well, but new Microsoft Office files (for example .docx, .pptx and .xlsx) are not being indexed.
Please help me.
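One possible approach for .docx files, sketched here under assumptions and not something the module ships: a .docx file is a zip archive whose main text is stored in word/document.xml, so a small script can print that text for indexing. The script and the function name `docx_to_text` are hypothetical, it covers .docx only (not .pptx or .xlsx), and it ignores headers, footers and footnotes.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path):
    """Return the paragraph text of a .docx file, one paragraph per line."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(W_NS + "p"):
        # Each paragraph's text is split across one or more w:t elements.
        runs = [node.text or "" for node in paragraph.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


if __name__ == "__main__" and len(sys.argv) > 1:
    print(docx_to_text(sys.argv[1]))
```

Saved under a name of your choosing, a script like this could in principle be registered as a helper for the docx extension, with `%file%` passed as its only argument. Comment #11's suggestion of Apache Tika is the broader option if .pptx and .xlsx are needed as well.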