Using a php script instead of external helper
francoud - January 9, 2008 - 08:16
| Project: | Search attachments |
| Version: | 5.x-3.0 |
| Component: | Code |
| Category: | support request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | active |
Description
Search_attachments is a great module, and it solves a lot of problems. Using shell_exec to execute an external program is a great idea, but sometimes it's impossible to use an external helper: hosting providers often simply don't allow users to install or run external programs (especially when PHP safe mode is on).
In fact, a basic text-conversion job can sometimes be done by a simple PHP script. For example, I've found a couple of PHP functions that convert a PDF to a text string, and they work. I wonder if there is a way to tell search_attachments to call these PHP functions directly, instead of invoking an external program. I would really appreciate any suggestions.
Thanks
FB
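For readers unfamiliar with the external-helper approach mentioned above, here is a minimal sketch of how a helper command might be built and run with shell_exec. The function names `build_helper_command` and `run_helper` are hypothetical and are not taken from the module's code; the check for a disabled shell_exec is only one way a restricted host might show up.

```php
<?php
// Illustrative only: these function names are assumptions, not the module's API.

/**
 * Build a shell command that runs $helper on $path, with the path safely quoted.
 */
function build_helper_command($helper, $path) {
  return $helper . ' ' . escapeshellarg($path);
}

/**
 * Run the helper and return its output, or NULL when shell_exec is unavailable
 * (for example, listed in disable_functions by the hosting provider).
 */
function run_helper($helper, $path) {
  $disabled = array_map('trim', explode(',', (string) ini_get('disable_functions')));
  if (!function_exists('shell_exec') || in_array('shell_exec', $disabled)) {
    return NULL;
  }
  return shell_exec(build_helper_command($helper, $path));
}
```

On a host where shell_exec is blocked, `run_helper` returns NULL, which is the situation a PHP-only extractor would need to cover.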

#1
Hi Francoud,
Nice suggestion. To allow the use of PHP functions instead of external helpers, we'd have to allow admins to paste PHP code into the helper config entries and then tell the module that it should use that code to parse the target files. That feature is easy enough to add.
What PDF functions are you referring to? It's not apparent to me which ones get the text of a PDF file.
Mark
#2
They are not standard PHP functions, just a couple of user-written functions that I found on the Internet. For example, look for "pdf2string" at http://it2.php.net/pdf or at http://www.sitepoint.com/forums/showthread.php?p=3675665
They work, but they may need to be modified. Still, where I can't use external helpers... it's a good starting point ;)
#3
Thanks for the pointer. I'll take a look at adding this feature, maybe using the code you found plus a simple example for parsing text files. It might not make it into 5.x-4 (which is a major overhaul of the module and should be available for testing in the next week or two), but it will certainly be in 5.x-5 (which will probably also become 6.x-1).
#4
In 5.x-4-dev, I've included a pure PHP helper that reads a text file and prints its contents. This is not quite what you were asking for, since it is still an external helper. However, this file might be useful to people who are running search_attachments on platforms that don't have a "cat" command (like Windows), and it also serves as an example of writing other pure PHP external helpers. I still like your idea of storing the "helper" PHP in the database, and will likely include it in 5.x-5 or the first 6.x version of the module.
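As an illustration of what such a pure PHP external helper could look like, here is a minimal sketch. It is not the file shipped with 5.x-4-dev; the function name `read_text_file` and the convention of passing the file path as the first command-line argument are assumptions.

```php
<?php
// Hypothetical stand-alone helper, roughly equivalent to "cat" for one file.
// The name and the argument convention are assumptions, not the bundled file.

/**
 * Return the contents of a text file, or an empty string if it cannot be read.
 */
function read_text_file($path) {
  if (!is_file($path) || !is_readable($path)) {
    return '';
  }
  $text = file_get_contents($path);
  return $text === FALSE ? '' : $text;
}

// When run from the command line, print the file named by the first argument.
if (PHP_SAPI === 'cli' && isset($argv[1])) {
  echo read_text_file($argv[1]);
}
```

Saved under a name such as `cat.php` (also hypothetical), it could be invoked as `php cat.php /path/to/file.txt`, which works on platforms without a "cat" command as long as the PHP command-line binary is available.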
#5
Hi Francoud,
After considering the security implications of bundling PHP scripts with the module (see comment #4), I've added the ability to use PHP code to extract file text. This is working in my development version, which I'll make available by the end of the weekend. Basically, the PHP snippet gets the current file path from a global variable, extracts and/or processes the text of the file, and returns the text.
#6
Many thanks. I will try them all :)
#7
A version with this feature (5.x-4-dev-2008-03-05) has just been made available at http://interoperating.info/mark/node/74
#8
I really like this idea. I have managed to get the PDF parser to work: http://www.sitepoint.com/forums/showthread.php?p=3675665 Does anybody know of a PHP script that can pull text out of MS Office files (doc, xls, ppt)?
#9
The snippet at http://www.mousewhisperer.co.uk/php_page.html works on Word and PPT files, although with the latter it also extracts a lot of formatting information. Can't get it to work with Excel files though.