Using a PHP script instead of an external helper

francoud - January 9, 2008 - 08:16
Project: Search attachments
Version: 5.x-3.0
Component: Code
Category: support request
Priority: normal
Assigned: Unassigned
Status: active
Description

Search_attachments is a great module, and it solves a lot of problems. Using shell_exec to execute an external program is a great idea, but sometimes it's impossible to use an external helper: hosting providers often simply don't allow installing or using external programs (especially when PHP safe mode is on).

Sometimes a basic text-conversion job can be done by a simple PHP script. For example, I've found a couple of PHP functions that convert a PDF to a text string, and they work. I wonder if there is a way to tell search_attachments to call these functions directly from PHP, instead of invoking an external program. I would really appreciate any suggestions.

Thanks

FB

#1

markj - January 9, 2008 - 15:51

Hi Francoud,

Nice suggestion. To allow the use of PHP functions instead of external helpers, we'd have to allow admins to paste PHP code into the helper config entries and then tell the module that it should use that code to parse the target files. That feature is easy enough to add.

What PDF functions are you referring to? It's not apparent to me which ones get the text of a PDF file.

Mark

#2

francoud - January 11, 2008 - 08:49

Not a standard PHP function, just a couple of user-made functions that I found on the Internet. For example, look for "pdf2string" at http://it2.php.net/pdf or at http://www.sitepoint.com/forums/showthread.php?p=3675665

They work, but they may need to be modified. Still, where I can't use external helpers... it's a good starting point ;)

#3

markj - January 11, 2008 - 15:42

Thanks for the pointer. I'll take a look at adding this feature, maybe using the code you found and a simple example for parsing text files. It might not make it into 5.x-4 (which is a major overhaul of the module and should be available for testing in the next week or two), but it will for sure make it into 5.x-5 (which will probably also become 6.x-1).

#4

markj - February 24, 2008 - 18:27
Version: <none> » 5.x-3.0

In 5.x-4-dev, I've included a pure PHP helper that reads a text file and prints its contents. This is not quite what you were asking for, since it is still an external helper. However, this file might be useful to people who are running search_attachments on platforms that don't have a "cat" command (like Windows), and it also serves as an example of writing other pure PHP external helpers. I still like your idea of storing the "helper" PHP in the database, and will likely include it in 5.x-5 or the first 6.x version of the module.
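As a rough illustration of what such a pure PHP external helper could look like, here is a minimal sketch. It is not the file shipped with the module: the function name `print_text_file()` and the convention of passing the file path as the first command-line argument are assumptions made for this example.

```php
<?php
// Hypothetical "cat"-style helper: prints the contents of a text file.
// Assumption: the module invokes it as `php helper.php /path/to/file`.

function print_text_file($path) {
  // Refuse anything that is not a readable regular file.
  if (!is_file($path) || !is_readable($path)) {
    return FALSE;
  }
  // readfile() streams the file to standard output and returns the
  // number of bytes read, or FALSE on failure.
  return readfile($path) !== FALSE;
}

// Only act when a path was actually supplied on the command line.
if (isset($argv[1])) {
  exit(print_text_file($argv[1]) ? 0 : 1);
}
```

A helper along these lines is still launched as an external process, so it does not help on hosts that block shell_exec; it only removes the dependency on a platform-specific command such as "cat".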

#5

markj - February 28, 2008 - 16:40

Hi Francoud,

After considering the security implications of bundling PHP scripts with the module (see comment #4), I've added the ability to use PHP code to extract file text. This is working in my development version, which I'll make available by the end of the weekend. Basically, the PHP snippet gets the current file path from a global variable, extracts and/or processes the text of the file, and returns the text.
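To make the description above concrete, here is a minimal sketch of a snippet in that style. The global variable name `$example_attachment_path` and the function name `example_extract_text()` are hypothetical; the thread does not say what the module actually calls them, and the whitespace clean-up is just one plausible kind of processing.

```php
<?php
// Hypothetical extraction snippet. Assumption: the module stores the path
// of the file being indexed in a global variable before running the code.

function example_extract_text() {
  global $example_attachment_path;

  // Return an empty string if there is nothing readable to index.
  if (empty($example_attachment_path) || !is_readable($example_attachment_path)) {
    return '';
  }

  $text = file_get_contents($example_attachment_path);
  if ($text === FALSE) {
    return '';
  }

  // Replace non-printable control characters with spaces, then collapse
  // runs of whitespace so the search index gets clean, single-spaced text.
  $text = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]+/', ' ', $text);
  $text = preg_replace('/\s+/', ' ', $text);

  return trim($text);
}
```

For a format such as PDF, the `file_get_contents()` step would presumably be replaced by a call to a user-supplied conversion function like the "pdf2string" code mentioned in comment #2.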

#6

francoud - March 6, 2008 - 11:58

Thanks a lot. I will try it all :)

#7

markj - March 6, 2008 - 16:08

A version with this feature (5.x-4-dev-2008-03-05) has just been made available at http://interoperating.info/mark/node/74

#8

Marcin Pajdzik - April 2, 2008 - 03:51

I really like this idea. I have managed to get the PDF parser to work: http://www.sitepoint.com/forums/showthread.php?p=3675665 Does anybody know of a PHP script that can pull text out of MS Office files (doc, xls, ppt)?

#9

markj - April 22, 2008 - 04:26

The snippet at http://www.mousewhisperer.co.uk/php_page.html works on Word and PPT files, although with the latter it also extracts a lot of formatting information. I can't get it to work with Excel files, though.
