Use real path instead of url to retrieve file [#1148162]

Comment	File	Size	Author
#10	search_api_attachments-realpath-1148162-10.patch	2.12 KB	svendecabooter
#4	search_api_attachments-realpath-1148162-4.patch	1.89 KB	floeschie
	realpath.patch	772 bytes	jax

Comment #1

Anonymous (not verified) commented 6 May 2011 at 11:16

Hi

Have you got any numbers about the performance gain?
I'll try to set it up this weekend and test the patch.

Regards

Tim

Comment #2

jax commented 10 May 2011 at 16:01

Another reason for using realpath is that at the moment the indexing doesn't work with the "drush core-cron" command. The host part is missing from the url.

Comment #3

Anonymous (not verified) commented 12 May 2011 at 09:48

Hi

I've run some tests with realpath and there is no real performance gain: indexing 34 large documents took around 1 minute 31 seconds with either approach, give or take 1 second.

So I don't think we should change it for the performance gain, but maybe we should try to solve the drush part.

What's your idea?

Tim

Comment #4

floeschie commented 9 December 2011 at 12:49

Status: Needs work » Needs review

Status	File	Size
new	search_api_attachments-realpath-1148162-4.patch	1.89 KB

I'm having the same issue with "drush cron". I applied the patch above as well as the one I submitted in #1365148. Then I cleared the index and ran the "drush cron" command. I still get errors for files which are not parsed by Tika (images, plaintext).

So I extended the code a bit more and added a new function which determines a file's realpath.

Comment #5

floeschie commented 9 December 2011 at 13:27

Seems to work, except that German umlauts are removed by escapeshellarg(). If a file has something like "ä ö ü" in its name, then Tika throws an exception because it cannot find the file after the umlauts have been removed...
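
A possible explanation, offered as an assumption rather than a confirmed diagnosis: PHP's escapeshellarg() is locale-sensitive, and when LC_CTYPE is not a UTF-8 locale it can drop multibyte characters. The sketch below switches LC_CTYPE to a UTF-8 locale just for the call. The helper name example_escape_shell_path() and the locale names are hypothetical; which locales are installed varies by system.

```php
<?php

/**
 * Hypothetical helper: escape a file path for the shell under a UTF-8 locale.
 */
function example_escape_shell_path($path) {
  // Passing '0' only reads the current LC_CTYPE setting without changing it.
  $previous = setlocale(LC_CTYPE, '0');
  // Assumption: at least one of these UTF-8 locales is installed. setlocale()
  // tries them in order and returns FALSE if none is available.
  $applied = setlocale(LC_CTYPE, 'en_US.UTF-8', 'C.UTF-8', 'de_DE.UTF-8');
  $escaped = escapeshellarg($path);
  if ($applied !== FALSE) {
    // Restore the original locale so other code is not affected.
    setlocale(LC_CTYPE, $previous);
  }
  return $escaped;
}

echo example_escape_shell_path('/tmp/Bericht ä ö ü.pdf'), PHP_EOL;
```

If none of the listed locales is available, the helper falls back to whatever escapeshellarg() does under the current locale, so the umlauts may still be lost in that case.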

Comment #6

berdir commented 9 December 2011 at 16:36

Stream wrappers can register if they are remote or local using the type in hook_stream_wrappers(): http://api.drupal.org/api/drupal/modules--system--system.api.php/functio...

The function added in the patch should probably check the type of the scheme and, if it's local, use realpath(), and otherwise getExternalUrl(). This makes it possible to support both private:// (which currently doesn't work if anonymous users don't have access) and remote wrappers like s3://.

Comment #7

wwhurley commented 1 February 2012 at 14:39

Another option is to set your $base_url in the appropriate settings.php file. That'll also fix URLs in various emails sent by the system on cron as well if you ever run into that.
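
For reference, the setting looks roughly like this in a Drupal 7 settings.php. The host name is a placeholder, not taken from this issue:

```php
<?php
// sites/default/settings.php (Drupal 7).
// Assumption: replace the placeholder host with the site's real address.
// Drupal expects no trailing slash here.
$base_url = 'http://www.example.com';
```

With this set, URLs generated during command-line cron runs should include the host part even though no HTTP request supplies it.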

Comment #8

sinasalek commented 7 July 2012 at 08:48

Patch #4 works for me. However, I agree with Berdir that it should be a bit smarter. I had a look at http://api.drupal.org/api/drupal/includes%21stream_wrappers.inc/7 but couldn't find anything that clearly indicates whether a wrapper is local or remote. If that's the case, then a workaround should be used.
One solution could be to check whether the $wrapper class is a subclass of DrupalLocalStreamWrapper.

Comment #9

mfb commented 8 July 2012 at 17:46

 // $scheme is the URI scheme of the file, e.g. "public" or "private".
 $local_wrappers = file_get_stream_wrappers(STREAM_WRAPPERS_LOCAL);
 if (isset($local_wrappers[$scheme])) {
   // Then it's local.
 }

(from image.gd.inc)
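
Combining this check with the suggestion in #6, the decision could be sketched as follows. This is not the code from any of the attached patches. The function name example_file_access_strategy() is hypothetical, and the list of local schemes is passed in so the sketch runs without Drupal; in Drupal 7 it could come from array_keys(file_get_stream_wrappers(STREAM_WRAPPERS_LOCAL)).

```php
<?php

/**
 * Hypothetical helper: decide how an indexer should reach a file.
 *
 * @param string $uri
 *   A stream wrapper URI such as "private://docs/report.pdf".
 * @param array $local_schemes
 *   Scheme names that are known to be local.
 *
 * @return string
 *   "realpath" for local schemes, "external_url" for everything else.
 */
function example_file_access_strategy($uri, array $local_schemes) {
  $position = strpos($uri, '://');
  if ($position === FALSE) {
    // No scheme at all: treat the value as a plain filesystem path.
    return 'realpath';
  }
  $scheme = substr($uri, 0, $position);
  // Local wrappers can be resolved on disk; remote ones need their URL.
  return in_array($scheme, $local_schemes, TRUE) ? 'realpath' : 'external_url';
}

$local = array('public', 'private', 'temporary');
echo example_file_access_strategy('private://docs/report.pdf', $local), PHP_EOL;
echo example_file_access_strategy('s3://bucket/report.pdf', $local), PHP_EOL;
```

In a real module, the "realpath" branch would call the wrapper's realpath() method and the other branch its getExternalUrl() method.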

Comment #10

svendecabooter commented 1 August 2012 at 09:26

Status	File	Size
new	search_api_attachments-realpath-1148162-10.patch	2.12 KB

Attached is a new version of patch #4 based on the feedback by Berdir & mfb

Comment #11

osopolar commented 18 September 2012 at 14:24

Patch in #10 works for me, Thanks.

Comment #12

sinasalek commented 18 September 2012 at 19:45

Status: Needs review » Reviewed & tested by the community

I haven't tested patch #10, but the approach looks reasonable to me too.
@osopolar, if it works fine for you, you could mark it as RTBC. I'm doing it on your behalf.

Comment #13

coreycondardo commented 23 September 2012 at 14:01

I applied this patch and it seems to reduce the number of errors that the drush command to index the site produces. However, I'm getting a lot of this:

INFO - unsupported/disabled operation: EI

What is that?

--Corey

Comment #14

apanag commented 6 February 2013 at 07:40

Patch worked for me. However, my case was a password-protected site, so the extraction was getting a 401 error code every time.
Using the realpath solved the problem, because no URL is required anymore.

Also some "INFO - unsupported/disabled operation: EI" messages were shown, but the vast majority of the .pdfs were indexed properly.

Thank you for the patch :-)

Comment #15

mike503 commented 17 April 2013 at 22:37

patch works great. after hours of debugging why java was dumping with absolutely no reasonable explanation... sigh

Comment #16

torpy commented 26 June 2013 at 01:14

This worked brilliantly for me. I had issues where I was using a self-signed SSL certificate (on my development server), which was causing Tika to error out.

Comment #17

izus commented 15 July 2013 at 22:35

Assigned: Unassigned » izus

Assigning to me for testing and very probably merging into the 7.x-1.x branch.

Comment #18

izus commented 25 July 2013 at 17:07

Status: Reviewed & tested by the community » Fixed

Hi,
just merged it in 7.x-1.x branch
Thanks all !

Comment #19

izus commented 25 July 2013 at 22:06

Assigned: izus » Unassigned

Comment #20

8 August 2013 at 22:11

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.
