Closed (fixed)
Project:
Search API attachments
Version:
7.x-1.x-dev
Component:
Code
Priority:
Normal
Category:
Feature request
Assigned:
Unassigned
Reporter:
Created:
5 May 2011 at 16:10 UTC
Updated:
8 Aug 2013 at 22:11 UTC
Comments
Comment #1
Anonymous (not verified) commented
Hi
Have you got any numbers about the performance gain?
I'll try to set it up this weekend and test the patch.
Regards
Tim
Comment #2
jax commented
Another reason for using realpath is that indexing currently doesn't work with the "drush core-cron" command: the host part is missing from the URL.
Comment #3
Anonymous (not verified) commented
Hi
I've run some tests with realpath and there is no real performance gain.
Indexing 34 large documents took around 1 minute 31 seconds with either approach, give or take 1 second.
So I don't think we should change it for the performance gain, but it may still be worth doing to solve the drush problem.
What do you think?
Tim
Comment #4
floeschie commented
I'm having the same issue with "drush cron". I applied the patch above as well as the one I submitted in #1365148. Then I cleared the index and ran the "drush cron" command. I still get errors for files which are not parsed by Tika (images, plaintext).
So I extended the code a bit more and added a new function which determines a file's realpath.
Comment #5
floeschie commented
Seems to work, except that German umlauts are removed by escapeshellarg(). If a file has something like "ä ö ü" in its name, Tika throws an exception because it cannot find the file once the umlauts have been removed.
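A common cause of this behaviour is that escapeshellarg() depends on the LC_CTYPE locale, and command-line PHP (as used by drush) often runs with a non-UTF-8 locale. The sketch below is one possible workaround, not what the patch does: the helper name and the 'en_US.UTF-8' default are assumptions, and the locale must be installed on the server for the switch to take effect.

```php
<?php
/**
 * Hypothetical helper (not part of the module): escapes a path for the shell
 * while a UTF-8 LC_CTYPE locale is active, then restores the previous locale.
 */
function example_escape_path_for_shell($path, $locale = 'en_US.UTF-8') {
  // Passing "0" only returns the current LC_CTYPE setting.
  $previous = setlocale(LC_CTYPE, '0');
  // setlocale() returns FALSE when the requested locale is not available;
  // in that case escapeshellarg() simply runs under the existing locale.
  $switched = setlocale(LC_CTYPE, $locale);
  $escaped = escapeshellarg($path);
  if ($switched !== FALSE && $previous !== FALSE) {
    setlocale(LC_CTYPE, $previous);
  }
  return $escaped;
}
```

Whether a name such as "ä ö ü.pdf" survives depends on the locale actually being available, so it is worth checking the output on the server that runs cron.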
Comment #6
berdir
Stream wrappers can register whether they are remote or local using the type in hook_stream_wrappers(): http://api.drupal.org/api/drupal/modules--system--system.api.php/functio...
The function added in the patch should probably check the type of the scheme and, if it's local, use realpath(), and otherwise getExternalPath(). This makes it possible to support both private:// (which currently doesn't work if anonymous users don't have access) and remote wrappers like s3://.
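A rough sketch of that check, assuming a bootstrapped Drupal 7 site. The function name is hypothetical and this is not the code from the patch; it only uses core functions (file_uri_scheme(), file_get_stream_wrappers(), drupal_realpath(), file_create_url()), and file_create_url() stands in here for asking the wrapper for its external URL.

```php
<?php
/**
 * Hypothetical helper: returns a location an external tool such as Tika
 * could read, preferring the local filesystem path over a URL.
 */
function example_attachment_location($uri) {
  $scheme = file_uri_scheme($uri);
  // Wrappers whose type marks them as local (public://, private://, ...).
  $local_wrappers = file_get_stream_wrappers(STREAM_WRAPPERS_LOCAL);
  if ($scheme && isset($local_wrappers[$scheme])) {
    // Local file: resolve the URI to a real path on disk.
    $path = drupal_realpath($uri);
    if ($path !== FALSE) {
      return $path;
    }
  }
  // Remote wrapper (for example s3://) or unresolved path: use the URL.
  return file_create_url($uri);
}
```

With a local path, the extraction no longer depends on the site's host name or on anonymous access to the file.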
Comment #7
wwhurley commented
Another option is to set your $base_url in the appropriate settings.php file. That will also fix the URLs in various emails sent by the system on cron, if you ever run into that.
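For illustration, the setting would look roughly like this in a Drupal 7 settings.php; the host name is a placeholder to replace with the site's real address.

```php
<?php
// sites/default/settings.php (Drupal 7).
// Placeholder host; use the site's actual URL, without a trailing slash.
$base_url = 'http://www.example.com';
```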
Comment #8
sinasalek commented
Patch #4 works for me. However, I agree with Berdir that it should be a bit smarter. I had a look at http://api.drupal.org/api/drupal/includes%21stream_wrappers.inc/7 but couldn't find anything that clearly indicates whether a wrapper is local or remote. If that's the case, then a workaround should be used.
One solution would be to check the $wrapper class to see if it's a subclass of DrupalLocalStreamWrapper.
Comment #9
mfb
(from image.gd.inc)
Comment #10
svendecabooter
Attached is a new version of patch #4 based on the feedback by Berdir & mfb.
Comment #11
osopolar
Patch in #10 works for me, thanks.
Comment #12
sinasalek commented
I haven't tested patch #10, but the approach looks reasonable to me too.
@osopolar if it works fine for you, you could mark it as RTBC. I'm doing that on your behalf.
Comment #13
coreycondardo commented
I applied this patch and it seems to reduce the number of errors produced by the drush command that indexes the site. However, I'm getting a lot of this:
INFO - unsupported/disabled operation: EI
What is that?
--Corey
Comment #14
apanag commented
The patch worked for me. My case was a password-protected site, so every extraction attempt got a 401 error code.
Using the realpath solved the problem, because no URL is required anymore.
Some "INFO - unsupported/disabled operation: EI" messages were also shown, but the vast majority of the PDFs were indexed properly.
Thank you for the patch :-)
Comment #15
mike503 commented
Patch works great, after hours of debugging why Java was dumping with absolutely no reasonable explanation... sigh
Comment #16
torpy commented
This worked brilliantly for me. I was using a self-signed SSL certificate (on my development server), which was causing Tika to error out.
Comment #17
izus commented
Assigning to me for testing and very probably merging into the 7.x-1.x branch.
Comment #18
izus commented
Hi,
I just merged it into the 7.x-1.x branch.
Thanks, all!
Comment #19
izus commented