Hello,

I just installed this module, along with the File entity module. Both installed fine, so I re-indexed to see if it picked up files, and I got this error message:

SQLSTATE[42S02]: Base table or view not found: 1146 Table 'drupal_ripon.apachesolr_index_entities_file' doesn't exist

I didn't see a README or INSTALL file in the module, so I looked over the module page on drupal.org and found this step in the instructions:

Apply solrconfig-solr3x.xml.patch from the apachesolr_file module directory to solrconfig-solr3x.xml, then rename the file and put it in your SOLR_HOME/conf.
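The rename-and-copy part of that step can be sketched in Python. This is a minimal sketch, assuming the patch has already been applied to solrconfig-solr3x.xml and assuming the new name should be solrconfig.xml (the instruction doesn't say which name to use). SOLR_HOME is whatever directory your Solr instance loads its conf/ from, and install_solrconfig is a hypothetical helper. It keeps a .bak copy of any config it replaces.

```python
import shutil
from pathlib import Path


def install_solrconfig(patched_config: Path, solr_home: Path,
                       target_name: str = "solrconfig.xml") -> Path:
    """Copy the patched config into SOLR_HOME/conf under a new name.

    Assumption: Solr reads conf/solrconfig.xml; pass another
    target_name if your setup uses a different file.
    """
    conf_dir = solr_home / "conf"
    if not conf_dir.is_dir():
        raise FileNotFoundError(f"{conf_dir} does not exist; check SOLR_HOME")
    target = conf_dir / target_name
    # Back up the config currently in use, if there is one.
    if target.exists():
        shutil.copy2(target, target.with_name(target.name + ".bak"))
    shutil.copy2(patched_config, target)
    return target
```

Solr only reads solrconfig.xml at startup or core reload, so a restart is needed after copying.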

I'm not sure what that means... It sounds like you want me to apply a .patch file to the .xml file, but I'm not seeing any patch file included with this module. Is there just something I'm missing? Is this related to the database table not being created?

Thank you for any help you can provide.

Files:
Comment | File | Size | Author
#7 | solrconfig-solr3x.xml_.patch | 702 bytes | shenzhuxi

Comments

Sorry, I forgot to push to git.
I've just fixed it.

Excellent, thank you. That takes a while to show up on drupal.org, right?

I was looking at this again, and because of this error: SQLSTATE[42S02]: Base table or view not found: 1146 Table 'drupal_ripon.apachesolr_index_entities_file' doesn't exist, I decided to look at the website's database. It took me a while to notice, but the module's table in the database is named apachesolr_index_enities_file. There's a 't' missing. I renamed the table, and I no longer get that error.

There was a typo in the table name in the .install file, so after you update the module you need to uninstall and re-enable it. Sorry about that.

Excellent. I did what you said, and it installed just fine. Thank you!

About the .patch file: I can see commits dated May 4, and the XML patch is still missing. Could you please post the patch here, attached to a comment? Or at least the code that needs to be added? I suppose it declares the /update/extract requestHandler, but I know far too little about Solr to try it myself.
Thank you.


I've just added the patch.
Sorry, I forgot to commit it.

Thank you.

I just applied the patch. After looking over its contents, it looks like it makes the text inside documents get indexed. Am I correct? If so, should I re-index the site for it to take effect?

Yes, you need to re-index.

Hello,

After applying the patch, I now get this error as files get indexed:

Notice: unserialize(): Error at offset 0 of 7253 bytes in apachesolr_file_extract() (line 88 of /var/www/clients/client2/web2/web/sites/all/modules/apachesolr_file/apachesolr_file.module).

It gets repeated over and over.
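An error at offset 0 means unserialize() rejected the very first byte, so the 7253-byte response was most likely not wt=phps output at all (for example, a Solr error page). A minimal Python sketch of a pre-check that could be logged before unserializing; the function names are hypothetical and the list of type markers covers the common PHP serialization formats only.

```python
import re

# Common PHP-serialized values start with a type marker:
# a (array), O or C (object), s (string), i (int), d (float),
# b (bool) or N (null). A body starting with anything else
# cannot be unserialized.
_PHPS_START = re.compile(rb"(?:[aOCsid]:|b:[01];|N;)")


def looks_like_php_serialized(body: bytes) -> bool:
    """Cheap pre-check before handing a wt=phps response to unserialize()."""
    return _PHPS_START.match(body) is not None


def describe_response(body: bytes) -> str:
    """Summarize a Solr response for logging when extraction goes wrong."""
    if looks_like_php_serialized(body):
        return "looks like PHP-serialized data"
    # Show the first bytes, which usually reveal an HTML or XML error page.
    snippet = body[:80].decode("utf-8", errors="replace")
    return "not PHP-serialized; response starts with: " + snippet
```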

Go to http://YOUR_SOLR_DEPLOY/update/extract/?extractOnly=true&wt=phps&extractFormat=text&stream.file=YOUR_FILE_ON_SOLR_SERVER
Check whether your file is parsed correctly.

http://wiki.apache.org/solr/ExtractingRequestHandler may be useful.
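The same check can be scripted. A minimal Python sketch, assuming a hypothetical base URL such as http://localhost:8983/solr in place of YOUR_SOLR_DEPLOY; the parameters are the ones from the URL in the previous comment. stream.file names a path on the Solr machine, and Solr only accepts it when remote streaming is enabled (enableRemoteStreaming in the requestParsers element of solrconfig.xml).

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def extract_only_url(solr_base: str, file_on_solr_server: str) -> str:
    """Build the extractOnly test URL for one file."""
    params = {
        "extractOnly": "true",     # return the extracted content; index nothing
        "wt": "phps",              # PHP-serialized response, which the module unserializes
        "extractFormat": "text",   # plain text instead of XHTML
        # A path on the Solr machine, not on the Drupal web server.
        "stream.file": file_on_solr_server,
    }
    return solr_base.rstrip("/") + "/update/extract?" + urlencode(params)


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download the raw response body so it can be inspected."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```

Calling fetch(extract_only_url("http://localhost:8983/solr", "/path/to/file.pdf")) and printing the result shows whether Solr returns the file's text or an error.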

Well, it turns out that I made a mistake in one of your installation steps. I had the contrib and dist directories in my root apache-solr directory. Once I moved them to apache-solr/example/solr/ and re-indexed everything, it started working great.
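Solr resolves a relative `<lib dir="...">` in solrconfig.xml against the core's instance directory, so where contrib and dist have to live depends on the lib paths in the config you installed. A minimal Python sketch of a pre-flight check; missing_lib_dirs is a hypothetical helper, and the lib paths in the usage note are examples only, to be replaced with the ones from your own solrconfig.xml.

```python
from __future__ import annotations

from pathlib import Path


def missing_lib_dirs(instance_dir: Path, lib_dirs: list[str]) -> list[str]:
    """Return the <lib dir="..."> values that don't resolve to a directory with jars.

    Relative values are resolved against the instance directory,
    mirroring how Solr interprets relative <lib dir> paths.
    """
    missing = []
    for lib_dir in lib_dirs:
        resolved = (instance_dir / lib_dir).resolve()
        if not (resolved.is_dir() and any(resolved.glob("*.jar"))):
            missing.append(lib_dir)
    return missing
```

For example, missing_lib_dirs(Path("apache-solr/example/solr"), ["contrib/extraction/lib", "dist"]) returns whichever of those entries has no jars where Solr would look for them.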

Now that I have that working, it has led to another question. Search results come up perfectly when I'm logged in as the administrative user, but anonymous users get zero results when searching files.

Looking at permissions, I have all users set to view all files under File entity. I also have all users set to use search and advanced search.

Is there another setting in apachesolr_file that allows anonymous users to search files?

Thanks again for all your help.

After enabling "Use search" for the roles, it works fine. I just tested it.

Odd... I made sure all users have 'use search', tried rebuilding permissions afterwards, and anonymous users still can't search files. Searching the rest of the site works fine, it's just files they can't search.

So, I did some further testing. Anonymous users can view files, such as here:
http://beta.riponlibrary.org/file/957

So, it seems to just be that anonymous users cannot search for those files. Any ideas what I may have done wrong? Are there permission settings specific to apachesolr_file that I'm just not seeing?

Well, I figured out the source of the problem. If I disable and uninstall the Apache Solr Access module, anonymous users can search the site's files. I'm not sure what other ramifications disabling that part of the Solr module will have, but the issue seems to stem from an incompatibility with that particular module.

Status: Active » Fixed

Maybe the priority of the modules changed after you reinstalled.

There is another potential error that isn't caught: if the extract handler has not been set up (e.g., the patch was not applied), the 404 returned by the "extractOnly" request is not handled, so the content is not indexed.
This should be handled by the module.
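A minimal Python sketch of that suggestion, not the module's PHP code: a 404 from the extract request is treated as "handler not configured", logged, and turned into None so the caller can skip the file, while other HTTP errors are re-raised. extract_or_none and the injectable fetch parameter are hypothetical names.

```python
import logging
from typing import Callable, Optional
from urllib.error import HTTPError
from urllib.request import urlopen

log = logging.getLogger("apachesolr_file_sketch")


def default_fetch(url: str) -> bytes:
    with urlopen(url, timeout=30) as response:
        return response.read()


def extract_or_none(url: str,
                    fetch: Callable[[str], bytes] = default_fetch) -> Optional[bytes]:
    """Return the extractOnly response body, or None if the handler is missing."""
    try:
        return fetch(url)
    except HTTPError as err:
        if err.code == 404:
            # No /update/extract handler: most likely the solrconfig patch is not applied.
            log.error("Extract handler not found at %s; is the solrconfig patch applied?", url)
            return None
        raise  # other HTTP errors are real failures, so surface them
```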

apachesolr_file_extract() should return FALSE if filesize() is greater than multipartUploadLimitInKB. Otherwise it may exceed the Apache memory limit even though there's no chance the file will be extracted.
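The proposed guard can be sketched in Python as below. It is a minimal sketch, assuming 1 KB = 1024 bytes for multipartUploadLimitInKB; the limit is a value you would copy from the requestParsers element of your solrconfig.xml, since the module has no way to read it. fits_upload_limit is a hypothetical name.

```python
import os


def fits_upload_limit(path: str, multipart_upload_limit_kb: int) -> bool:
    """Return False for files Solr would refuse, so they are never read into memory."""
    # Assumption: multipartUploadLimitInKB counts kilobytes of 1024 bytes.
    limit_bytes = multipart_upload_limit_kb * 1024
    # os.path.getsize only stats the file; it does not read its contents.
    return os.path.getsize(path) <= limit_bytes
```

The point of checking the size first is that stat-ing a file is cheap, whereas reading it into memory (as file_get_contents() does) is exactly the cost the check is meant to avoid.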

Could you elaborate on why indexing has to be done in two steps, using extractOnly first?

Extraction applies to local files (cf. file_get_contents()), so relying only on the ExtractingRequestHandler would allow using exec(curl), which would be far less memory-hungry.
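For comparison, a one-step request can be sketched as below, with Python's subprocess standing in for PHP's exec(). Solr extracts and indexes in the same call, and curl streams the file from disk. It follows the curl example on the ExtractingRequestHandler wiki page (literal.id, commit=true, -F "myfile=@..."). The document id format and any extra literal fields are hypothetical; a real Drupal index document would need every field the apachesolr module expects.

```python
import subprocess
from typing import Dict, List, Optional
from urllib.parse import urlencode


def curl_extract_command(solr_base: str, doc_id: str, local_path: str,
                         literals: Optional[Dict[str, str]] = None) -> List[str]:
    """Build a curl call that extracts AND indexes a file in one request."""
    params = {"literal.id": doc_id, "commit": "true"}
    # literal.<field>=<value> stores extra field values alongside the extracted text.
    for field, value in (literals or {}).items():
        params["literal." + field] = value
    url = solr_base.rstrip("/") + "/update/extract?" + urlencode(params)
    # -F "myfile=@path" makes curl stream the file from disk as a multipart upload.
    return ["curl", "--silent", "--show-error", "--fail", url,
            "-F", "myfile=@" + local_path]


def index_file(command: List[str]) -> None:
    # check=True raises CalledProcessError if curl exits non-zero (--fail covers HTTP errors).
    subprocess.run(command, check=True)
```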

http://drupal.org/project/apachesolr_media 7.x-2.x uses the one-step approach, which required more modifications to the apachesolr module and is harder for users to deploy.
The first release of the apachesolr_file module keeps the code minimal by reusing the apachesolr API, so it can't use the ExtractingRequestHandler in one step.
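The two-step flow described here can be sketched in Python as follows: step 1 asks Solr only to extract text (extractOnly), and step 2 indexes an ordinary document built from that text, as the apachesolr API does for other entities. The field names, the id format, and the extract and send_documents callables are all hypothetical stand-ins, not the module's actual code.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical document shape; the real module builds documents through the apachesolr API.
Document = Dict[str, str]


def index_file_two_steps(
    file_id: int,
    label: str,
    extract: Callable[[int], Optional[str]],
    send_documents: Callable[[List[Document]], None],
) -> bool:
    # Step 1: extraction only (extractOnly=true); nothing is indexed yet.
    text = extract(file_id)
    if text is None:
        # Extraction failed, so skip the file instead of indexing an empty body.
        return False
    # Step 2: index a regular document whose body is the extracted text.
    document = {"id": f"file/{file_id}", "label": label, "content": text}
    send_documents([document])
    return True
```

The trade-off raised earlier in the thread follows from this shape: the extracted text has to pass through the web server's memory between the two steps.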

That's quite disappointing (even though you did great work on this):

* apachesolr_media uses (used?) apachesolr_get_solr->addFile(), which AFAICT does not exist (anymore?)
* if it's not possible, then the current apachesolr API design is wrong
* the current way to index content is clearly suboptimal for a "successor" module.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

I can see the .patch file and have run a command to apply the changes, but the only response is 'no changes'.
The command I ran was git apply . I have both the patch and the original file in the same directory.
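If the command really was git apply ., note that git apply expects the path to the patch file itself as its argument. Also, per the git-apply documentation, when it runs inside a subdirectory of a Git repository, patched paths outside that directory are ignored, which can look as if nothing happened. A minimal Python sketch that builds the command with the patch named explicitly and does a dry run with --check first; the helper names are hypothetical, and it assumes the paths inside the patch are relative to the directory holding solrconfig-solr3x.xml (if they carry extra leading directories, git apply's -p option strips them).

```python
import subprocess
from pathlib import Path
from typing import List


def git_apply_command(patch_file: Path, check_only: bool = True) -> List[str]:
    """Build a git apply call that names the patch file explicitly."""
    cmd = ["git", "apply", "--verbose"]
    if check_only:
        # --check reports whether the patch applies without touching any file.
        cmd.append("--check")
    cmd.append(str(patch_file))
    return cmd


def run_in(directory: Path, cmd: List[str]) -> subprocess.CompletedProcess:
    # Run from the directory that holds solrconfig-solr3x.xml so the
    # relative paths inside the patch line up with the file to change.
    return subprocess.run(cmd, cwd=directory, capture_output=True, text=True)
```

Run it once with check_only=True and read stdout/stderr; if the check passes, run it again with check_only=False to actually modify the file.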