Closed (fixed)
Project:
Apache Solr File
Version:
7.x-1.x-dev
Component:
Code
Priority:
Normal
Category:
Support request
Assigned:
Unassigned
Reporter:
Created:
3 May 2012 at 16:38 UTC
Updated:
8 Nov 2012 at 10:40 UTC
Hello,
I was just curious... How do you specify which directory of files to index? Does it just take the main files directory for the site? Is there a way to specify a specific directory? I have the module set up and error free, but it's not actually indexing any files, so I was curious if I was just missing a step somewhere.
Thanks for your help!
Comments
Comment #1
shenzhuxi commented
Right now there is no specific setting for the directory. You can only choose which file bundles to index at DRUPAL_HOME/admin/config/search/apachesolr.
If you check out the latest version, connect Solr, and run the index job correctly, you will find a search page at DRUPAL_HOME/search/file.
Comment #2
georgedamonkey commented
How does it know which directory to look in for files? What's the default location?
I am seeing the DRUPAL_HOME/search/file page. But when I run an index, it's not seeing any files.
Comment #3
shenzhuxi commented
You can check your files here: DRUPAL_HOME/admin/content/file
You can check whether your files were indexed here: SOLR_HOST/solr/select/?q.alt=entity_type:file&indent=on
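The same check can be scripted. The following is only a minimal sketch in Python: the helper names, the `solr_base` argument, and the added `wt=json` and `rows` parameters are assumptions made for illustration and are not part of the module; only the `q.alt=entity_type:file` filter comes from the URL above.

```python
import json
from urllib.parse import urlencode


def build_file_query_url(solr_base, rows=10):
    """Build a select URL that matches only documents with entity_type:file.

    solr_base is a placeholder such as "http://localhost:8983/solr"
    (hypothetical; substitute your own SOLR_HOST/solr).
    """
    params = {
        "q.alt": "entity_type:file",  # same filter as the URL in the comment
        "indent": "on",
        "wt": "json",                 # assumption: ask Solr for JSON output
        "rows": rows,
    }
    return solr_base.rstrip("/") + "/select?" + urlencode(params)


def count_indexed_files(response_text):
    """Read the total hit count (numFound) from a Solr JSON response body."""
    return json.loads(response_text)["response"]["numFound"]
```

Fetch the URL with any HTTP client and pass the response body to `count_indexed_files`. A count of 0 suggests that no file documents have reached the index yet.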
Comment #4
georgedamonkey commented
Hmm... I looked at DRUPAL_HOME/admin/content/file and there's not much there. I've got well over a thousand PDFs/PNGs/DOCs in subdirectories within the 'files' directory. What criteria are used to decide whether or not something is indexed?
Thank you so much for taking the time to help me with this. I really appreciate that.
Comment #5
shenzhuxi commented
Do you upload your files by FTP?
You need to upload files through Drupal's interface.
Comment #6
georgedamonkey commented
Ah, ok. That makes sense now. I went and deleted all the files I had in the files directory. I then went into Drupal and have been uploading files there instead. As I do that, I see all of the files I'm uploading at DRUPAL_HOME/admin/content/file. Now, if I go and try to index new content, it doesn't see anything new. I have Application, Image, and Text set to index. Mainly, I want to make sure PDFs get indexed. Is there anything else I need to do?
If I go to SOLR_HOST/solr/select/?q.alt=entity_type:file&indent=on, I get this:
Comment #7
shenzhuxi commented
http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optim...
You need to send a commit to Solr before newly indexed files show up in search.
There is a commit delay setting in the Solr configuration:
http://drupalcode.org/project/apachesolr.git/blob/refs/heads/7.x-1.x:/so...
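For reference, an explicit commit is an XML `<commit/>` message posted to Solr's update handler. The sketch below only builds the request; the function name and the `solr_base` value are assumptions for illustration, and the `/update` path should be checked against your own Solr setup.

```python
import urllib.request


def build_commit_request(solr_base):
    """Build (but do not send) a POST that asks Solr to commit pending documents.

    solr_base is a placeholder such as "http://localhost:8983/solr"
    (hypothetical; substitute your own SOLR_HOST/solr).
    """
    return urllib.request.Request(
        solr_base.rstrip("/") + "/update",
        data=b"<commit/>",  # XML update message requesting a commit
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


# To actually send it (requires a reachable Solr instance):
#   urllib.request.urlopen(build_commit_request("http://localhost:8983/solr"))
```

Sending this once after an index run makes the new documents searchable without waiting for the configured commit delay.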
Comment #8
georgedamonkey commented
Ok, I think I've got it. Turns out, I was running Solr version 1.4. I upgraded to 3.6, and it seems to be indexing files just fine now.
Which leads me to two questions... Does this module/Solr search just the filename, or does it search text within files as well? If you look at my search page results:
http://beta.riponlibrary.org/search/file/robert
It shows file names, but no text.
Also, is there a way you know of that when you view a file like this:
http://beta.riponlibrary.org/file/555
It will actually show you the file, rather than just the title?
Thanks again for your help.
Comment #9
shenzhuxi commented
Supported file formats: http://tika.apache.org/1.1/formats.html
You can set the display at DRUPAL_HOME/admin/structure/file-types (file_entity 7.x-2.x).
If your files are mainly PDFs, consider my other module, http://drupal.org/project/pdf. But you need to hack file_entity a little bit: http://drupal.org/node/1540668
Comment #10
georgedamonkey commented
Excellent, thank you. I successfully got it so PDFs are displayed, rather than just a link. Works great!
But... Now that I've got the pdf module set up, I can no longer search files. The search page is there, but no results show for searches. Also, when I add new files, they don't get indexed. I'm not sure how they're related, but is there something else I need to do?
Thank you.
Comment #11
shenzhuxi commented
Could you search before?
Did you run the cron job?
Maybe check the apachesolr index status.
Comment #12
georgedamonkey commented
I just updated the pdf module to the latest beta release, and that seems to have fixed it.
One last question. Is there a way to have it search the content of the documents, rather than just the titles?
Thank you again for your help!
Comment #13
shenzhuxi commented
"I upgraded to 3.6, and it seems to be indexing files just fine now." What do you mean by "files"?
Did you patch your solrconfig.xml?
Comment #14
shenzhuxi commented
I just added the patch.
Sorry, I forgot to commit it.
Comment #15
shenzhuxi commented