Hello,

I was just curious... How do you specify which directory of files to index? Does it just take the main files directory for the site? Is there a way to specify a particular directory? I have the module set up and error-free, but it's not actually indexing any files, so I was wondering whether I'm just missing a step somewhere.

Thanks for your help!

Comments

shenzhuxi’s picture

Right now there is no specific setting for a directory. You can only choose which file bundles to index at DRUPAL_HOME/admin/config/search/apachesolr.
If you check out the latest version, connect Solr, and run the index job correctly, you will find a search page at DRUPAL_HOME/search/file.

georgedamonkey’s picture

How does it know which directory to look in for files? What's the default location?

I am seeing the DRUPAL_HOME/search/file page. But, when I run an index, it's not seeing any files.

shenzhuxi’s picture

You can check your files here: DRUPAL_HOME/admin/content/file
You can check whether your files were indexed here: SOLR_HOST/solr/select/?q.alt=entity_type:file&indent=on
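
For anyone who wants to script that second check, here is a minimal Python sketch. It builds the same select query and reads the numFound attribute from Solr's XML reply. The function names and the localhost:8983 address are hypothetical, chosen only for illustration; substitute your own SOLR_HOST.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def build_file_query_url(solr_base):
    """Return the select URL that matches only documents with entity_type:file."""
    params = {"q.alt": "entity_type:file", "indent": "on"}
    return solr_base.rstrip("/") + "/select/?" + urlencode(params)


def count_indexed_files(response_xml):
    """Read numFound from the <result name="response"> element of a Solr XML reply."""
    root = ET.fromstring(response_xml)
    result = root.find("result[@name='response']")
    if result is None:
        raise ValueError("no <result name='response'> element in the Solr reply")
    return int(result.get("numFound"))


if __name__ == "__main__":
    # Hypothetical base URL; replace with SOLR_HOST/solr for your installation.
    print(build_file_query_url("http://localhost:8983/solr"))
    # A reply with no matching documents, in the shape Solr returns for this query.
    empty_reply = '<response><result name="response" numFound="0" start="0"/></response>'
    print(count_indexed_files(empty_reply))
```

A count of zero means no file entities have reached the index yet, so the problem is upstream of the search page.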

georgedamonkey’s picture

Hmm... I looked at DRUPAL_HOME/admin/content/file and there's not much there. I've got well over a thousand PDFs, PNGs, and DOCs in subdirectories within the 'files' directory. What criteria are used to decide whether or not something is indexed?

Thank you so much for taking the time to help me with this. I really appreciate that.

shenzhuxi’s picture

Did you upload your files by FTP?
You need to upload files through Drupal's interface so that Drupal registers them as managed files.
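
To see which files on disk Drupal does not know about, a rough Python sketch like the following may help. It assumes you have already exported the list of managed URIs (in Drupal 7 these live in the uri column of the file_managed table, for example public://docs/report.pdf). The function name and the way the list is obtained are hypothetical.

```python
import os


def find_unmanaged_files(files_dir, managed_uris, scheme="public://"):
    """Return sorted relative paths under files_dir that have no matching managed URI."""
    unmanaged = []
    for dirpath, _dirnames, filenames in os.walk(files_dir):
        for name in filenames:
            # Path relative to the files directory, with forward slashes as in Drupal URIs.
            rel = os.path.relpath(os.path.join(dirpath, name), files_dir)
            rel = rel.replace(os.sep, "/")
            if scheme + rel not in managed_uris:
                unmanaged.append(rel)
    return sorted(unmanaged)
```

Anything this returns was most likely copied in by FTP or another route outside Drupal, so it never shows up at DRUPAL_HOME/admin/content/file and is never sent to Solr.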

georgedamonkey’s picture

Ah, ok. That makes sense now. I deleted all the files I had in the files directory, then went into Drupal and have been uploading files there instead. As I do that, I can see all of the files I'm uploading at DRUPAL_HOME/admin/content/file. But if I try to index new content, it doesn't see anything new. I have Application, Image, and Text set to index. Mainly, I want to make sure PDFs get indexed. Is there anything else I need to do?

If I go to SOLR_HOST/solr/select/?q.alt=entity_type:file&indent=on, I get this:

<response>
<result name="response" numFound="0" start="0"/>
</response>

shenzhuxi’s picture

http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optim...
You need to send a commit to Solr before newly indexed documents show up in searches.
There is a commit delay setting in the Solr configuration.
http://drupalcode.org/project/apachesolr.git/blob/refs/heads/7.x-1.x:/so...
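
If you don't want to wait for the delay, you can send the commit yourself. The Python sketch below builds the XML update message described on the wiki page above: a POST of `<commit/>` to the update handler. The function name and the localhost:8983 address are hypothetical, and the update path may differ if your Solr uses a named core.

```python
from urllib.request import Request


def build_commit_request(solr_base):
    """Build (but do not send) a POST request carrying the XML <commit/> message."""
    return Request(
        solr_base.rstrip("/") + "/update",
        data=b"<commit/>",
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical base URL; replace with SOLR_HOST/solr for your installation.
    req = build_commit_request("http://localhost:8983/solr")
    print(req.get_method(), req.full_url)
```

To actually send it, pass the request to urllib.request.urlopen(req), then re-run the entity_type:file query to see whether numFound has changed.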

georgedamonkey’s picture

Ok, I think I've got it. Turns out, I was running Solr version 1.4. I upgraded to 3.6, and it seems to be indexing files just fine now.

Which leads me to two questions... Does this module/solr search just the filename, or does it search text within files as well? If you look at my search page results:
http://beta.riponlibrary.org/search/file/robert

It shows file names, but no text.

Also, is there a way you know of that when you view a file like this:
http://beta.riponlibrary.org/file/555

It will actually show you the file, rather than just the title?

Thanks again for your help.

shenzhuxi’s picture

Supported file formats: http://tika.apache.org/1.1/formats.html

You can set display on DRUPAL_HOME/admin/structure/file-types (file_entity 7.x-2.x)

If your files are mainly PDFs, consider another module of mine: http://drupal.org/project/pdf. But you need to hack file_entity a little bit: http://drupal.org/node/1540668

georgedamonkey’s picture

Excellent, thank you. I successfully got it so PDFs are displayed, rather than just a link. Works great!

But... Now that I've got the pdf module set up, I can no longer search files. The search page is there, but no results show for searches. Also, when I add new files, they don't get indexed. I'm not sure how the two are related, but is there something else I need to do?

Thank you.

shenzhuxi’s picture

Could you search before?
Did you run the cron job?
Maybe check the apachesolr index status.

georgedamonkey’s picture

I just updated the pdf module to the latest beta release, and that seems to have fixed it.

One last question. Is there a way to have it search the content of the documents, rather than just the titles?

Thank you again for your help!

shenzhuxi’s picture

"I upgraded to 3.6, and it seems to be indexing files just fine now." What do you mean by "files"?
Did you patch your solrconfig.xml?

shenzhuxi’s picture

I just added the patch.
Sorry, I forgot to commit it.

shenzhuxi’s picture

Status: Active » Fixed

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.