If you click the "Delete the attachments from the index" button and then reindex all queued content, some of your files will often still be missing from the search index, even though Apache Solr reports that all content has been indexed.

I believe I've tracked this down to an issue where the last index position isn't being correctly set. Patch coming up.


Comments

David_Rothstein

Status: Active » Needs review
File size: 604 bytes

Here is the patch. I'm not 100% sure using the default environment ID is correct here, but in practice I think it doesn't matter (at least for the default reindexing callback used by this module).

Nick_vh

Issue summary: View changes
File size: 518 bytes

I'm not sure this is the right way to solve it, but we could do what I suggest in the patch. Someone needs to test this.

drasgardian

Status: Needs review » Needs work

I don't think that _apachesolr_attachments_get_all_files() is actually getting all the files; it is only getting one file per entity. I've got 4 different translations of a file attached to one node, but _apachesolr_attachments_get_all_files() is only returning one of them. EFQ might not be the best approach there.
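To illustrate the kind of gap described above, here is a minimal, self-contained sketch. It assumes the usual Drupal 7 layout in which field values are keyed by language code and then by delta. The array and both function names are hypothetical stand-ins, not code from the module, and the sketch does not claim to reproduce what _apachesolr_attachments_get_all_files() actually does.

```php
<?php
// Hypothetical stand-in for a file field on one node, laid out the way
// Drupal 7 stores field values: language code first, then delta.
$field_file = array(
  'en' => array(array('fid' => 11)),
  'fr' => array(array('fid' => 12)),
  'de' => array(array('fid' => 13)),
  'es' => array(array('fid' => 14)),
);

// Reads the field for a single language only, so the files attached to
// the other translations are never seen.
function example_files_for_one_language(array $field, $langcode) {
  $fids = array();
  if (isset($field[$langcode])) {
    foreach ($field[$langcode] as $item) {
      $fids[] = $item['fid'];
    }
  }
  return $fids;
}

// Walks every language code and collects each distinct file ID.
function example_files_for_all_languages(array $field) {
  $fids = array();
  foreach ($field as $items) {
    foreach ($items as $item) {
      // Key by fid so a file shared between translations is listed once.
      $fids[$item['fid']] = $item['fid'];
    }
  }
  return array_values($fids);
}
```

With the sample data, the single-language reader finds one file while the all-languages reader finds all four, which matches the "one file per entity" symptom for a node with four translated files.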

wsantell

Regarding Nick_vh's patch, you need to add the following line before the one line of code in the patch:
module_load_include('inc', 'apachesolr', 'apachesolr.index');

Without it, confirming the "Clear the attachment text extraction cache" button will give you this message:
"Fatal error: Call to undefined function apachesolr_index_mark_for_reindex() in ... apachesolr_attachments\apachesolr_attachments.module on line 297"

This is because apachesolr.index.inc isn't loaded globally. I'm still testing to see if this does anything to address the primary issue.

dmsmidt

I don't know if #2 fixes anything, but I do know it kills performance.
Clearing the index is not doable anymore with a sane max_execution_time.

milesw

Status: Needs work » Closed (fixed)
Related issues: +#2606214: Not all files get indexed for multilingual file field

The original issue and the patches here were related to reindexing problems that appear to be fixed in latest dev.

I opened a new issue with a patch for the problem mentioned in #3:

#2606214: Not all files get indexed for multilingual file field

Closing this one.

David_Rothstein

Status: Closed (fixed) » Needs review

Are you sure this is actually fixed in the latest dev? The code looks very similar to me, and either patch above (#1 or #2) still applies...

Unfortunately I don't have a good way to test this anymore, but from what I remember of the issue and from what I wrote above ("I've tracked this down to an issue where the last index position isn't being correctly set") it's not obvious what would have fixed it in the interim.

milesw

Ah, you're right, sorry about that. Think I mixed up this issue with #1563478: Deleted attachments not being removed from index. Thanks for catching.

milesw

Patch #1 seems to resolve the problem, though apachesolr_attachments_solr_reindex() ends up getting called twice when using the deletion form.

Patch #2 seems to cause recursion as it's triggering a reindex inside the reindex callback, which explains comment #5.
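The recursion pattern can be sketched in isolation. This is only an illustration, assuming that marking content for reindexing invokes the reindex callback. Every function name below is hypothetical and none of them belongs to apachesolr or apachesolr_attachments. The re-entrancy guard is one possible way to break such a loop, not a fix proposed in this issue.

```php
<?php
// Counts how many times the callback body actually runs.
$GLOBALS['example_calls'] = 0;

// Hypothetical stand-in for "mark everything for reindexing", which in
// this sketch invokes the registered reindex callback.
function example_mark_for_reindex() {
  example_reindex_callback();
}

// Hypothetical reindex callback that itself triggers a reindex.
function example_reindex_callback() {
  static $running = FALSE;
  if ($running) {
    // Already inside this callback: return instead of recursing again.
    return;
  }
  $running = TRUE;
  $GLOBALS['example_calls']++;
  // The problematic pattern: a reindex triggered from inside the callback.
  // Without the guard above, this call would never terminate.
  example_mark_for_reindex();
  $running = FALSE;
}

example_mark_for_reindex();
```

With the guard in place the callback body runs once per top-level call. Removing the `$running` check turns the same code into unbounded recursion, which would be consistent with the performance problem reported in #5.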