If you click the "Delete the attachments from the index" button and then reindex all queued content, I've noticed that some of your files will often still be missing from the search index, even though Apache Solr reports that all content has been indexed.
I believe I've tracked this down to an issue where the last index position isn't being correctly set. Patch coming up.
Comment | File | Size | Author |
---|---|---|---|
#1 | apachesolr-attachments-fix-deleting-index-2014149-1.patch | 604 bytes | David_Rothstein |
Comments
Comment #1
David_Rothstein commented:
Here is the patch. I'm not 100% sure using the default environment ID is correct here, but in practice I think it doesn't matter (at least for the default reindexing callback used by this module).
Comment #2
Nick_vh commented:
I'm not sure if this is the right way of solving this. However, we could do as I suggest in the patch. Someone needs to test this.
Comment #3
drasgardian commented:
I don't think that _apachesolr_attachments_get_all_files() is actually getting all the files; it is only getting one file per entity. I've got 4 different translations of a file attached to one node, but _apachesolr_attachments_get_all_files() is only returning one of those. EntityFieldQuery (EFQ) might not be the best approach there.
Comment #4
wsantell (as a volunteer) commented:
Regarding Nick_vh's patch, you need to add the following line before the one line of code that the patch adds:
module_load_include('inc', 'apachesolr', 'apachesolr.index');
Without it, confirming the "Clear the attachment text extraction cache" button will give you this message:
"Fatal error: Call to undefined function apachesolr_index_mark_for_reindex() in ... apachesolr_attachments\apachesolr_attachments.module on line 297"
This is because apachesolr.index.inc isn't loaded globally. I'm still testing to see if this does anything to address the primary issue.
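For illustration, the ordering described above might look roughly like this inside the module. This is only a sketch: the wrapper name example_mark_attachments_for_reindex() is hypothetical, and the arguments passed to apachesolr_index_mark_for_reindex() (the default environment ID and the 'file' entity type) are assumptions, since the patch itself is not quoted in this thread.

```php
<?php
/**
 * Hypothetical wrapper (not taken from the patch) showing the
 * include-then-call order needed to avoid the fatal error.
 */
function example_mark_attachments_for_reindex() {
  // apachesolr.index.inc is not loaded on every request, so include it
  // explicitly before calling anything it defines.
  module_load_include('inc', 'apachesolr', 'apachesolr.index');

  // Assumption: the patch's one line is a call along these lines. The
  // arguments (default environment ID, 'file' entity type) are guesses.
  $env_id = apachesolr_default_environment();
  apachesolr_index_mark_for_reindex($env_id, 'file');
}
```

The point is only the order of the two calls; whatever the patch's actual line is, the include has to come first.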
Comment #5
dmsmidt commented:
I don't know if #2 fixes anything, but I do know it kills performance.
Clearing the index is not doable anymore with a sane max_execution_time.
Comment #6
milesw commented:
The original issue and the patches here were related to reindexing problems that appear to be fixed in latest dev.
I opened a new issue with a patch for the problem mentioned in #3:
#2606214: Not all files get indexed for multilingual file field
Closing this one.
Comment #7
David_Rothstein (as a volunteer) commented:
Are you sure this is actually fixed in the latest dev? The code looks very similar to me, and either patch above (#1 or #2) still applies...
Unfortunately I don't have a good way to test this anymore, but from what I remember of the issue and from what I wrote above ("I've tracked this down to an issue where the last index position isn't being correctly set") it's not obvious what would have fixed it in the interim.
Comment #8
milesw commented:
Ah, you're right, sorry about that. Think I mixed up this issue with #1563478: Deleted attachments not being removed from index. Thanks for catching.
Comment #9
milesw commented:
Patch #1 seems to resolve the problem, though apachesolr_attachments_solr_reindex() ends up getting called twice when using the deletion form.
Patch #2 seems to cause recursion as it's triggering a reindex inside the reindex callback, which explains comment #5.