.. even those that are set not to be indexed.
A site we're building has around 10000 nodes, and almost 6000 of them are profile nodes that we don't index, as configured in the Solr settings. These nodes are still added to the apachesolr_search_node table, which makes Solr think that it has 10000 nodes to index, not 4000. As it turns out, when I had to rebuild the index, Solr went through all 10000 nodes. The first 6000 don't add anything to the index, but they still take up cron time that should go to the nodes that are meant to be indexed.
Is this intended behaviour?
Comments
Comment #1
pwolanin commented
Probably not ideal - we need to add some extra logic to the admin form so we can track when there is a state change, I think.
Comment #2
blackdog commented
I've added some checks to apachesolr.module to make sure that nodes that are not intended to be indexed aren't added to the table.
Comment #3
pwolanin commented
If you change the settings, the formerly excluded nodes will never get into the index.
It might be better to either add another DB column or use the status column.
Comment #4
blackdog commented
@pwolanin - isn't this snippet in function apachesolr_search_type_boost_form_submit doing exactly that - adding nodes if settings are changed:
Comment #5
pwolanin commented
UPDATE
Comment #6
blackdog commented
Ahh.
I'm not really following why a new DB column would be needed for this. Wouldn't it work to just rewrite the above submit function to INSERT instead?
Comment #7
pwolanin commented
In that case, you need to either delete them or insert them whenever the setting is changed. Either approach could work.
Comment #8
blackdog commented
Updated patch adds logic to the submit function to add and delete nodes from apachesolr_search_node when settings change.
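The add-and-delete idea described above can be sketched as follows. This is not the patch itself: it is a minimal Python/sqlite3 illustration rather than the module's PHP, the table schemas are simplified assumptions, and the names `sync_excluded_types`, `make_demo_db`, and `tracked_nids` are hypothetical.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with simplified, assumed schemas."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE node (
            nid INTEGER PRIMARY KEY, type TEXT, status INTEGER, changed INTEGER
        );
        CREATE TABLE apachesolr_search_node (
            nid INTEGER PRIMARY KEY, status INTEGER, changed INTEGER
        );
        INSERT INTO node VALUES (1, 'page', 1, 100);
        INSERT INTO node VALUES (2, 'profile', 1, 200);
        INSERT INTO node VALUES (3, 'profile', 1, 300);
        INSERT INTO apachesolr_search_node SELECT nid, status, changed FROM node;
        """
    )
    return conn


def sync_excluded_types(conn, old_excluded, new_excluded):
    """Add or remove tracking rows when the excluded-type setting changes."""
    newly_excluded = sorted(set(new_excluded) - set(old_excluded))
    newly_included = sorted(set(old_excluded) - set(new_excluded))
    for node_type in newly_excluded:
        # Stop tracking nodes whose type has just been excluded.
        conn.execute(
            "DELETE FROM apachesolr_search_node "
            "WHERE nid IN (SELECT nid FROM node WHERE type = ?)",
            (node_type,),
        )
    for node_type in newly_included:
        # Start tracking nodes whose type has just been re-enabled.
        conn.execute(
            "INSERT OR IGNORE INTO apachesolr_search_node (nid, status, changed) "
            "SELECT nid, status, changed FROM node WHERE type = ?",
            (node_type,),
        )
    conn.commit()


def tracked_nids(conn):
    """Return the nids currently present in the tracking table."""
    rows = conn.execute("SELECT nid FROM apachesolr_search_node ORDER BY nid")
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = make_demo_db()
    sync_excluded_types(db, [], ["profile"])
    print(tracked_nids(db))
```

Comparing the old and new settings means only the types whose state actually changed are touched, so re-submitting the form with unchanged settings does no work.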
Comment #9
blackdog commented
Any reviewers on this?
Comment #10
pwolanin commented
'AS' -> 'as'
Use !empty() rather than != '0', which is too implementation specific. You might also iterate through all existing types and check if they are in the excluded list.
Comment #11
pwolanin commented
This has code-style problems and should use placeholders and pass the arguments to the query rather than directly imploding them into the query.
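The placeholder advice can be illustrated with a small sketch. It uses Python's sqlite3 rather than the Drupal database layer, and the helper names `in_placeholders` and `nids_of_types` are hypothetical; the point is only that the type names travel as query arguments instead of being joined into the SQL string.

```python
import sqlite3


def in_placeholders(values):
    """Return one '?' per value, for use inside an IN (...) clause."""
    return ", ".join("?" for _ in values)


def nids_of_types(conn, node_types):
    """Return nids whose type is in node_types, passing the types as arguments."""
    node_types = list(node_types)
    if not node_types:
        # "IN ()" is not valid SQL, so short-circuit on an empty list.
        return []
    sql = "SELECT nid FROM node WHERE type IN (%s) ORDER BY nid" % in_placeholders(
        node_types
    )
    # Only the '?' markers are interpolated; the values are bound by the driver.
    return [row[0] for row in conn.execute(sql, node_types)]


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, type TEXT)")
    db.executemany(
        "INSERT INTO node VALUES (?, ?)",
        [(1, "page"), (2, "profile"), (3, "profile")],
    )
    print(nids_of_types(db, ["profile"]))
```

Because the values are bound separately, a type name containing a quote character is treated as data and cannot alter the query.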
Comment #12
pwolanin commented
Looking at this - it might just be simpler to join to the node table and add an extra WHERE clause, rather than doing all this inserting/deleting.
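A rough sketch of the JOIN idea, again in Python/sqlite3 rather than the module's PHP. The schemas are simplified assumptions, the function name `nodes_to_index` is hypothetical, and the real indexing query tracks more state than shown here.

```python
import sqlite3


def nodes_to_index(conn, excluded_types, limit=50):
    """Pick tracked nodes to index, skipping excluded types via a JOIN."""
    excluded_types = list(excluded_types)
    sql = (
        "SELECT asn.nid FROM apachesolr_search_node asn "
        "INNER JOIN node n ON n.nid = asn.nid "
        "WHERE asn.status = 1"
    )
    args = []
    if excluded_types:
        # The extra WHERE clause: filter by type at query time, so no rows
        # have to be inserted or deleted when the setting changes.
        sql += " AND n.type NOT IN (%s)" % ", ".join("?" for _ in excluded_types)
        args.extend(excluded_types)
    sql += " ORDER BY asn.changed ASC, asn.nid ASC LIMIT ?"
    args.append(limit)
    return [row[0] for row in conn.execute(sql, args)]


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE node (nid INTEGER PRIMARY KEY, type TEXT);
        CREATE TABLE apachesolr_search_node (
            nid INTEGER PRIMARY KEY, status INTEGER, changed INTEGER
        );
        INSERT INTO node VALUES (1, 'page'), (2, 'profile'), (3, 'profile');
        INSERT INTO apachesolr_search_node VALUES (1, 1, 100), (2, 1, 200), (3, 1, 300);
        """
    )
    print(nodes_to_index(db, ["profile"]))
```

With this approach the tracking table keeps a row for every node; excluded types are simply never selected, and they become eligible again as soon as the setting changes.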
Comment #13
pwolanin commented
Comment #14
jody lynn commented
Looks useful. Read through the code and can test tomorrow.
The function _apachesolr_exclude_types needs a code comment.
Comment #15
blackdog commented
Sorry I haven't reviewed this yet, will get to it asap!
Comment #16
blackdog commented
Patch works as intended.
Applied patch, set node type Page as excluded. At the next Solr commit, nodes of type Page are deleted, and no new Pages are added. Unsetting Page as excluded adds the nodes back to the index.
Awaiting Jody Lynn's review to RTBC this.
Thanks for looking into this, pwolanin!
Comment #17
pwolanin commented
Slight enhancement - add a daily check that we did not fail to delete any excluded nodes. Though I'm a little unsure about the namespaces thing - maybe this check should be in apachesolr_search for just its excluded types?
Comment #18
JacobSingh commented
Looks good.
Comment #19
pwolanin commented
Thinking more about this - the code to catch any failed delete should not be in the framework. Our one main use case for namespaces has been node attachments. So, for example, if I exclude attachments on 'story' nodes, that does NOT mean that all 'story' nodes should be deleted from the index.
Comment #20
jody lynn commented
We tested the patch in #13:
It worked, but the Apache Solr index table never seems to get cleaned out, so it still has info from all the node types that have been excluded but were previously indexed (even after index deletion).
Comment #21
pwolanin commented
@Jody - that's by design. The latest patch leaves all the nodes in the table, but just excludes them from indexing via the JOIN SQL.
Comment #22
pwolanin commented
Better title.
Comment #23
pwolanin commented
Actually, I think the patch in #13 might be good enough - perhaps we could remove the check empty($old_excluded_types[$type]) here so that admins can re-submit to send delete queries if they fail.
Comment #24
pwolanin commented
OK, with updated README too.
Comment #25
pwolanin commented
Committed to 6.x.
Comment #26
pwolanin commented