The node module determines which nodes need processing in its update_index implementation using a fairly complex query:
// Store the maximum possible comments per thread (used for ranking by reply count)
variable_set('node_cron_comments_scale', 1.0 / max(1, db_result(db_query('SELECT MAX(comment_count) FROM {node_comment_statistics}'))));
variable_set('node_cron_views_scale', 1.0 / max(1, db_result(db_query('SELECT MAX(totalcount) FROM {node_counter}'))));
$result = db_query_range("SELECT n.nid FROM {node} n LEFT JOIN {search_dataset} d ON d.type = 'node' AND d.sid = n.nid WHERE d.sid IS NULL OR d.reindex <> 0 ORDER BY d.reindex ASC, n.nid ASC", 0, $limit);
This query, to the best of our knowledge, finds all of the nodes that have been added, updated, or commented on since the last cron run, and feeds a limited number of them (oldest first) into search_index to be indexed. So far so good.
This little clause in the query, however, locks this code into only working for search data that is of type 'node':
d.type = 'node'
In other words, even if other modules also index node content (perhaps handling the content differently), they can't use this query to find out which nodes need indexing, because the type is hardcoded.
The feature being requested is to refactor the query that finds what nodes to index into a search.module API function that lets any module ask for nodes to (re)index. The function should take an argument that tells it which type of search data to look for:
// Returns an array of nodes.
search_get_nodes_to_index($type)
Perhaps a second parameter is needed to set the limit:
search_get_nodes_to_index($type, $limit = 100)
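A minimal sketch of what such a function might look like, assuming the Drupal 6 database API (db_query_range and db_fetch_object) and the same query as above with the hardcoded type replaced by a placeholder. The function does not exist in search.module today; returning node IDs rather than fully loaded nodes is also an assumption made here to keep the example short.

```php
<?php
/**
 * Hypothetical search.module API function (a sketch, not existing core code).
 *
 * @param $type
 *   The search data type to check against {search_dataset}, e.g. 'node'.
 * @param $limit
 *   Maximum number of node IDs to return.
 * @return
 *   An array of node IDs that have no index entry for $type or are
 *   flagged for reindexing, in the same order as the original query.
 */
function search_get_nodes_to_index($type, $limit = 100) {
  // Same query as in the node module, with d.type passed as an argument.
  $result = db_query_range("SELECT n.nid FROM {node} n LEFT JOIN {search_dataset} d ON d.type = '%s' AND d.sid = n.nid WHERE d.sid IS NULL OR d.reindex <> 0 ORDER BY d.reindex ASC, n.nid ASC", $type, 0, $limit);

  $nids = array();
  while ($row = db_fetch_object($result)) {
    $nids[] = $row->nid;
  }
  return $nids;
}
```

A module that keeps its own index of node content could then call search_get_nodes_to_index('mymodule', $limit) from its own update_index hook, where 'mymodule' stands for whatever type string that module passes to search_index.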
Comments
Comment #1
robertdouglass commented
Duplicate. #282192: Pull custom search indexing into backend