Give modules the opportunity to make several documents out of a node
Damien Tournoud - August 3, 2009 - 11:29
| Project: | Apache Solr Search Integration |
| Version: | 6.x-2.x-dev |
| Component: | Code |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | closed |
Description
The current hook_apachesolr_update_index() doesn't give the opportunity for modules to generate several documents out of the same node. Could we add this feature, pretty please?

#1
Two possible solutions:
#2
I would prefer allowing developers to specify a custom callback to generate documents from nodes.
#3
Seconding pwolanin's approach.
#4
The issue with this approach is that only one module can take over the apachesolr_search_custom_indexing_callback variable. There is no way for several modules to alter documents (or make several documents out of one, as in my use case).
#5
Per discussion earlier w/ Damien - any module could also maintain its own queue of nodes to be indexed (like apachesolr_attachments), and any module can implement hook_apachesolr_node_exclude(), so especially with this addition, there ought to be sufficient flexibility for almost any application.
#6
Hi
I am especially interested in the implementation of this functionality for apachesolr_attachments because right now, if many documents are attached to a single node (as happens in my system), I very often get cron timeouts as the system tries to index all of them.
I would appreciate it very much if someone would take steps in this direction.
Regards
#7
Let's keep 6.1 more or less frozen.
#8
@robert - adding this to 6.x-1.x would be ok I think, since it doesn't break BC.
@dgarciad - this is the wrong issue queue then and not directly related to your problem. Look at the code in apachesolr_attachments and post an issue there. It does try to limit the total time taken, but if you have huge numbers of attachments per node there is no easy answer.
#9
@pwolanin - ok - but let's fix it in 6.2 and backport, then.
#10
sure - go for it.
#11
@dgarciad: You might also be interested in this issue: #456420: Add Batch API support for rebuilding indexes
#12
wrt #2 from pwolanin: I think I'd be more in support of calling a hook. The variable seems clunky. Is there any reason not to do an invoke all on the rows?
<?php
- apachesolr_index_nodes($rows, 'apachesolr_search');
+ module_invoke_all('apachesolr_index', $rows);
?>
#13
Comments from pwolanin from chat:
I have to add that node_load all over again isn't so horrible ... but better if it can be avoided.
#14
As a first step I'm addressing Peter's concern about running node_load multiple times. Since we invalidate the static cache each time we do it, repeated loads would be a huge performance problem and slow down indexing even more. So I'm centralizing node_load at a higher level and passing $node instead of $nid to every function (currently just one) that wants to build documents. This patch does that, removes apachesolr_add_node_document, which doesn't seem needed, and lets the document-building functions return the $documents directly.
Since I'm in the mood to be a cowboy I'm committing all this as I go. Feel free to comment and tell me I'm full of it. Willing to roll things back if needed.
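To illustrate the idea of loading once and passing $node down, here is a rough standalone sketch. It is not the committed patch: demo_node_load() is a stand-in for node_load(), and every demo_* name is hypothetical. It only shows that each row triggers a single load no matter how many builders consume the node, and that builders return their documents directly.

```php
<?php
// Hypothetical call counter so the sketch can show how often a load happens.
function demo_load_count($increment = FALSE) {
  static $count = 0;
  if ($increment) {
    $count++;
  }
  return $count;
}

// Hypothetical stand-in for node_load(); builds a fake node object.
function demo_node_load($nid) {
  demo_load_count(TRUE);
  $node = new stdClass();
  $node->nid = $nid;
  $node->title = 'Node ' . $nid;
  return $node;
}

// Hypothetical builders: each receives the loaded $node (not a nid)
// and returns an array of documents.
function demo_build_node_document($node) {
  return array(array('id' => 'node/' . $node->nid, 'title' => $node->title));
}

function demo_build_teaser_document($node) {
  return array(array('id' => 'teaser/' . $node->nid, 'title' => $node->title));
}

// Load each node once, then hand the same object to every builder.
function demo_index_rows($rows, $builders) {
  $documents = array();
  foreach ($rows as $row) {
    $node = demo_node_load($row->nid);
    foreach ($builders as $builder) {
      $documents = array_merge($documents, call_user_func($builder, $node));
    }
  }
  return $documents;
}
```

With two rows and two builders, this performs two loads rather than four, and yields four documents.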
#15
Here's a decent solution. There is now a hook_apachesolr_document_handlers that collects a list of function names. Each of these functions is capable of turning an entity into a document. The $type of the entity and the $namespace of the module triggering indexing are both passed along.
Committing this so I can move straight on to indexing comments. Review still welcome, of course.
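For anyone trying to picture the handler pattern, here is a standalone sketch of how several modules could each contribute documents for the same node. It makes several assumptions: the hook signature ($type, $namespace) and the handler signature ($node, $namespace) are guesses based on the description above, sketch_invoke_all() is a simplified stand-in for module_invoke_all(), and the modules alpha and beta and all of their functions are hypothetical.

```php
<?php
// Stand-in for Drupal's enabled-module list; both names are hypothetical.
function sketch_module_list() {
  return array('alpha', 'beta');
}

// Simplified stand-in for module_invoke_all(): call each module's
// implementation of $hook, if present, and merge the returned arrays.
function sketch_invoke_all($hook) {
  $args = func_get_args();
  array_shift($args);
  $results = array();
  foreach (sketch_module_list() as $module) {
    $function = $module . '_' . $hook;
    if (function_exists($function)) {
      $result = call_user_func_array($function, $args);
      if (is_array($result)) {
        $results = array_merge($results, $result);
      }
    }
  }
  return $results;
}

// Hook implementations: each module returns the names of its handlers.
function alpha_apachesolr_document_handlers($type, $namespace) {
  if ($type == 'node') {
    return array('alpha_node_to_documents');
  }
}

function beta_apachesolr_document_handlers($type, $namespace) {
  if ($type == 'node') {
    return array('beta_attachments_to_documents');
  }
}

// Handlers: each turns one node into zero or more documents.
function alpha_node_to_documents($node, $namespace) {
  return array(array('id' => 'node/' . $node->nid, 'title' => $node->title));
}

function beta_attachments_to_documents($node, $namespace) {
  $documents = array();
  foreach ($node->files as $fid => $filename) {
    $documents[] = array('id' => 'file/' . $fid, 'title' => $filename);
  }
  return $documents;
}

// Collect the handlers for this entity type, then run each on the node.
function sketch_build_documents($node, $type, $namespace) {
  $documents = array();
  $handlers = sketch_invoke_all('apachesolr_document_handlers', $type, $namespace);
  foreach ($handlers as $handler) {
    $documents = array_merge($documents, call_user_func($handler, $node, $namespace));
  }
  return $documents;
}
```

Unlike a single callback variable, nothing here is exclusive: any number of modules can register handlers, and a handler is free to return several documents for one node.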
#16
Ok. If Peter is interested in backporting, go for it. Otherwise, please close.
#17
If we are changing the indexing API, I'm a little reluctant to backport.
#18
Settled.
#19
#20
Automatically closed -- issue fixed for 2 weeks with no activity.