User search
dixon_ - December 18, 2008 - 15:07
| Project: | Apache Solr Search Integration |
| Version: | 6.x-1.x-dev |
| Component: | Code |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | needs review |
Description
I couldn't find anything about this in the queue but I'm sure someone already thought of this. But how about searching for users and user profiles? How extensible is this module's API? Would it be easy to extend with user searching? I haven't looked at the code yet, but I think I'll give it a try because we need this for a client's project.
Any suggestions, guidelines on how to approach this?

#1
Well, at the least you'll need to add a small module to index/search users.
I talked with Robert about this a little. Your approach will depend on whether you want users to be mixed in with nodes in your search, or whether you want a totally separate search.
#2
Here is a module that is in use on community.mylifetime.com. It provides a new type, 'user' for facet search. See: http://community.mylifetime.com/community/search/apachesolr_search/scott...
A couple points to answer anticipated questions
1.) Why write a separate _index_alter()?
Frankly, indexing nodes vs users is very different. Different fields need to be added to the index. If they both used the same function, most implementations would end up with a
<?php
if ($type == 'node') ... elseif ($type == 'user')
?>
2.) What is this user_module_invoke?
My team and I have for a while used this to 'build' an indexable profile. It is not a standard op for hook_user, but it has made it easy for us to roll out sites and 'build' profiles without changing any of our platform code. And yes, the module works even if no module does anything on $op == 'index'.
3.) What about changes to the solrconfig?
Yes, there are changes to the solrconfig. We run a pretty heavily modified solrconfig file, though, so providing a patch would be a pain. So here is the major change:
<requestHandler name="drupal" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">
      body^1.0 title^5.0 name^3.0 mail^3.0 taxonomy_names^2.0 tags_h1^5.0 tags_h2_h3^3.0 tags_h4_h5_h6^2.0 tags_inline^1.0
    </str>
    <str name="pf">
      body^2.0
    </str>
    <int name="ps">15</int>
    <str name="mm">
      2&lt;-35%
    </str>
    <str name="q.alt">*:*</str>
That is, the new field 'mail' is added to qf.
4.) Schema changes?
Yes, the mail field needs to be added.
<field name="mail" type="text" indexed="true" stored="true" termVectors="true" omitNorms="true"/>
5.) Is the user facet exposed in the facet blocks?
No it isn't. Looking for guidance on that because the facet block is generated by apachesolr_search() module which only deals with nodes. And there is no way to 'facet_alter'.
We use a custom search block on the site to do the facet filtering.
I think that addresses them all.
Please note that currently, we are not up to speed with the latest dev/beta versions of this module. It is mostly because we are not ready to switch to solr 1.4 yet. So the configs might need to be changed.
#3
Realized that no one could install it without our private user_profile module. That dependency no longer exists (it did in a previous iteration), so you may edit the .info file and remove this line:
dependencies[] = user_profile
Code needs a review. I know it isn't perfect; I'm willing to make it right but am asking for guidance.
#4
Hi Scott,
I haven't looked at this, but I imagine that it would need a lot of work because the APIs have changed significantly as we moved to Beta.
Can you try making a patch and rolling it in?
I'll review it anyway, because it is a killer feature, but I'll probably not get to it very soon if it is a zip file against an old code base.
Thanks!
-Jacob
#5
@Scott - I think there are (at least) two possible approaches: index nodes and users in the same index, or create a totally separate schema for users and expand the apachesolr framework module to more easily handle indexing multiple types of content into different indexes.
Frankly, the choice between these two for a particular site may also depend on whether they are using nodes as user profiles, vs. the core profile module or another solution. However, to me it seems that at least thinking about the latter would be useful.
For inclusion in this module, we should probably attempt a relatively bare-bones search that might just look at core profile module. It might not even meet your needs, but would be more general.
It's not clear to me that to index users into the same index you'd even need to change the schema, since the mail field could go into a dynamic field.
#6
In regards to the schema change for the mail field: that is because I wanted to use it in the dismax equation to add weight to that field.
My thought on nodes vs users is simple. If you're using node profile/bio or whatever else, then you don't need this. And I really like it that they are in the same index. The facet searching that provides is cool and I think adds value. And through the use of dynamic fields, I'm not sure there is a technical need to build out a different schema and index.
I am willing to build it out for core profile. But I should mention that this works without any profile data. Meaning that if all you have is the basic Drupal install with apachesolr turned on, it's fine. It makes the user's name and mail searchable. (Which is in line with Drupal core's user_search() implementation.)
In regards to the API change, this module is 123 lines of code with comments. It is pretty tiny. The only place that might be affected, without having looked at the changes in Beta, would be
<?php
/**
 * Takes a set of documents and puts them to Solr.
 *
 * @param $documents
 *   Array of documents to index on Solr.
 */
function apachesolr_users_index_documents($documents) {
  try {
    $solr = apachesolr_get_solr();
    if (!$solr->ping()) {
      throw new Exception(t('No Solr instance available during indexing'));
    }
    // Here we have Solr ready to go: send the documents in batches of 20.
    $docs_sub_set = array_chunk($documents, 20);
    foreach ($docs_sub_set as $docs) {
      $solr->addDocuments($docs);
    }
    $solr->commit();
    $solr->optimize(FALSE, FALSE);
  }
  catch (Exception $e) {
    // Log the underlying error, then signal the failure to the caller.
    watchdog('Apache Solr', $e->getMessage(), NULL, WATCHDOG_ERROR);
    throw new Exception(t('Failed to Index'));
  }
}
?>
That is really the only place in this code it uses the Apachesolr PHP Client libraries. Everything else is self-contained. So I encourage you to take a look.
#7
I've put the apachesolr_users module into Drupal 6 but it does not work on my site yet. The new table apachesolr_users_queue gets updated, but the function apachesolr_users_index_documents does not update Solr.
Since I am not up the learning curve of Drupal module coding yet, I have done this the old band-aid way until the author of this module (Scott Reynolds?) decides to work on it (which will be lovely and elegant). Basically, right now I am running a script through cron that collects the uid and name from my user table and creates an XML file out of them. The cron job then uses post.jar to update Solr with the new XML file. This way user info shows up with all the rest of the search results and I don't have to use the core search modules with their extra tabs.
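A rough sketch of what such a cron script could generate, for anyone wanting to try the same workaround. It builds a Solr XML update message (the `<add><doc>` format that post.jar accepts) from uid/name rows. The function name, the `$site_hash` argument, and the choice of the id, type and name fields are assumptions for illustration; a real schema may require additional fields.

```php
<?php
// Hypothetical helper: turns rows of array('uid' => ..., 'name' => ...)
// into a Solr XML update message. Field names are assumptions.
function example_users_to_solr_xml(array $users, $site_hash) {
  $xml = "<add>\n";
  foreach ($users as $user) {
    // Same "hash/type/id" shape as the document IDs used for nodes.
    $id = $site_hash . '/user/' . (int) $user['uid'];
    $xml .= "  <doc>\n";
    $xml .= '    <field name="id">' . htmlspecialchars($id, ENT_QUOTES, 'UTF-8') . "</field>\n";
    $xml .= "    <field name=\"type\">user</field>\n";
    // Escape the user name so characters like & and < stay valid XML.
    $xml .= '    <field name="name">' . htmlspecialchars($user['name'], ENT_QUOTES, 'UTF-8') . "</field>\n";
    $xml .= "  </doc>\n";
  }
  return $xml . "</add>\n";
}

$rows = array(
  array('uid' => 1, 'name' => 'admin'),
  array('uid' => 7, 'name' => 'Tom & Jerry'),
);
echo example_users_to_solr_xml($rows, 'abc123');
```

The output can be written to a file and posted with post.jar as described above.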
#8
Doh, severe bug in there:
<?php
$users = db_query_range("SELECT uid FROM {apachesolr_users_queue} ORDER BY modified ASC", $last_checked, 0, 1000);
?>
should be
<?php
$users = db_query_range("SELECT uid FROM {apachesolr_users_queue} ORDER BY modified ASC", 0, 1000);
?>
#9
No matter which approach we take I'd like to be able to guarantee that you don't have to go to a new tab to search for users. I'd like to use a unified search and get both users and nodes in the results.
#10
@Robert - really?
For that use case now, just use nodes-as-profiles. In the longer term, I think we would want to have a federated search. I think it's very poor to mix users and content in the same result set. E.g., http://www.princeton.edu/main/tools/search/?q=stock has two panes of search results.
#11
My users want content and user info in one search result set. I guess it depends on each drupal website's audience what kind of search result set to provide. So, it would be nice to have a choice of separating these or putting them together.
#12
I think we need to separate out here indexing vs display. I do happen to think that representing users on the same level as content types is advantageous. I do think though, mixing users and nodes together can sometimes be a mess. So I think that providing a default facet of just 'nodes' is a good thing, and then can be turned off so the admin can say "mix nodes and users". So I think that in terms of the Apachesolr index, type:user should be there.
And I think this speaks to providing a single function:
<?php
function apachesolr_get_params($params = array()) {
  $default_params = array(
    // all the variable gets here
    // and get the variable_get('mix_users_in?') to set the default fq types here
  );
  // Array union: keys already present in $params win over the defaults.
  $final_params = $params + $default_params;
  return $final_params;
}
?>
That's how I would try to design it. And I believe that if $params['fq'][$key] = 'type:page' were there, it wouldn't get overridden in this function, thereby allowing us to have a default behavior of just filtering to node types.
Hope that makes sense; I think that could work.
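To illustrate how the `+` operator behaves in that proposal, here is a minimal standalone sketch. The function name and the default values are made up for illustration; only the array union behavior is the point. Note that the union works on top-level keys only: if the caller passes any 'fq' array, the whole default 'fq' array is dropped rather than merged key by key.

```php
<?php
// Hypothetical stand-in for the proposed apachesolr_get_params().
// The default filter and row count are assumptions for illustration.
function example_get_params(array $params = array()) {
  $default_params = array(
    'fq' => array('default_types' => 'type:(page OR story)'),
    'rows' => 10,
  );
  // Array union: for each top-level key, the left operand wins.
  return $params + $default_params;
}

// No caller input: the defaults apply.
print_r(example_get_params());

// Caller supplies its own 'fq': it is kept as-is, and the default 'fq'
// is not merged in. 'rows' still comes from the defaults.
print_r(example_get_params(array('fq' => array('mine' => 'type:page'))));
```

If per-key merging of 'fq' is wanted, the function would have to merge that sub-array explicitly instead of relying on the top-level union.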
#13
Any chance this gets into the module?
#14
Subscribing...
#15
+1
#16
If we want to go this route: the "type" for users needs to be a string that could never be a valid node type - e.g. 'user' is not acceptable, since I can create a node type using that string.
This is actually a more general problem also - for file attachments we are adding an extra Boolean, but that's not very scalable.
Perhaps generally something like "object/user" and "object/file" or "object-user" or "a_s-user" or "a_s@user" or "@user"?
('a_s' == apachesolr_search)
#17
What if it were a new facet, 'entity_type', instead of fancy strings? So "entity_type='node'" or "entity_type='user'" or "entity_type='comment'".
Of course this hits on the comment searching of 2.x, but given the recent Drupal nomenclature, I think this will translate well and would really provide more power in a clean way.
If we want to go the fancier strings route, can we leverage apachesolr_document_id() and facet.prefix?
#18
Well, it would be nice to figure out some usable approach for 6.x-1.0
#19
We already roughly do this with the ID field:
<?php
function apachesolr_document_id($id, $type = 'node') {
  return apachesolr_site_hash() . "/$type/" . $id;
}
?>
where the 'type' param there is the same as what you suggest as entity_type. I imagine, however, that doing a wildcard match on the ID field will perform much worse than if we have this as a separate string field.
#20
Ya, that was my final comment in #17. I spent some time on it early this morning and couldn't come up with a better solution than wildcarding on the ID field. I don't think that is a good idea. Adding an 'entity_type' field to the schema seems like the way through to me.
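The two options weighed in #19 and #20 can be sketched as filter query strings. This is only an illustration under assumptions: the helper names are hypothetical, the site hash is passed in so the code runs outside Drupal, and 'entity_type' is the field name proposed in #17 rather than something in the current schema. Depending on the Solr version, the slashes in the ID prefix may also need escaping.

```php
<?php
// Hypothetical helper mirroring apachesolr_document_id(), with the site
// hash passed in as an argument instead of read from Drupal.
function example_document_id($site_hash, $id, $type = 'node') {
  return $site_hash . "/$type/" . $id;
}

// Option 1: prefix wildcard match on the ID field.
function example_filter_by_id_prefix($site_hash, $type) {
  return 'id:' . $site_hash . '/' . $type . '/*';
}

// Option 2: exact match on a dedicated string field
// (assumed field name 'entity_type').
function example_filter_by_entity_type($type) {
  return 'entity_type:' . $type;
}

echo example_document_id('abc123', 42, 'user'), "\n";      // abc123/user/42
echo example_filter_by_id_prefix('abc123', 'user'), "\n";  // id:abc123/user/*
echo example_filter_by_entity_type('user'), "\n";          // entity_type:user
```

The second form is a plain term match on a string field, which is why it looks like the cleaner route than wildcarding on the ID.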
#21
Here's a possible schema change patch. This would more readily enable this user search as contrib.
#22
Discussed with Peter. +1 for entity type.
#23
Entity type sounds good to me. Once this code gets committed, I recommend writing up docs that would act as implementation guidelines for non-node content.
#24
I think we should just call it "entity" in the schema - no reason to make it longer.
#26
Well, even if you want them separate, adding such a field makes it easy to filter the search results if all the data is in one index.