I couldn't find anything about this in the queue but I'm sure someone already thought of this. But how about searching for users and user profiles? How extensible is this module's API? Would it be easy to extend with user searching? I haven't looked at the code yet, but I think I'll give it a try because we need this for a client's project.

Any suggestions, guidelines on how to approach this?

Comments

pwolanin’s picture

Well, at the least you'll need to add a small module to index/search users.

I talked with Robert about this a little. Your approach will depend on whether you want users to be mixed in with nodes in your search, or whether you want a totally separate search.

Scott Reynolds’s picture

File status: new; file size: 2.81 KB

Here is a module that is in use on community.mylifetime.com. It provides a new type, 'user' for facet search. See: http://community.mylifetime.com/community/search/apachesolr_search/scott...

A couple of points to answer anticipated questions:
1.) Why write a separate _index_alter()?
Frankly, indexing nodes and indexing users are very different. Different fields could be added to the index. If they both used the same function, most implementations would contain an if ($type == 'node') ... elseif ($type == 'user') branch, which defeats the purpose.

2.) What is this user_module_invoke?
My team and I have for awhile used this to 'build' an indexable profile. It is not a standard op for hook_user. But it has made it easy for us to roll out sites and 'build' profiles without changing any of our platform code. And yes, the module works without any module doing anything on $op == 'index'.

3.) What about changes to the solrconfig?
Yes there are changes to the solrconfig. We run a pretty modified solrconfig file though so providing a patch would be a pain. So here is the major change

<requestHandler name="drupal" class="solr.SearchHandler" default="true">
    <lst name="defaults">
     <str name="defType">dismax</str>
     <str name="echoParams">explicit</str>
     <float name="tie">0.01</float>
     <str name="qf">
        body^1.0 title^5.0 name^3.0 mail^3.0 taxonomy_names^2.0 tags_h1^5.0 tags_h2_h3^3.0 tags_h4_h5_h6^2.0 tags_inline^1.0
     </str>
     <str name="pf">
        body^2.0
     </str>
     <int name="ps">15</int>
     <str name="mm">
        2&lt;-35%
     </str>
     <str name="q.alt">*:*</str>

That is, the new field 'mail' is added to the qf list.

4.) Schema changes?
Yes the mail field needs to be added.

   <field name="mail" type="text" indexed="true" stored="true" termVectors="true" omitNorms="true"/>

5.) Is the user facet exposed in the facet blocks?
No it isn't. Looking for guidance on that because the facet block is generated by apachesolr_search() module which only deals with nodes. And there is no way to 'facet_alter'.

We use a custom search block on the site to do the facet filtering.

I think that addresses them all.

Please note that currently, we are not up to speed with the latest dev/beta versions of this module. It is mostly because we are not ready to switch to solr 1.4 yet. So the configs might need to be changed.

Scott Reynolds’s picture

Status: Active » Needs review

Realized that no one could install it without our private user_profile module. That dependency no longer exists (it did in a previous iteration), so you may edit the .info file and remove this line:
dependencies[] = user_profile

Code needs a review. I know it isn't perfect, willing to make it right but asking for guidance.

JacobSingh’s picture

Hi Scott,

I haven't looked at this, but I imagine that it would need a lot of work because the APIs have changed significantly as we moved to Beta.
Can you try making a patch and rolling it in?

I'll review it anyway, because it is a killer feature, but I'll probably not get to it very soon if it is a zip file against an old code base.

Thanks!

-Jacob

pwolanin’s picture

@Scott - I think there are (at least) two possible approaches: index nodes and users in the same index, or create a totally separate schema for users and expand the apachesolr framework module to more easily handle indexing multiple types of content into different indexes.

Frankly, the choice between these two for a particular site may also depend on whether they are using nodes as user profiles, vs. the core profile module or another solution. However, to me it seems that at least thinking about the latter would be useful.

For inclusion in this module, we should probably attempt a relatively bare-bones search that might just look at core profile module. It might not even meet your needs, but would be more general.

It's not clear to me that to index users into the same index you'd even need to change the schema, since the mail field could go into a dynamic field.

Scott Reynolds’s picture

In regards to

It's not clear to me that to index users into the same index you'd even need to change the schema, since the mail field could go into a dynamic field.

The schema change is there because I wanted to use the field in the dismax equation to add weight to it.

My thought on nodes vs users is simple. If you're using node profile, bio, or whatever else, then you don't need this. And I really like it that they are in the same index. The facet searching that provides is cool and I think adds value. And through the use of dynamic fields, I'm not sure there is a technical need to build out a different schema and index.

I am willing to build it out for core profile. But I should mention that this works without any profile data. Meaning that if all you have is the basic Drupal install with apachesolr turned on, it's fine. It makes the user's name and mail searchable. (Which is in line with Drupal core's user_search() implementation.)

In regards to the API change, this module is 123 lines of code with comments. It is pretty tiny. Without having looked at the changes in Beta, the only place that might be affected would be

/**
 * takes a set of documents and puts them to Solr
 *
 * @param $documents
 * array of documents to index on Solr
 */
function apachesolr_users_index_documents($documents) {
  try {
    $solr = apachesolr_get_solr();
    if (!$solr->ping()) {
      throw new Exception(t('No Solr instance available during indexing'));
    }
    
    // here we have solr ready to go
    $docs_sub_set = array_chunk($documents, 20);
    foreach ($docs_sub_set as $docs) {
      $solr->addDocuments($docs);
    }
    $solr->commit();
    $solr->optimize(FALSE, FALSE);
    
    // save the variable so it could be used later
  }
  catch (Exception $e) {
    watchdog('Apache Solr', $e->getMessage(), NULL, WATCHDOG_ERROR);
    throw new Exception(t('Failed to Index'));
  }
}

That is really the only place in this code it uses the Apachesolr PHP Client libraries. Everything else is self-contained. So I encourage you to take a look.

baumanis’s picture

I've put the apachesolr_users module into Drupal 6, but it does not work for my site yet. The new table apachesolr_users_queue gets updated, but the function apachesolr_users_index_documents does not update Solr.

Since I am not up the learning curve of Drupal module coding yet, I have done this the old band-aid way until the author of this module (Scott Reynolds?) decides to work on it (which will be lovely and elegant). Basically, right now I am running a script through cron that collects the uid and name from my user table and then creates an XML file out of it. Then the cron job uses post.jar to update Solr with my new XML file. This way user info shows up with all the rest of the search and I don't have to use the coresearches modules with their extra tabs.
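
For readers trying the same workaround, here is a minimal sketch of the XML-building step, assuming the standard Solr XML update format (`<add><doc><field>`). The function name, the `user/` ID prefix, and the choice of the `id`, `type`, and `name` fields are illustrative assumptions, not taken from the script described above.

```php
<?php
// Hypothetical helper: turn rows of (uid, name) into a Solr XML update message.
// The field names 'id', 'type' and 'name' are assumptions for illustration.
function example_users_to_solr_xml(array $users) {
  $xml = "<add>\n";
  foreach ($users as $user) {
    $xml .= "  <doc>\n";
    // Prefix the ID so a user ID can never collide with a node ID.
    $xml .= '    <field name="id">user/' . (int) $user['uid'] . "</field>\n";
    $xml .= '    <field name="type">user</field>' . "\n";
    // Escape the name so characters such as & and < keep the XML well-formed.
    $xml .= '    <field name="name">' . htmlspecialchars($user['name'], ENT_QUOTES, 'UTF-8') . "</field>\n";
    $xml .= "  </doc>\n";
  }
  return $xml . "</add>\n";
}

$xml = example_users_to_solr_xml(array(
  array('uid' => 3, 'name' => 'A & B'),
));
```

The resulting string can be written to a file and posted with post.jar, as described above.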

Scott Reynolds’s picture

doh severe bug in there

$users = db_query_range("SELECT uid FROM {apachesolr_users_queue} ORDER BY modified ASC", $last_checked, 0, 1000);

should be

$users = db_query_range("SELECT uid FROM {apachesolr_users_queue} ORDER BY modified ASC", 0, 1000);

robertdouglass’s picture

No matter which approach we take I'd like to be able to guarantee that you don't have to go to a new tab to search for users. I'd like to use a unified search and get both users and nodes in the results.

pwolanin’s picture

@Robert - really?

For that use case now, just use nodes-as-profiles. In the longer term, I think we would want to have a federated search. I think it's very poor to mix users and content in the same result set. E.g.: http://www.princeton.edu/main/tools/search/?q=stock has two panes of search results.

baumanis’s picture

My users want content and user info in one search result set. I guess it depends on each drupal website's audience what kind of search result set to provide. So, it would be nice to have a choice of separating these or putting them together.

Scott Reynolds’s picture

I think we need to separate out here indexing vs display. I do happen to think that representing users on the same level as content types is advantageous. I do think though, mixing users and nodes together can sometimes be a mess. So I think that providing a default facet of just 'nodes' is a good thing, and then can be turned off so the admin can say "mix nodes and users". So I think that in terms of the Apachesolr index, type:user should be there.

And I think this speaks to providing a single function

function apachesolr_get_params($params = array()) {
  $default_params = array(
    // All the variable_get() calls go here,
    // including variable_get('mix_users_in?') to set the default fq types.
  );

  $final_params = $params + $default_params;

  return $final_params;
}

That's how I would try to design it. And I believe that if $params['fq'][$key] = 'type:page' were there, it wouldn't get overridden in this function, thereby allowing us to have a default behavior of just filtering to node types.

Hope that makes sense, and I think that could work.
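
A minimal, self-contained sketch of the idea, relying only on PHP's array union operator. The function name, the `$mix_users_in` argument (standing in for the variable_get() call), and the default values are hypothetical, not existing module code.

```php
<?php
// Hypothetical stand-in for the apachesolr_get_params() idea.
// The defaults below are illustrative assumptions, not module settings.
function example_get_params(array $params = array(), $mix_users_in = FALSE) {
  $default_params = array(
    'rows' => 10,
    // Default filter: leave users out unless the admin chose to mix them in.
    'fq' => $mix_users_in ? array() : array('no_users' => '-type:user'),
  );
  // Array union: any key already present in $params wins over the default.
  return $params + $default_params;
}

$defaults_only = example_get_params();
$caller_fq = example_get_params(array('fq' => array('page_only' => 'type:page')));
```

Note that the union is not recursive: when the caller supplies 'fq', the whole default 'fq' array is dropped rather than merged key by key, which is exactly why the caller's filter is not overridden.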

rapsli’s picture

Any chance this gets into the module?

deltab’s picture

Subscribing...

deltab’s picture

+1

pwolanin’s picture

If we want to go this route: the "type" for users needs to be a string that could never be a valid node type - e.g. 'user' is not acceptable, since I can create a node type using that string.

This is actually a more general problem also - for file attachments we are adding an extra Boolean, but that's not very scalable.

perhaps generally something like "object/user" and "object/file" or "object-user" or "a_s-user" or "a_s@user" or "@user"?
('a_s' == apachesolr_search)

Scott Reynolds’s picture

Version: 6.x-1.x-dev » 6.x-2.x-dev

What if it were a new facet, 'entity_type', instead of fancy strings? So "entity_type='node'" or "entity_type='user'" or "entity_type='comment'".

Of course this hits on the comment searching of 2.x, but given the recent Drupal nomenclature, I think this will translate well and would really provide more power in a clean way.

If we want to go the fancier strings route, can we leverage apachesolr_document_id() and facet.prefix?

pwolanin’s picture

Well, it would be nice to figure out some usable approach for 6.x-1.0

pwolanin’s picture

We already roughly do this with the ID field:

function apachesolr_document_id($id, $type = 'node') {
  return apachesolr_site_hash() . "/$type/" . $id;
}

where the 'type' param there is the same as what you suggest as entity_type. I imagine, however, that doing a wildcard match on the ID field will perform much worse than if we have this as a separate string field.
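
To make the comparison concrete, here is a small sketch of the two ways a document's type can be recovered. `example_site_hash()` is a stand-in for apachesolr_site_hash() with a made-up return value, and the 'entity_type' key follows the name suggested earlier in this thread; none of this is committed module code.

```php
<?php
// Stand-in for apachesolr_site_hash(); the value is made up for the example.
function example_site_hash() {
  return 'abc123';
}

// Same shape as apachesolr_document_id(): hash/type/id.
function example_document_id($id, $type = 'node') {
  return example_site_hash() . "/$type/" . $id;
}

// A minimal document carrying the type both inside the ID and as its own field.
function example_document($id, $type) {
  return array(
    'id' => example_document_id($id, $type),
    'entity_type' => $type,
  );
}

$doc = example_document(42, 'user');

// Option 1: infer the type from the ID, which amounts to a prefix match.
$is_user_by_id = strpos($doc['id'], example_site_hash() . '/user/') === 0;

// Option 2: compare the dedicated field, which is an exact string match.
$is_user_by_field = $doc['entity_type'] === 'user';
```

Both checks give the same answer here; the difference is that the second maps to an exact match on an indexed string field instead of a wildcard on the ID.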

Scott Reynolds’s picture

Status: Needs review » Needs work

We already roughly do this with the ID field:

Ya, that was my final comment in 17. I spent some time on it early this morning and couldn't come up with a better solution than wildcarding on the ID field. I don't think that is a good idea. Adding an 'entity_type' field to the schema seems like the way through to me.

pwolanin’s picture

Version: 6.x-2.x-dev » 6.x-1.x-dev
Status: Needs work » Needs review
File status: new; file size: 3.52 KB

Here's a possible schema change patch. This would more readily enable this user search as contrib.

robertdouglass’s picture

Discussed with Peter. +1 for entity type.

anarchivist’s picture

Entity type sounds good to me. Once this code gets committed I recommend writing up similar docs that would act as implementation guidelines for non-node content.

pwolanin’s picture

I think we should just call it "entity" in the schema - no reason to make it longer.

pwolanin’s picture

Well, even if you want them separate, adding such a field makes it easy to filter the search results if all the data is in one index.

pwolanin’s picture

File status: new; file size: 3.64 KB

like so.

robertdouglass’s picture

Status: Needs review » Reviewed & tested by the community

I believe this change is positive.

Scott Reynolds’s picture

Status: Reviewed & tested by the community » Needs work

Small change

<!-- enty_type is 'node', 'file', 'user', or some other Drupal object type -->

should be entity not enty_type.

Also, we probably need to set fq[]=entity:node on apachesolr_search queries. That way we have a clear separation.

robertdouglass’s picture

Ah, yes, entity:node is a good catch. This change requires a re-index so it should be clear in both CHANGELOG and release notes.

pwolanin’s picture

I think we can omit the fq by default - there is no need if you are only indexing nodes.

Scott Reynolds’s picture

Well, that would mean that user search would then need to say "For all queries that aren't mine fq[]=NOT entity:user".

I really think it should be the job of the 'entity' search module to add its fq in there. Otherwise things get kind of silly: if you also have a comment search, it becomes fq[] = NOT entity:user, fq[] = NOT entity:comment.

I take it you don't want to add the fq for performance reasons; I'm just not sure the performance gain is worth it.

And I know I have argued previously for showing nodes and users in the same result page. But having implemented a couple sites with both user and node searching, I think I was the only one to get that paradigm and probably the only one who thought it was cool.
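
A rough sketch of the two filter styles being debated, written as plain string builders. The function names are hypothetical and no Solr client is involved; it only shows how the opt-out list grows with every new entity type while the opt-in filter stays a single clause.

```php
<?php
// Opt-out style: a search has to exclude every other entity type it knows about.
function example_exclude_filters(array $other_entities) {
  $fq = array();
  foreach ($other_entities as $entity) {
    $fq[] = 'NOT entity:' . $entity;
  }
  return $fq;
}

// Opt-in style: a search names the entity types it actually wants.
function example_include_filter(array $wanted) {
  if (count($wanted) == 1) {
    return array('entity:' . reset($wanted));
  }
  return array('entity:(' . implode(' OR ', $wanted) . ')');
}

$opt_out = example_exclude_filters(array('user', 'comment'));
$opt_in_nodes = example_include_filter(array('node'));
$opt_in_mixed = example_include_filter(array('node', 'file'));
```

With the opt-in style, a newly indexed entity type does not leak into existing searches unless they ask for it.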

pwolanin’s picture

Well, mostly I don't want to force re-indexing on people who don't actually need it.

robertdouglass’s picture

If we want to avoid re-indexing then we should be focusing this effort on 6.2. I believe Scott's argument is correct, so the decision mainly comes down to whether we should be making schema changes to 6.1. I've held that we shouldn't be, but I defer to Peter for the final decision on 6.1 releases.

greggles’s picture

I think it's fine to force re-indexing. We should just have a warning message "if you enable this module you will have to reindex your site" on the download page or the project release page.

Is the code in a currently testable state?

robertdouglass’s picture

Status: Needs work » Closed (won't fix)

#641954: Update schema.xml for Solr 1.4 changes to schema version contained the entity field, and Scott says he's going to make a standalone user search module, so this issue is closed. greggles, note that the only warning for reindexing we'll have, currently, is in the release notes. People's sites won't break, however, so tracking latest devel versions is safe enough.

pwolanin’s picture

Title: User search » Index into entity field
Status: Closed (won't fix) » Needs work

Actually, user search per se is "won't fix", but my last patch needs to be applied in some form.

pwolanin’s picture

Status: Needs work » Needs review
File status: new; file size: 2.16 KB

re-roll less schema changes.

pwolanin’s picture

Version: 6.x-1.x-dev » 6.x-2.x-dev
Status: Needs review » Patch (to be ported)

Committing this minimal patch to 6.x-1.x - need to come back to the language code soon.

robertdouglass’s picture

Status: Patch (to be ported) » Fixed

#38 was committed to 6.2 and 5.2.

Scott Reynolds’s picture

Version: 6.x-2.x-dev » 6.x-1.x-dev
Status: Fixed » Needs work

Again, there is no fq[] = entity:node.

See #32 for the argument as to why this is a bad idea.

pwolanin’s picture

Status: Needs work » Fixed

@Scott - I think it's a bad idea to add it by default. I want to always mix file and node results, for example.

Scott Reynolds’s picture

Then how do I solve this?

Well, that would mean that user search would then need to say "For all queries that aren't mine fq[]=NOT entity:user"

How do I prevent user documents from showing up on the file + node search? Am I going to have to do that? Seems incredibly brittle and prone to issues. I'm going to have to add a special case for Solr Views so it does not add the clause.

if ($caller != 'apachesolr_users' && $caller != 'apachesolr_views') {
  // Exclude 'user'.
}

That feels pretty dirty.

pwolanin’s picture

What's the alternative? That I have to know to remove, or OR together, any entity fq entry in order to search everything in the index?

Unfortunately, neither of these are ideal situations. At least for the 2.x branch, we can think about a hook to collect all entity types and apachesolr_search could limit to an admin-selected subset for example.
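
As a very rough sketch of what such a hook could look like for 2.x: each indexing module reports its entity types, and the main search restricts itself to an admin-selected subset. Every function name here is hypothetical; no such hook exists in the module as discussed in this thread.

```php
<?php
// Hypothetical hook implementations: each module reports what it indexes.
function example_nodes_entity_types() {
  return array('node');
}
function example_users_entity_types() {
  return array('user');
}
function example_files_entity_types() {
  return array('file');
}

// Collect entity types from every implementation, without duplicates.
function example_collect_entity_types(array $callbacks) {
  $types = array();
  foreach ($callbacks as $callback) {
    $types = array_merge($types, call_user_func($callback));
  }
  return array_values(array_unique($types));
}

// Build the fq for the main search from the admin-selected subset.
// Returns NULL when nothing is selected, meaning "do not restrict".
function example_entity_fq(array $all_types, array $selected) {
  $enabled = array_values(array_intersect($all_types, $selected));
  if (empty($enabled)) {
    return NULL;
  }
  return 'entity:(' . implode(' OR ', $enabled) . ')';
}

$all = example_collect_entity_types(array(
  'example_nodes_entity_types',
  'example_users_entity_types',
  'example_files_entity_types',
));
$fq = example_entity_fq($all, array('node', 'file'));
```

Here the admin has selected nodes and files, so user documents stay out of the main search without the user search module having to do anything.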

Scott Reynolds’s picture

Status: Fixed » Needs work

Unfortunately, neither of these are ideal situations. At least for the 2.x branch, we can think about a hook to collect all entity types and apachesolr_search could limit to an admin-selected subset for example.

I think this supports my point. Either build out the 'files + node' properly or not at all. (side note: where is the issue for files + nodes?)

My real problem with this change is that it makes implementing the Apache Solr api harder on other module developers. If we keep this as is, then I ask that we add something to the documentation and draw a red box, a red arrow and blinking lights around it.

Hence, CDW, we need documentation for the module developer.

Frankly, I'm surprised you are pushing for this; you were the one who was against this in #10.

robertdouglass’s picture

Adding additional cores, schemas etc. and collecting results from multiple search indexes is not in the works. Thus, if we want to support searches on different entities they have to share one index. We can keep working on the schema and the logic to support this better, but I'm fully in favor of indexing entity type. I don't find it an unreasonable requirement for vertical search implementations to have to add entity:foo as a filter to all searches.

Scott, it's not your responsibility as the user search module author to prevent users from showing up elsewhere, it's rather the responsibility of the other implementations to prevent them from showing up. Going forward it has to be assumed that all sorts of stuff can be in the index, and you have to ask for exactly what you want, or you risk getting stuff you didn't count on.

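function hook_apachesolr_modify_query(&$query, &$params, $caller) {
  // Restrict this search to the entity type it is meant to return.
  $query->add_filter('entity', 'user');
}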

pwolanin’s picture

Status: Needs work » Fixed

@Robert - ok, so you are agreeing with Scott. My point is that we need some systematic way for modules to opt in or out of a particular query.

For 6.x-1.x the patch as committed does nothing except add more data into the index that can optionally be used. In that sense, it's a feature beyond what the module supported before. We certainly did not cause a regression. Hence "fixed".

@Scott - please open a new issue for discussing how to move forward so we can improve the API.

robertdouglass’s picture

Peter, I think I partly agree with Scott. I see the main search module as the dumping grounds of everything that gets indexed, but I think he sees this as problematic, and maybe you do too. So yeah, if we don't want users to show up in main search, then we have to specify what entities main search searches on.

In the not too distant future we'll be doing this all with Views and having the entity index is definitely a good thing there.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.