Closed (fixed)
Project:
Apache Solr Search
Version:
6.x-1.x-dev
Component:
Code
Priority:
Normal
Category:
Feature request
Assigned:
Unassigned
Reporter:
Created:
18 Dec 2008 at 15:07 UTC
Updated:
2 Jan 2010 at 10:40 UTC
Comments
Comment #1
pwolanin commented
Well, at the least you'll need to add a small module to index/search users.
I talked with Robert about this a little. Your approach will depend on whether you want users mixed in with nodes in your search or a totally separate search.
Comment #2
Scott Reynolds commented
Here is a module that is in use on community.mylifetime.com. It provides a new type, 'user', for facet search. See: http://community.mylifetime.com/community/search/apachesolr_search/scott...
A couple of points to answer anticipated questions:
1.) Why write a separate _index_alter()?
Frankly, indexing nodes vs. users is very different. Different fields may need to be added to the index. If they both used the same function, most implementations would contain a
if ($type == 'node') ... elseif ($type == 'user')
which defeats the purpose.
2.) What is this user_module_invoke?
My team and I have for a while used this to 'build' an indexable profile. It is not a standard op for hook_user, but it has made it easy for us to roll out sites and 'build' profiles without changing any of our platform code. And yes, the module works without any module doing anything on $op == 'index'.
3.) What about changes to the solrconfig?
Yes, there are changes to the solrconfig. We run a heavily modified solrconfig file, though, so providing a patch would be a pain. So here is the major change
That is, the new field 'mail' is added.
4.) Schema changes?
Yes, the mail field needs to be added.
5.) Is the user facet exposed in the facet blocks?
No it isn't. Looking for guidance on that because the facet block is generated by apachesolr_search() module which only deals with nodes. And there is no way to 'facet_alter'.
We use a custom search block on the site to do the facet filtering.
I think that addresses them all.
Please note that currently, we are not up to speed with the latest dev/beta versions of this module. It is mostly because we are not ready to switch to solr 1.4 yet. So the configs might need to be changed.
Comment #3
Scott Reynolds commented
Realized that no one could install it without our private user_profile module. That dependency no longer exists (it did in a previous iteration), so you may edit the .info file and remove this line:
dependencies[] = user_profile
Code needs a review. I know it isn't perfect; I'm willing to make it right but am asking for guidance.
Comment #4
JacobSingh commented
Hi Scott,
I haven't looked at this, but I imagine that it would need a lot of work because the APIs have changed significantly as we moved to Beta.
Can you try making a patch and rolling it in?
I'll review it anyway, because it is a killer feature, but I'll probably not get to it very soon if it is a zip file against an old code base.
Thanks!
-Jacob
Comment #5
pwolanin commented
@Scott - I think there are (at least) two possible approaches: index nodes and users in the same index, or create a totally separate schema for users and expand the apachesolr framework module to more easily handle indexing multiple types of content into different indexes.
Frankly, the choice between these two for a particular site may also depend on whether they are using nodes as user profiles, vs. the core profile module or another solution. However, to me it seems that at least thinking about the latter would be useful.
For inclusion in this module, we should probably attempt a relatively bare-bones search that might just look at core profile module. It might not even meet your needs, but would be more general.
It's not clear to me that to index users into the same index you'd even need to change the schema, since the mail field could go into a dynamic field.
Comment #6
Scott Reynolds commented
In regards to
is because I wanted to use it in the dismax equation to add weight to that field.
My thought on nodes vs. users is simple. If you're using node profile, bio, or whatever else, then you don't need this. And I really like that they are in the same index. The facet searching that provides is cool and I think adds value. And through the use of dynamic fields, I'm not sure there is a technical need to build out a different schema and index.
I am willing to build it out for core profile. But I should mention that this works without any profile data, meaning that if all you have is the basic Drupal install with apachesolr turned on, it's fine. It makes the user's name and mail searchable (which is in line with Drupal core's user_search() implementation).
In regards to the API change, this module is 123 lines of code with comments. It is pretty tiny. The only place affected, without looking at the changes in Beta, would be
That is really the only place in this code that uses the Apachesolr PHP Client libraries. Everything else is self-contained. So I encourage you to take a look.
Comment #7
baumanis commented
I've put the apachesolr_users module into Drupal 6, but it does not work for my install yet. The new table apachesolr_usrs_queue gets updated, but the function apachesolr_users_index_documents does not update Solr.
Since I am not up the learning curve of Drupal module coding yet, I have done this the old band-aid way until the author of this module (Scott Reynolds?) decides to work on it (which will be lovely and elegant). Basically, right now I am running a script through cron that collects the uid and name from my user table and then creates an XML file out of them. Then the cron job uses post.jar to update Solr with my new XML file. This way user info shows up with all the rest of the search and I don't have to use the core search modules with their extra tabs.
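For readers who want to try the same workaround, here is a minimal sketch of the XML-building step, written in Python rather than as a Drupal module. It assumes the (uid, name) pairs have already been read from the users table; the field names ("id", "type", "name") and the "user/<uid>" ID format are illustrative assumptions, not taken from the poster's script, and must be adapted to the site's schema.xml.

```python
import xml.etree.ElementTree as ET


def build_add_xml(users):
    """Build a Solr <add> update message from (uid, name) pairs."""
    add = ET.Element("add")
    for uid, name in users:
        doc = ET.SubElement(add, "doc")
        # Hypothetical field names and ID format; match them to your schema.xml.
        fields = (("id", f"user/{uid}"), ("type", "user"), ("name", name))
        for field_name, value in fields:
            field = ET.SubElement(doc, "field", name=field_name)
            field.text = str(value)
    # ElementTree escapes characters such as "&" in text content.
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(build_add_xml([(1, "admin"), (2, "a & b")]))
```

The resulting file can then be handed to post.jar from cron, as described above.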
Comment #8
Scott Reynolds commented
Doh, severe bug in there
should be
Comment #9
robertdouglass commented
No matter which approach we take, I'd like to be able to guarantee that you don't have to go to a new tab to search for users. I'd like to use a unified search and get both users and nodes in the results.
Comment #10
pwolanin commented
@Robert - really?
For that use case now, just use nodes-as-profiles. In the longer term, I think we would want to have a federated search - I think it's very poor to mix users and content in the same result set. E.g.: http://www.princeton.edu/main/tools/search/?q=stock has two panes of search results.
Comment #11
baumanis commented
My users want content and user info in one search result set. I guess it depends on each Drupal website's audience what kind of search result set to provide. So, it would be nice to have a choice of separating these or putting them together.
Comment #12
Scott Reynolds commented
I think we need to separate indexing from display here. I do happen to think that representing users on the same level as content types is advantageous. I do think, though, that mixing users and nodes together can sometimes be a mess. So I think that providing a default facet of just 'nodes' is a good thing, which can then be turned off so the admin can say "mix nodes and users". So I think that in terms of the Apachesolr index, type:user should be there.
And I think this speaks to providing a single function
That's how I would try to design it. And I believe that if $params['fq'][$key] = 'type:page' were there, it wouldn't get overridden in this function, thereby allowing us to have a default behavior of just filtering to node types.
Hope that makes sense, and I think that could work.
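The function referred to above did not survive in this copy of the thread. As a rough illustration of the idea only (a default filter that is added unless the caller has already restricted the query by type), here is a sketch in Python. The function name, the dictionary shape of the parameters, and the "-type:user" default are all assumptions for illustration; none of them is the module's actual API.

```python
def add_default_type_filter(params, default_filter):
    """Return a copy of params with default_filter appended to 'fq',
    unless an existing filter already restricts the 'type' field."""
    filters = list(params.get("fq", []))
    already_typed = any(f.lstrip("-").startswith("type:") for f in filters)
    if not already_typed:
        filters.append(default_filter)
    # Copy rather than mutate, so the caller's params stay untouched.
    return {**params, "fq": filters}


if __name__ == "__main__":
    print(add_default_type_filter({"q": "stock"}, "-type:user"))
    print(add_default_type_filter({"q": "stock", "fq": ["type:page"]}, "-type:user"))
```

With this shape, a query that already carries 'type:page' keeps it, while an unfiltered query falls back to the default that hides users.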
Comment #13
rapsli commented
Any chance this gets into the module?
Comment #14
deltab commented
Subscribing...
Comment #15
deltab commented
+1
Comment #16
pwolanin commented
If we want to go this route: the "type" for users needs to be a string that could never be a valid node type - e.g. 'user' is not acceptable, since I can create a node type using that string.
This is actually a more general problem also - for file attachments we are adding an extra Boolean, but that's not very scalable.
Perhaps generally something like "object/user" and "object/file" or "object-user" or "a_s-user" or "a_s@user" or "@user"?
('a_s' == apachesolr_search)
Comment #17
Scott Reynolds commented
What if it was a new facet, 'entity_type', instead of fancy strings? So "entity_type='node'" or "entity_type='user'" or "entity_type='comment'".
Of course this hits on the comment searching of 2.x, but given the recent Drupal nomenclature, I think this will translate well and would really provide more power in a clean way.
If we want to go the fancier strings route, can we leverage apachesolr_document_id() and facet.prefix?
Comment #18
pwolanin commented
Well, it would be nice to figure out some usable approach for 6.x-1.0
Comment #19
pwolanin commented
We already roughly do this with the ID field:
where the 'type' param there is the same as what you suggest as entity_type. I imagine, however, that doing a wildcard match on the ID field will perform much worse than if we have this as a separate string field.
Comment #20
Scott Reynolds commented
Ya, that was my final comment in #17. I spent some time on it early this morning and couldn't come up with a better solution than wildcarding on the ID field. I don't think that is a good idea. Adding an 'entity_type' field to the schema seems like the way through to me.
Comment #21
pwolanin commented
Here's a possible schema change patch. This would more readily enable this user search as contrib.
Comment #22
robertdouglass commented
Discussed with Peter. +1 for entity type.
Comment #23
anarchivist commented
Entity type sounds good to me. Once this code gets committed, I recommend writing up similar docs that would act as implementation guidelines for non-node content.
Comment #24
pwolanin commented
I think we should just call it "entity" in the schema - no reason to make it longer.
Comment #26
pwolanin commented
Well, even if you want them separate, adding such a field makes it easy to filter the search results if all the data is in one index.
Comment #27
pwolanin commented
Like so.
Comment #28
robertdouglass commented
I believe this change is positive.
Comment #29
Scott Reynolds commented
Small change
should be entity, not enty_type.
Also, we probably need to set fq[]=entity:node on apachesolr_search queries. That way we have a clear separation.
Comment #30
robertdouglass commented
Ah, yes, entity:node is a good catch. This change requires a re-index, so it should be clear in both CHANGELOG and release notes.
Comment #31
pwolanin commented
I think we can omit the fq by default - there is no need if you are only indexing nodes.
Comment #32
Scott Reynolds commented
Well, that would mean that user search would then need to say "For all queries that aren't mine, fq[]=NOT entity:user".
I really think it should be the job of the 'entity' search module to add its own fq in there. Otherwise things get kind of silly: if you also have a comment search module, then every other query needs fq[] = NOT entity:user and fq[] = NOT entity:comment.
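To make the trade-off concrete, here is a small sketch (in Python, not the module's PHP) comparing the two strategies. The helper names are hypothetical; only the `entity` field name and the `NOT entity:...` filter form come from the discussion above.

```python
def exclusion_filters(other_entities):
    """Opt-out strategy: a search must exclude every entity type it does
    not want, so the list grows with each new indexing module."""
    return [f"NOT entity:{name}" for name in other_entities]


def own_filter(entity):
    """Opt-in strategy: a search names only the entity type it wants."""
    return [f"entity:{entity}"]


if __name__ == "__main__":
    # Node search when user and comment indexing modules are also installed.
    print(exclusion_filters(["user", "comment"]))
    print(own_filter("node"))
```

Under the opt-out strategy the node search has to learn about every other module that writes to the index; under the opt-in strategy its filter list never changes.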
I take it you don't want add the fq for performance, just not sure its worth the performance gain.
And I know I have argued previously for showing nodes and users in the same result page. But having implemented a couple sites with both user and node searching, I think I was the only one to get that paradigm and probably the only one who thought it was cool.
Comment #33
pwolanin commented
Well, mostly I don't want to force re-indexing on people who don't actually need it.
Comment #34
robertdouglass commented
If we want to avoid re-indexing then we should be focusing this effort on 6.2. I believe Scott's argument is correct, so the decision mainly comes down to whether we should be making schema changes to 6.1. I've held that we shouldn't be, but I defer to Peter for the final decision on 6.1 releases.
Comment #35
greggles commented
I think it's fine to force re-indexing. We should just have a warning message "if you enable this module you will have to reindex your site" on the download page or the project release page.
Is the code in a currently testable state?
Comment #36
robertdouglass commented
#641954: Update schema.xml for Solr 1.4 changes to schema version contained the entity field, and Scott says he's going to make a standalone user search module, so this issue is closed. greggles, note that the only warning for reindexing we'll have, currently, is in the release notes. People's sites won't break, however, so tracking latest devel versions is safe enough.
Comment #37
pwolanin commented
Actually - user search per se is "won't fix", but my last patch needs to be applied in some form.
Comment #38
pwolanin commented
Re-roll with fewer schema changes.
Comment #39
pwolanin commented
Committing this minimal patch to 6.x-1.x - need to come back to the language code soon.
Comment #40
robertdouglass commented
#38 was committed to 6.2 and 5.2.
Comment #41
Scott Reynolds commented
Again, there is no fq[] = entity:node.
See #32 for the argument as to why this is a bad idea.
Comment #42
pwolanin commented
@Scott - I think it's a bad idea to add it by default. I want to always mix file and node results, for example.
Comment #43
Scott Reynolds commented
Then how do I solve this?
How do I prevent user documents from showing up on the file + node search? Am I going to have to do that? Seems incredibly brittle and prone to issues. I'm going to have to do a special case for Solr Views to not add the clause.
That feels pretty dirty.
Comment #44
pwolanin commented
What's the alternative - I have to know to remove or OR together any entity fq entry to search everything in the index?
Unfortunately, neither of these is an ideal situation. At least for the 2.x branch, we can think about a hook to collect all entity types, and apachesolr_search could limit to an admin-selected subset, for example.
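As a sketch of how such a collect-then-limit approach might look, here is a Python illustration. It is not a proposal for the actual hook: the provider callables stand in for hypothetical hook implementations, and the function names and the OR-grouped filter are assumptions for illustration.

```python
def collect_entity_types(providers):
    """Gather entity types declared by each provider (a stand-in for a
    hook implementation), preserving order and dropping duplicates."""
    types = []
    for provider in providers:
        for name in provider():
            if name not in types:
                types.append(name)
    return types


def entity_filter(available, selected):
    """Return one fq string limiting results to the admin-selected subset,
    or None when no restriction is needed (nothing or everything selected)."""
    chosen = [t for t in available if t in selected]
    if not chosen or len(chosen) == len(available):
        return None
    return "entity:(" + " OR ".join(chosen) + ")"


if __name__ == "__main__":
    available = collect_entity_types([lambda: ["node", "file"], lambda: ["user"]])
    print(entity_filter(available, {"node", "file"}))
```

Here the main search would mix file and node results while leaving user documents out, without the user search module having to touch other modules' queries.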
Comment #45
Scott Reynolds commented
I think this supports my point. Either build out the 'files + node' search properly or not at all. (Side note: where is the issue for files + nodes?)
My real problem with this change is that it makes implementing the Apache Solr API harder on other module developers. If we keep this as is, then I ask that we add something to the documentation and draw a red box, a red arrow and blinking lights around it.
Hence, CDW, we need documentation for the module developer.
Frankly, I'm surprised you are pushing for this; you were the one who was against this in #10.
Comment #46
robertdouglass commented
Adding additional cores, schemas etc. and collecting results from multiple search indexes is not in the works. Thus, if we want to support searches on different entities they have to share one index. We can keep working on the schema and the logic to support this better, but I'm fully in favor of indexing entity type. I don't find it an unreasonable requirement for vertical search implementations to have to add entity:foo as a filter to all searches.
Scott, it's not your responsibility as the user search module author to prevent users from showing up elsewhere, it's rather the responsibility of the other implementations to prevent them from showing up. Going forward it has to be assumed that all sorts of stuff can be in the index, and you have to ask for exactly what you want, or you risk getting stuff you didn't count on.
Comment #47
pwolanin commented
@Robert - OK, so you are agreeing with Scott. My point is that we need some systematic way for modules to opt in or out of a particular query.
For 6.x-1.x the patch as committed does nothing except add more data into the index that can optionally be used. In that sense, it's a feature beyond what the module supported before. We certainly did not cause a regression. Hence "fixed".
@Scott - please open a new issue for discussing how to move forward so we can improve the API.
Comment #48
robertdouglass commented
Peter, I think I partly agree with Scott. I see the main search module as the dumping ground for everything that gets indexed, but I think he sees this as problematic, and maybe you do too. So yeah, if we don't want users to show up in main search, then we have to specify what entities main search searches on.
In the not too distant future we'll be doing this all with Views and having the entity index is definitely a good thing there.