I need to get Solr doing Field collapsing/result grouping which amounts to sending some extra parameters to Solr, and interpretting the results a little differently when they come back.

To do this cleanly I've proposed the notion of 'Query extenders' in #1720348: Add the concept of query extenders which mean that the collapsing behaviour can be turned on and off per query.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Steven Jones’s picture

Status: Active » Needs review
FileSize
6.19 KB

This patch requires the patch in http://drupal.org/node/1720348#comment-6330704

It does the following:

  1. Provides a Query extender to allow the user to configure field collapsing in the UI when building a search query.
  2. Sends the correct parameters to Solr when requested to do so.
  3. Interprets the results differently, so that paging and things can work correctly.
Steven Jones’s picture

And this patch allows you to specify a sort order within the groups.

Steven Jones’s picture

Actually the previous patch was flawed and wouldn't actually return any results.

das-peter’s picture

Hey Steven, thank you very much for the patch. I'm playing around with it to figure out how I'll create a feature I need for a customer :)

What would you think about changing search_api_solr_field_collapse to search_api_grouping? I think collapsing / grouping isn't a feature only of / for solr - thus it would be nicer to use a more generic name so that other service adapters can hop on an implement this feature on their own without adding more feature names.

das-peter’s picture

Status: Needs review » Needs work

Just struggled over this:

+++ b/includes/SolrSearchAPIFieldCollapseExtender.inc
@@ -0,0 +1,90 @@
+      if (!search_api_is_list_type($type)) {

As far as I can see $type is undefined here. I think it should be $field['type'].

das-peter’s picture

I've created a sandbox for my work on this topic Search API Grouping.
It provides processors for splitting indexing items and grouping results.
The grouping code relies on the code provided here. This now introduces the issue related to query extenders I mentioned here: #1720348-3: Add the concept of query extenders

I suggest to handle just the grouping support in this issue and the query extender feature in this one: #1720348: Add the concept of query extenders
To do so, I've split the patch into one that only deals with the grouping mechanism and one that provides the query extender. I fixed some coding standard issues and the issue mentioned in #5. Besides that I renamed the feature according to my suggestion in #4.

das-peter’s picture

I've updated the field collation support patch - I struggled over some small issues.
The query extender patch from #6 is still valid.

Steven Jones’s picture

Thanks for the feedback.

My impression is that Solr field collapsing will have different parameters and options from a similar feature in say MongoDB, and so making the support of it specific to Solr is what's wanted here I think. If a little way down the line they actually end up being very similar, then yes, this could be combined in a search API backend feature.

I've added the ability to change the amount of results returned in each group, which meshes really well with Views' built in render-time grouping actually!

This patch is based on mine in #3, sorry.

Steven Jones’s picture

Sorry that patch had a debugging line in it.

drunken monkey’s picture

Status: Needs review » Needs work

This looks like a handy feature, and if the results still look the same (i.e., adhere to the contract of SearchApiQueryInterface::execute()), I see no reason why we shouldn't include this feature and option.

The only problem I see is that, if I understand it correctly, the returned results could be kinda weird in some cases (when grouping on a field with only a few different values), having a large result count but still returning less results than the limit, massive problems with the pager, and so on. That would, at least in some sense, violate the method's contract. Maybe we should just return the number of groups as the result count?

Thanks for splitting, Peter – I do think we should only handle the Solr support in this issue. The query extender isn't Solr-specific at all (and shouldn't be) and should probably go into another project. This project should then also contain clear documentation of the syntax and semantics of the search_api_grouping feature (or whatever name we end up using). The Solr module should then adhere to those specifications as closely as possible.
I'll look at the patches more closely later.

Steven Jones’s picture

the returned results could be kinda weird in some cases (when grouping on a field with only a few different values), having a large result count but still returning less results than the limit, massive problems with the pager, and so on.

I'm not sure it's odd, Solr at least will return the correct number of groups, and the paging works just fine. If you're returning one item per group then it's really transparent to the outsider. If you are returning more than one result per group then yes, it is a little weird, e.g. ten items per page, but you actually get one hundred back from Solr.

the search_api_grouping feature

I would suggest looking at adding this feature to the MongoDB Search API backend or similar and seeing if the feature can be generalised to those backends in a consistent manner before deciding to name this non-Solr specifically.

das-peter’s picture

Glad to see some movement here :)

I would suggest looking at adding this feature to the MongoDB Search API backend or similar and seeing if the feature can be generalised to those backends in a consistent manner before deciding to name this non-Solr specifically.

As SQL understands grouping I'd bet there's need in other search backends too. Thus I'd say the point is more to figure out which options the different backends have in common to create a generalized options set.

drunken monkey’s picture

I'm not sure it's odd, Solr at least will return the correct number of groups, and the paging works just fine. If you're returning one item per group then it's really transparent to the outsider.

But if you, e.g., have only three groups but twelve results, wouldn't you get a pager with two pages, but only three results listed and an empty second page? That doesn't seem very transparent to me.
You could return the total number of results in a separate key, something like search_api_grouping_result_count, so as to enable displaying the real number somewhere – but the pager should definitely use the number of groups as the basis.

I would suggest looking at adding this feature to the MongoDB Search API backend or similar and seeing if the feature can be generalised to those backends in a consistent manner before deciding to name this non-Solr specifically.

I agree with Peter here – it would definitely be possible in other backends, too (we also don't know what other backends might come in the future), and there is really no advantage to naming it in a Solr-specific way. Currently, nearly no feature is supported by a non-Solr backend, but still none is named in such a way.

Steven Jones’s picture

But if you, e.g., have only three groups but twelve results, wouldn't you get a pager with two pages, but only three results listed and an empty second page? That doesn't seem very transparent to me.

So with the default settings that the Solr implementation provides, it returns 1 result per-group, and gets Solr to use the groups for paging information. So in the above scenario you get back from the backend that there were three results total, and these three results returned.

Solr supports returning N results per group (and then paging through those in addition to paging through the groups themselves) and so in the latest patches that is supported. What that looks like though is getting 12 results back when the paging information indicates that there are only 3 results (the 3 groups still). This is a little weird at this stage, and is clearly stated so in the UI where you are given the option. This actually works really well with Views result grouping though, because actually you can get all the results for each group back, and page through the groups, which is usually what you want to do with Views result grouping anyway.

Steven Jones’s picture

Here's a patch that builds on the patch in #9 but adds support for the new Solr 4 group.facet parameter.

dasjo’s picture

while working on Geocluster's solr integration, i found the need to use grouped results with search api solr. for a more details see #1800850: Use a solr plugin with search_api_solr for clustering

in my case i'm adding grouping functionality on the code-level, so i'm not depending on the query extender. i therefore sticked to the field collapsing patch from #6. find a re-roll attached (search_api_solr-field-collapsing-support-1721262-16.patch).

for completeness, i'm also attaching (search-api-solr-1721262-field-collapsing-16.patch) which is an update of #9.

stella’s picture

I got this working with the patches from comment 9 and comment 15. However, I'm using Solr 4.x (with config files from the sandbox at http://drupal.org/sandbox/cpliakas/1600962) and while it works perfectly, the above patches only allow field collapsing on string type fields (in line with what Solr 3.x supports).

In case anyone else is also using Solr 4.x, the attached patch removes that restriction. As this module only officially supports 3.x, I've just rolled this patch on top of the above patches, rather than rerolling them.

stella’s picture

Here's an additional patch to expose the per group 'numFound' field to Views, so we can display the number of records in a grouped result.

Anonymous’s picture

FileSize
11.66 KB

What are the steps to use the UI to configure field collapsing? I installed all patches and Search API Grouping, but fail to see where to configure anything.

Attached a re-roll of patches #9 + #15 + #17 + #18 against latest devx.

drunken monkey’s picture

As already said in #10, the query extender isn't Solr-specific and should thus be contained in a different project, not in this patch. I also think the additional Views field, although probably tricky to add otherwise, is much too specific to be included in the module.

So, attached is a re-roll with those changes, along with some others (e.g., using search_api_grouping as the featurename and option key).

One thing still needed is a module actually defining the search_api_grouping feature, so we have a clear spec on which options we actually need to recognize, and for other developers to know how they can use this feature. This would then also need to be referenced in our README.txt file, like the other features.
A module with Steven Jones' query extender would be a candidate for that, if he is still interested (or someone else wants to go ahead and create it). Defining it right in this module would be possible as a “last ressort”, too, but would be anfavorable because the feature is neither Solr-specific nor actually used by this module.
In any case, we need the feature documented somewhere. (And when doing so, I would vote for removing the search_api_grouping[collapse] key. I don't think there's any use in specifying the option but setting this to FALSE, or is there? Also, I'd combine group_sort and group_sort_direction into a single associative array which also allows for multiple sorts.)

Other than that, the patch seems to work fine. The only thing that needs to be fixed there is not setting group.field for fields that aren't supported (multi-valued, etc. – essentially the same checks that were also in the query extender), maybe with adding a warning to the response additionally. I also think the query extender, if it gets added to some module, should support all fields, as other backends might have different restrictions.

das-peter’s picture

I've started first working with this in a sandbox called Search API Grouping, later I've switched over to jgraham's sandbox Search API Denormalized Entity Index and created the similar functionality based on his approach in the 7.x-2.x branch.
As both projects still are sandboxes I could port the 7.x-2.x branch of "Search API Denormalized Entity Index" to "Search API Grouping" and add all the missing stuff needed to close this issue here.
Sounds that like a viable plan?

drunken monkey’s picture

Yes, that would be great!

das-peter’s picture

Here we go

  • Ported the functionality from jgraham's sandbox into the Search API Grouping sandbox
  • Updated module description on the project page
  • Prepared patch to use query extenders: http://drupal.org/node/1989856
  • Slightly adjusted the attached patch - was just about wording. I think it's better to use the word "grouping" instead "collapse" as it's less solr specific.
  • drunken monkey’s picture

    OK, great!
    However, as far as I can see a documentation for the search_api_groupingfeature is still missing. Please add that to the README.txt, then I think we're all set. (If you don't know how you'd do that, see, e.g., the documentation of the features defined by search_api_views and search_api_facetapi, in their respective README.txt files.)
    Except for the last paragraph in #20, that is.

    das-peter’s picture

    as far as I can see a documentation for the search_api_groupingfeature is still missing

    Documentation added: http://drupalcode.org/sandbox/daspeter/1783280.git/blobdiff/e89237d73c08...

    Other than that, the patch seems to work fine. The only thing that needs to be fixed there is not setting group.field for fields that aren't supported (multi-valued, etc. – essentially the same checks that were also in the query extender), maybe with adding a warning to the response additionally. I also think the query extender, if it gets added to some module, should support all fields, as other backends might have different restrictions.

    That should be part of the service class, right? Since I don't think the Search API Grouping module itself should or can deal with the service specific restrictions?

    Shall I promote Search API Grouping to a full project?

    drunken monkey’s picture

    Documentation added: http://drupalcode.org/sandbox/daspeter/1783280.git/blobdiff/e89237d73c08...

    Seems fine, thanks!
    As previously noted, I'd combine the two keys for sorting to a single one containing an associative array – but if you want to keep it like that, we can do that as well, of course. In that case, you should add that the first one contains the field name, though.
    Anyways, as far as this issue here is concerned, that documentation will do fine, thanks!

    That should be part of the service class, right?

    Yes, exactly.
    Oh, and we should also add documentation that we support this feature (and a link to the Grouping project) to the README.txt.

    Shall I promote Search API Grouping to a full project?

    If you are confident it's usable, then please do so. I don't think it's necessary for our issue here, though.

    das-peter’s picture

    As previously noted, I'd combine the two keys for sorting to a single one containing an associative array

    Hmm, that makes sense.
    How about:

    group_sort[] = array(
      'field' => 'field_foo_bar',
      'direction' => 'asc',
    );
    

    This allows multiple fields and easy handling of fields or directions if necessary.

    Oh, and we should also add documentation that we support this feature (and a link to the Grouping project) to the README.txt.

    I'll do that as soon as we agree on the group_sort format. Then I've to adjust the patch anyway.

    If you are confident it's usable, then please do so. I don't think it's necessary for our issue here, though.

    I'm never confident enough that something I wrote really is usable :P But since we use this code for months now I think I'll give it a try.

    drunken monkey’s picture

    Hmm, that makes sense.
    How about:

    group_sort[] = array(
      'field' => 'field_foo_bar',
      'direction' => 'asc',
    );

    This allows multiple fields and easy handling of fields or directions if necessary.

    Would be an option, of course – but I was more suggesting the format $query->getSort() already uses:

    group_sort['field_foo_bar'] = 'asc';
    
    I'm never confident enough that something I wrote really is usable :P But since we use this code for months now I think I'll give it a try.

    I can sympathize with that – see my countless non-stable modules … ;)
    But yes, do that!

    das-peter’s picture

    Status: Needs work » Needs review
    FileSize
    2.4 KB
    4.75 KB

    Here's the updated patch which includes the group_sort formatted as proposed in #28 as well as the requested update of the README.txt

    drunken monkey’s picture

    Great, thanks for your work!
    However, the check for which fields allow grouping was still missing. I added that in the attached patch. Please test whether this still works for you, and then we can finally commit.

    das-peter’s picture

    Status: Needs review » Reviewed & tested by the community

    Just gave it a try: Works and looks good.
    Only thing I noticed is this error reported from the code sniffer Concatenating translatable strings is not allowed, use placeholders instead and only one string literal, which is related to this:

    $warnings[] = t('Grouping is not supported for field @field. ' .
                    'Only single-valued fields not indexed as "Fulltext" are supported.',
                    array('@field' => $index_fields[$collapse_field]['name']));
    

    Should look like:

    $warnings[] = t(
      'Grouping is not supported for field @field. Only single-valued fields not indexed as "Fulltext" are supported.',
      array('@field' => $index_fields[$collapse_field]['name'])
    );
    

    But for me that's not really a showstopper. Thus I'd say RTBC :)

    Anonymous’s picture

    Might there be an issue left with the sort field? After appyling the patch from #30 I go this:

    Warning: Invalid argument supplied for foreach() in SearchApiSolrService->search() (line 734 in /sites/all/modules/contrib/search_api_solr/includes/service.inc).
    

    This line:

    foreach ($grouping['group_sort'] as $group_sort_field => $order) {
    

    The $grouping['group_sort'] looks like 'price' instead of array('price' => 'ASC')
    (Using the 7.x-2.x branch from the search_api_grouping sandbox.)

    drunken monkey’s picture

    Status: Reviewed & tested by the community » Fixed

    Only thing I noticed is this error reported from the code sniffer Concatenating translatable strings is not allowed, use placeholders instead and only one string literal, which is related to this:

    Ah, yes, this occurs very frequently in my modules. I just think it's both ugly and impractical to have lines with several hundred characters. I also haven't really found documentation that this is forbidden, let alone why. Therefore, I'd leave that in there until someone gives me strong enough reasons to change this (at which point I'd have to patch nearly all of my modules).

    Might there be an issue left with the sort field?

    Ah, yes, seems the module wasn't updated with the change in #29 yet. However, that has to be fixed there, I don't think we have to wait with this patch for that.

    So, committed!
    Thanks again for your work, everyone!

    das-peter’s picture

    @morningtime Are you sure you used the latest version of Search API Grouping? Because the repository viewer tells me the latest version should properly generate the new group_sort format: http://drupalcode.org/sandbox/daspeter/1783280.git/blob/refs/heads/7.x-2... :

    154 if (!empty($this->options['group_sort'])) {
    155 $options['group_sort'][$this->options['group_sort']] = isset($this->options['group_sort_direction']) ? $this->options['group_sort_direction'] : 'asc';
    156 }
    

    If the problem persists, please open an issue in the Search API Grouping issue queue and add information about how you've configured the processor.

    Status: Fixed » Closed (fixed)

    Automatically closed -- issue fixed for 2 weeks with no activity.

    fago’s picture

    Great work everyone. I initially thought that calculation of total-rows is wrong, as it does not match the number of documents returned. However still, I think it's the best approach as it matches limit + the number of groups returned - so it's the only way paging can work.

    So with the default of group limit 1 everything is perfect, with something different this is still the best we can do. I posted a patch to allow users to configure the group limit over at #2026807: Make group limit configurable - I guess it could use some description that number calculation of queries still go to the number of groups though.

    Then I filed #2027843: Make the solr response available as part of the search results to make the grouped-results and extra information available to the API consumer, e.g. to views via $query->getSearchApiResults();

    fago’s picture

    Issue summary: View changes

    Updated issue summary.