Allow name sorting of facets
der - February 10, 2009 - 20:22
| Project: | Apache Solr Search Integration |
| Version: | 6.x-2.x-dev |
| Component: | Code |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | fixed |
Description
This patch allows the admin to choose either the count or the name as the sort order for the facet filters within the blocks.
| Attachment | Size |
|---|---|
| facet_sort.patch | 4.8 KB |
| facet_sort_order.jpg | 20.25 KB |

#1
Usability nitpick; should be radio buttons as options are mutually exclusive. See: http://www.useit.com/alertbox/20040927.html
Haven't actually tested the patch so I'll leave the status untouched.
#2
Use radios.
Instead of:
$sort_preference == 2
either use define() statements, or use strings that are meaningful. Perhaps use a switch statement rather than if/else in case we add more.
You need to use t() for the options text:
'#options' => array(1 => 'Count', 2 => 'Name'),
#3
'Tis true; radios are the standard convention for mutual exclusivity. However, I disagree that this helps usability (at least in this case). Radios take up more room, present more options up front, and so can make the full page more intimidating. In this case, I think it's probably better to use radios for convention reasons only; I'm not sure the choice will positively impact usability.
#4
#5
New patch with radios and switch statements.
#6
In the case statement you seem to repeat the default code; it might be better as:
case 'count':
default:
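A minimal standalone sketch of that switch, assuming the facet items arrive as an array of display name => document count (the function name and array shape are hypothetical, chosen so it runs outside Drupal). The 'count' label falls through to default, so an unknown or unset preference behaves like 'count' without duplicating the code.

```php
<?php
/**
 * Hypothetical helper: order facet items by the stored preference.
 * $items maps display name => document count.
 */
function example_sort_facets(array $items, $preference) {
  switch ($preference) {
    case 'name':
      // Alphabetical by display name (the array keys).
      ksort($items, SORT_STRING);
      break;

    case 'count':
    default:
      // 'count' and any unrecognized value share this branch:
      // highest document count first, keys preserved.
      arsort($items);
      break;
  }
  return $items;
}

$items = array('Bart' => 3, 'Albert' => 7, 'Camila' => 5);
echo implode(', ', array_keys(example_sort_facets($items, 'name'))), "\n";  // Albert, Bart, Camila
echo implode(', ', array_keys(example_sort_facets($items, 'bogus'))), "\n"; // Albert, Camila, Bart
```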
#7
attached
#8
Hopefully one last one with minor wording changes for the block settings page.
#9
Need to check that something is set:
$sort = variable_get('apachesolr_facet_sort_orders', array());
$sort_preference = isset($sort[$module][$delta]) ? $sort[$module][$delta] : variable_get('apachesolr_facet_sort_order_default', 'count');
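The same lookup as a standalone sketch: in Drupal 6 both inputs would come from variable_get(), but here they are plain arguments so it runs on its own. The function name and the sample module/delta values are hypothetical. isset() on the nested array is what protects blocks that have no per-block entry yet, which the switch default alone does not cover once the variable has been saved.

```php
<?php
/**
 * Hypothetical helper: per-block sort setting with a site-wide fallback.
 * $sort is shaped like the saved variable: [$module][$delta] => 'count'|'name'.
 */
function example_sort_preference(array $sort, $module, $delta, $default = 'count') {
  // isset() on a missing nested key returns FALSE without raising a notice.
  return isset($sort[$module][$delta]) ? $sort[$module][$delta] : $default;
}

$sort = array('apachesolr_search' => array('type' => 'name'));
echo example_sort_preference($sort, 'apachesolr_search', 'type'), "\n";           // name
echo example_sort_preference($sort, 'apachesolr_search', 'uid'), "\n";            // count
echo example_sort_preference(array(), 'apachesolr_search', 'type', 'name'), "\n"; // name
```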
#10
OK, but the default case in the switch statement takes care of that as well. If we want a re-rolled patch, it will be a few days before I can get back to this.
#11
The default in the switch only takes care of it if you have not changed the variable.
#12
attached
#13
#14
Also, there may be a problem if the top facets are sorted such that they don't appear initially.
#15
I'm not sure I understand the potential issue you are describing. The facets are either sorted by name or by count and they are always sorted prior to being split into the "visible" or "hidden" html elements in the theme_apachesolr_facet_list() theme function. I don't think the issue you are describing is a problem.
#16
As far as I understand, the mlt handler returns facets based on their frequency - so if they are alpha-sorted, the most frequent terms may not appear at all initially - this seems rather broken.
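A small standalone illustration of the concern, assuming the block alphabetizes the returned facets and then splits them into an initially visible part and a "show more" part, as described in #15. The function name and the $limit parameter (standing in for the block's "initial number of filter links" setting) are hypothetical.

```php
<?php
/**
 * Hypothetical helper: alphabetize, then split into visible/hidden.
 * $items maps display name => document count.
 */
function example_split_facets(array $items, $limit) {
  ksort($items, SORT_STRING);
  return array(
    'visible' => array_slice($items, 0, $limit, TRUE),
    'hidden' => array_slice($items, $limit, NULL, TRUE),
  );
}

// 'Zoology' is by far the most frequent value, but it sorts last.
$items = array('Zoology' => 120, 'Art' => 4, 'Biology' => 9, 'Chemistry' => 2);
$split = example_split_facets($items, 3);
echo implode(', ', array_keys($split['visible'])), "\n"; // Art, Biology, Chemistry
echo implode(', ', array_keys($split['hidden'])), "\n";  // Zoology
```

With a count sort, 'Zoology' would be the first visible link; with the alpha sort it ends up behind the "show more" link.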
#17
I'm sorry, I'm still not getting it. Maybe I'm missing something important about how this works. I assume all of the facets are exposed to the block hook in the response object ($response->facet_counts->facet_fields). You said:
If I want them to be alpha-sorted then why do I care if some of the most frequent terms may not appear initially assuming I can still expand the list to see them?
#18
The end-user will not have a choice about the facet order, so I think hiding some of the top facets would lead to a bad user experience - even if they are only a click away, many users will not explore that option.
#19
I think I've understood what Peter is saying...
Here's an example to illustrate:
Say the current search returns 500 items, and each has a different author. Therefore the "authored by" facet would have 500 values for this search. However, we are only asking Solr to return the N "most frequent" facets due to performance/bandwidth issues. If N = 10, this would leave out 490 facet values.
So sorting these 10 facets makes no sense-- we would be leaving important facet values off the list (e.g. showing the sorted list "Albert, Bart, Camila,..." but would be leaving out "Alphonse, Amadeus, Amygdala..." which *are* in the current result set)
The only case where this would make sense is if we are sure that the amount of possible facet values for this facet is N or less (the same amount the admin has configured each block to show).
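A standalone sketch of the example above: Solr returns only the N most frequent values (the facet.limit behavior), and sorting those N by name afterwards cannot bring back the values that were cut. Here array_slice() on a count-sorted array plays the role of Solr's limit; the function name is hypothetical.

```php
<?php
/**
 * Hypothetical helper: simulate "top N by count from Solr, then alpha sort in PHP".
 * $all_counts maps every facet value in the result set => document count.
 */
function example_top_n_then_alpha(array $all_counts, $n) {
  arsort($all_counts);                          // most frequent first, like Solr's count sort
  $top = array_slice($all_counts, 0, $n, TRUE); // what a limit of N would return
  ksort($top, SORT_STRING);                     // post-fetch alphabetical sort
  return array_keys($top);
}

$all = array('Albert' => 5, 'Alphonse' => 1, 'Amadeus' => 1, 'Bart' => 4, 'Camila' => 3);
echo implode(', ', example_top_n_then_alpha($all, 3)), "\n"; // Albert, Bart, Camila
```

'Alphonse' and 'Amadeus' are in the result set but never reach the block, even though they would sort near the top alphabetically.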
The "right" solution is to instruct Solr to return facets based on alpha sort instead of frequency using facet.sort; see: http://wiki.apache.org/solr/SimpleFacetParameters#head-569f93fb24ec41b06...
This, however, means we need the functionality in the Solr PHP client *and* in the .module.
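A sketch of the request parameters that approach would need for one facet field. 'facet', 'facet.field' and the per-field 'f.<field>.facet.limit' / 'f.<field>.facet.sort' overrides are standard Solr parameters; the field name and function are hypothetical, and how the array is handed to the Solr PHP client is deliberately not shown.

```php
<?php
/**
 * Hypothetical helper: Solr params asking for one field's facets in index order.
 * $sort is the block setting ('count' or 'name').
 */
function example_facet_params($field, $sort, $limit) {
  return array(
    'facet' => 'true',
    'facet.field' => array($field),
    "f.{$field}.facet.limit" => $limit,
    // Solr 1.4+ values: 'count' (by frequency) or 'index' (lexicographic).
    "f.{$field}.facet.sort" => ($sort === 'name') ? 'index' : 'count',
  );
}

$params = example_facet_params('ss_cck_field_color', 'name', 10);
echo $params['f.ss_cck_field_color.facet.sort'], "\n"; // index
```

Index order is only meaningful when the indexed value is the text users see, which is the limitation raised in #23.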
#20
I think I understand now. So my solution would work for use cases of where we know that we have a relatively small number of facets and we've configured all of the facets to be shown (which happens to be my use case).
Thanks. I'm not sure if I'll have time to work on the solution for this but if I do I'll be sure to post the patch here.
#21
#22
I think we need to take clues from the Solr level params that may be coming in from our modify_query hook:
facet.sort=index/false
See http://drupal.org/node/521828 for a case where this is desired. We could build the UI on top of that, but we should use facet.sort as the trigger mechanism for the actual sorting so that we come closer to expected Solr behavior. I'm also moving this feature request to the 6.2 branch.
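A minimal sketch of using an incoming facet.sort value as the trigger, accepting both the pre-1.4 spellings ('true' = count, 'false' = index order) and the Solr 1.4 ones ('count', 'index'). The function name and the 'name'/'count' return values are hypothetical stand-ins for the module's display setting.

```php
<?php
/**
 * Hypothetical helper: map a facet.sort parameter to a display sort.
 */
function example_display_sort_from_param($value) {
  switch (strtolower((string) $value)) {
    case 'index':
    case 'false':
      // Index order: lexicographic on the indexed value.
      return 'name';

    case 'count':
    case 'true':
    default:
      // Solr's default is sorting by count.
      return 'count';
  }
}

echo example_display_sort_from_param('false'), "\n"; // name
echo example_display_sort_from_param('count'), "\n"; // count
```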
#23
I've been poking around this, and have these issues:
My idea (and Robert's above) was to get the sorted facet values from Solr by issuing f.XXX.facet.sort = index in apachesolr_search_add_facet_params(); however, this would only work with facet fields that are stored as strings, like CCK text fields. We can't ask for sorted facets this way for users (we store the uid), taxonomy (we store the tid), or node types (we store the machine name), because sorting these gets us nowhere.
So, for taxonomy and other integer storage-based fields, we can sort with one of these two approaches:
1) Sort after the Solr request returns, but the only (?) applicable use case is what @der mentioned in #20: we have a small number of facets and they are all returned on each request; this way PHP can sort those facets and nothing is broken.
2) Change the underlying storage schema so we store something that's sortable by Solr and also gives us the needed IDs. For instance, for term "apple" with tid 3, we could store the string "Apple|3", which is doable IMO. We should then think about allowing storage of localized versions, like "Manzana|3" for the Spanish equivalent, in another field.
For taxonomy, imagine TS_VID_[vid]_NAMES_[language] fields that stores strings like "Apple|3" but Solr only indexes the "Apple" portion.
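A rough sketch of the "Name|tid" idea, with hypothetical helper names. It sorts the whole stored string as a stand-in for index order, which is only approximate: '|' sorts after letters, so a name sorts after longer names sharing its prefix (e.g. "Apples|4" before "Apple|3"). The choice of separator therefore matters.

```php
<?php
/**
 * Hypothetical helpers: pack a term name and tid into one facet value and back.
 */
function example_encode_term($name, $tid) {
  return $name . '|' . $tid;
}

function example_decode_term($value) {
  // Split on the last '|' so a name containing '|' survives.
  // Assumes a well-formed value that contains at least one '|'.
  $pos = strrpos($value, '|');
  return array(
    'name' => substr($value, 0, $pos),
    'tid' => (int) substr($value, $pos + 1),
  );
}

$values = array(example_encode_term('Pear', 7), example_encode_term('Apple', 3));
sort($values, SORT_STRING); // "Apple|3", "Pear|7"
$first = example_decode_term($values[0]);
echo $first['name'], ' ', $first['tid'], "\n"; // Apple 3
```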
So, in conclusion...
(a) For CCK fields, it's OK to allow alpha sorting (is it?) and we can use @der's code above.
(b) For taxonomy, user, and node types we could start out by using (1): admins can pick the post-fetch sort option, with the UI showing a warning message telling them that "enabling this option for indexes with more facet values than those configured to display in this block will result in unpredictable results". (It could even check how many distinct facet values there are for that field at block config time and include that information for the admin).
Thoughts please?
#24
I think #23 is solving a different problem than what I understand the issue to be.
The issue at hand, as I see it, is to sort the facets that come back alphabetically, in PHP, for display purposes. Right now they're sorted by # of documents descending, but very often this makes it hard to find the facet that you're looking for. The approach I'd like to see is to add either a config option (global), or a javascript option (sort icon) per block that reorders the li items in the block either by doc count or by alphabet.
#25
Perhaps this is just a matter of putting in the functionality and let the admin decide =)
You propose: get just the most frequent facets (and then sort by occurrence or alpha).
I propose: get the first (alphabetically-speaking) facets (regardless of their occurrence) when showing alpha-sort, use the current behavior for occurrence-sort. (And the problems mentioned in #23 come up).
I think your proposal is simple (code-wise) but maybe not the best usability-wise (but, then again, we can leave it up to the admin). IMO it's easy for the end user to grok that we're returning the N most frequent facets when they're ordered by the number of results they'll bring in. I don't think it'll be as easy to figure out when we're showing them ordered alphabetically... Will a user realize why some facets have been left out of the display... and what they can do to get them?
My ideal behavior for all facet blocks would be to be able to scan the entire facet list (alpha- or occurrence-sorted) using paged AJAX requests, or on a separate page (faceted_search.module allows the latter).
Your call =)
#26
I think different solutions are needed for the problem of "too many facets to be visible in a block of links".
Like table sorting before it, alpha sorting a block of facets won't make it much easier to get to the buried facets in the "M" region.
The alpha sorting of the returned (highest frequency) facets that I propose has other advantages. For example, if using the new OR facets, it keeps the order of the facets between requests. In fact, my proposed version of alpha sorting facets should be the default behavior for OR facets.
For filters that generate more than ~20 facets we need different widgets. Perhaps a "total list alphasort plus pager" widget is one of them. Before we can really introduce more widgets, though, we need a better architecture for choosing the widgets.
#27
Seems like alpha sorting of facets makes no sense if some of them are hidden by JS; the most common facet might then be hidden.
#28
@pwolanin: I *think* your comment in #27 mirrors what has already been discussed above =)
That is, while it *is* possible for facets to be hidden by JS:
* The admin might not care and wants that sort anyway=)
* The admin knows that the number of facet values does not exceed X so none will ever be hidden =)
Perhaps we just need to add in a warning message in the above patch...?
+++ apachesolr.module 10 Feb 2009 22:09:29 -0000
@@ -700,6 +711,14 @@ function apachesolr_facetcount_form($mod
+ $options = array('count' => t('Count'), 'name' => t('Name'));
+ $form['apachesolr_facet_sort_order'] = array(
+ '#type' => 'radios',
+ '#title' => t('Filter sort order'),
+ '#options' => $options,
+ '#description' => t('The order the filters appear within the block.'),
+ '#default_value' => isset($sort[$module][$delta]) ? $sort[$module][$delta] : variable_get('apachesolr_facet_sort_order_default', 'count'),
+ );
Maybe a warning should go here...
+ '#description' => t('The order the filters appear within the block. Warning: if you have more possible values for this filter than the "initial number of filter links to show in this block" option, then sorting alphabetically might give users the impression that some filters do not exist in the current result set.'),
Thoughts?
#29
@janusman "The only case where this would make sense is if we are sure that the amount of possible facet values for this facet is N or less (the same amount the admin has configured each block to show)."
Exactly. In cases where facet data is fairly homogeneous (e.g., all of your widgets fall into one of four categories), you don't tend to have a problem. The vast majority of the publication years in our publications database, for example, are fairly narrowly distributed.
And the most common use case for our year facet is a known item query ("I know the thing I'm looking for happened in 2009"), for which an alphanumeric listing is more easily scanned. We do provide an advanced search option for this, but some users still rely on facets. Facet counts are good for giving you a high level overview but not so good for these known item situations (locating a specific author, year, whatever). But yeah, as the data gets more heterogeneous, and the size of the data grows, an alphanumeric sort does indeed become kind of pointless.
Bottom line, I support allowing an alphanumeric sort option in some form or other. I personally will need to implement it whatever the outcome of this patch.
@janusman "The "right" solution is to instruct Solr to return facets based on alpha sort instead of frequency using facet.sort"
Yeah, I thought it curious that we do
ksort($items, SORT_STRING)
after having previously set 'facet.sort' => 'true', which effectively overrides the Solr setting. I note here the API change in Solr 1.4: "Solr1.4 -- the true/false values have been deprecated starting with Solr 1.4, instead use count for sorting by count, and index for sorting by index order." http://wiki.apache.org/solr/SimpleFacetParameters#facet.sort
@robertDouglass "Perhaps a "total list alphasort plus pager" widget is one of them"
That would be fracking awesome. I would implement this tomorrow for several of our facets if it were available. One thought I had there - allow users to toggle between count and alphanumeric/pager widget thingy - that might allow us to have the best of both worlds.
#30
I took a stab at this and think I came up with something similar to der. My patch doesn't modify the query to sort by 'index' or 'count' - it always queries with the facet sort being 'count'. It then uses the facet text to sort alphanumerically if that has been requested. In this way we get a "happy" medium without adding too much complexity. We always get the facets with the most results, but they can be sorted in one of two ways.
Until we build all these blocks with Views, this is what we'll have to live with.
Committing to 6.2.
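A standalone sketch of the approach described in #30: Solr is always queried by count, and only the display order changes afterwards. The array shape (items keyed by the stored facet value, e.g. a tid, each carrying the visible text) and the function name are assumptions for the example, not the committed code. Sorting on the text rather than the keys matters here, since ksort() on tids would order by id.

```php
<?php
/**
 * Hypothetical helper: reorder count-sorted facets for display.
 * $items: facet value (e.g. tid) => array('text' => ..., 'count' => ...).
 */
function example_order_for_display(array $items, $sort) {
  if ($sort === 'name') {
    // Compare the visible text, case-insensitively and "naturally",
    // so "Item 2" comes before "Item 10". Keys are preserved.
    uasort($items, function ($a, $b) {
      return strnatcasecmp($a['text'], $b['text']);
    });
  }
  // Otherwise keep the count order Solr returned.
  return $items;
}

$items = array(
  12 => array('text' => 'pears', 'count' => 40),
  3 => array('text' => 'Apples', 'count' => 25),
  7 => array('text' => 'bananas', 'count' => 10),
);
echo implode(', ', array_keys(example_order_for_display($items, 'name'))), "\n"; // 3, 7, 12
```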
#31
Leaving open for a time for review and followup.
#32
Awesome, thanks for working on this, Robert.
I just tested the patch and it worked as described.
Sorry to be a pest, but could we also have a toggle for asc/desc? With letters, you'd probably only want to sort ascending, so I can see why you chose that. But numbers are a little different; my publication year range, for example, makes more sense sorted with the newest (desc) year first.
Thanks again.
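A minimal sketch of such a toggle, assuming a hypothetical $direction setting ('asc' or 'desc') applied after the name sort. strnatcmp() compares embedded numbers by value, so mixed-length numbers also come out in numeric order; for four-digit years a plain string sort would give the same result.

```php
<?php
/**
 * Hypothetical helper: natural sort of facet texts with a direction toggle.
 */
function example_sort_by_text(array $texts, $direction = 'asc') {
  usort($texts, 'strnatcmp');
  return $direction === 'desc' ? array_reverse($texts) : $texts;
}

echo implode(', ', example_sort_by_text(array('2007', '2009', '1998'), 'desc')), "\n"; // 2009, 2007, 1998
```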
#33
@libsys, ok, but only cuz you're a nice guy. Any chance you could merge this patch and the previous and repost? I'm committing this one to 6.2 as well.
#34
Heh, I guess I've always subscribed to the "catch more flies with honey than with vinegar" approach =).
The attached patch is just a simple combined diff -urp patch, as CVS has already committed to the 6.x-2.x-dev branch.
#35
This seems to be working nicely.