Status: Closed (fixed)
Project: Apache Solr Search
Version: 6.x-2.x-dev
Component: Code
Priority: Normal
Category: Feature request
Assigned: Unassigned
Reporter:
Created: 10 Feb 2009 at 20:22 UTC
Updated: 8 May 2010 at 22:50 UTC
Comments
Comment #1
janusman commented: Usability nitpick: these should be radio buttons, since the options are mutually exclusive. See: http://www.useit.com/alertbox/20040927.html
I haven't actually tested the patch, so I'll leave the status untouched.
Comment #2
pwolanin commented: Use radios
instead of:
Either use define() statements or use strings that are meaningful. Perhaps use a switch statement rather than if/else, in case we add more options.
You need to use t() for the options text:
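A minimal sketch of what the review asks for: a radios element with define()d option keys and translated labels. It assumes a Drupal 6 Form API array; the constant names, the function name, and the form key are hypothetical, and t() is stubbed only so the sketch runs outside Drupal.

```php
<?php
// Stand-in for Drupal's t() so the sketch runs outside Drupal.
if (!function_exists('t')) {
  function t($string) {
    return $string;
  }
}

// Hypothetical constant names for the two mutually exclusive sort modes.
define('EXAMPLE_FACET_SORT_COUNT', 'count');
define('EXAMPLE_FACET_SORT_ALPHA', 'alpha');

/**
 * Builds a radios element for a facet block's configure form (hypothetical).
 */
function example_facet_sort_form($current = EXAMPLE_FACET_SORT_COUNT) {
  $form = array();
  $form['example_facet_sort'] = array(
    // Radios, because exactly one sort mode applies at a time.
    '#type' => 'radios',
    '#title' => t('Sort facets by'),
    // Option labels go through t() so they can be translated.
    '#options' => array(
      EXAMPLE_FACET_SORT_COUNT => t('Number of results'),
      EXAMPLE_FACET_SORT_ALPHA => t('Alphanumeric'),
    ),
    '#default_value' => $current,
  );
  return $form;
}
```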
Comment #3
Noyz commented: 'Tis true; radios are the standard convention for mutually exclusive options. However, I disagree that this helps usability (at least in this case). Radios take up more room and show more options up front, and hence can make the whole page more intimidating. Here I think it's probably better to use radios for the sake of convention only; I'm not sure the choice will improve usability.
Comment #4
Noyz commented
Comment #5
dreed47 commented: New patch with radios and switch statements.
Comment #6
pwolanin commented: In the switch statement you seem to repeat the default code; it might be better as:
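One way to avoid repeating the default code, sketched under assumptions: the helper name and the 'alpha'/'count' mode strings are hypothetical, and facet items are assumed to be a value => count array. The count case and the default share one body by falling through, so an unset or unknown mode also gets count ordering.

```php
<?php
/**
 * Sorts facet items (value => count) by a sort mode (hypothetical helper).
 */
function example_sort_facet_items(array $items, $mode) {
  switch ($mode) {
    case 'alpha':
      // Compare facet values as strings.
      ksort($items, SORT_STRING);
      break;

    case 'count':
    default:
      // 'count' and any unknown or unset mode share this branch,
      // so the default ordering is written only once.
      arsort($items, SORT_NUMERIC);
      break;
  }
  return $items;
}
```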
Comment #7
dreed47 commented: Attached.
Comment #8
dreed47 commented: Hopefully this is the last one, with minor wording changes for the block settings page.
Comment #9
pwolanin commented: Need to check that something is set:
Comment #10
dreed47 commented: OK, but the default case in the switch statement takes care of that as well. If we want a re-rolled patch, it will be a few days before I can get back to this.
Comment #11
pwolanin commented: The default in the switch only takes care of it if you have not changed the variable.
Comment #12
dreed47 commented: Attached.
Comment #13
dreed47 commented
Comment #14
pwolanin commented: Also, there may be a problem if the top facets are sorted such that they are not among the facets shown initially.
Comment #15
dreed47 commented: I'm not sure I understand the potential issue you are describing. The facets are sorted either by name or by count, and they are always sorted before being split into the "visible" or "hidden" HTML elements in the theme_apachesolr_facet_list() theme function. I don't think the issue you are describing is a problem.
Comment #16
pwolanin commented: As far as I understand, the mlt handler returns facets based on their frequency, so if they are alpha-sorted, the most frequent terms may not appear at all initially. This seems rather broken.
Comment #17
dreed47 commented: I'm sorry, I'm still not getting it. Maybe I'm missing something important about how this works. I assume all of the facets are exposed to the block hook in the response object ($response->facet_counts->facet_fields). You said:
If I want them alpha-sorted, why do I care whether some of the most frequent terms appear initially, assuming I can still expand the list to see them?
Comment #18
pwolanin commented: The end user will not have a choice about the facet order, so I think hiding some of the top facets would lead to a bad user experience; even if they are only a click away, many users will not explore that option.
Comment #19
janusman commented: I think I've understood what Peter is saying...
Here's an example to illustrate:
Say the current search returns 500 items, and each has a different author. Therefore the "authored by" facet would have 500 values for this search. However, we are only asking Solr to return the N "most frequent" facets due to performance/bandwidth issues. If N = 10, this would leave out 490 facet values.
So sorting these 10 facet values alphabetically makes little sense: we would be leaving important facet values off the list (e.g. showing the sorted list "Albert, Bart, Camila, ..." while leaving out "Alphonse, Amadeus, Amygdala, ...", which *are* in the current result set).
The only case where this would make sense is if we are sure that the number of possible values for this facet is N or less (the number the admin has configured each block to show).
The "right" solution is to instruct Solr to return facets based on alpha sort instead of frequency using facet.sort; see: http://wiki.apache.org/solr/SimpleFacetParameters#head-569f93fb24ec41b06...
This, however, means we need the functionality in the Solr PHP client *and* in the .module.
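The problem can be reproduced in a few lines of plain PHP. This sketch simulates Solr returning only the $limit most frequent values (as with facet.sort=count plus a facet limit) and then sorting those survivors alphabetically; the function name and the counts are made up for illustration.

```php
<?php
/**
 * Keeps the $limit most frequent facet values, then sorts them by name.
 * Hypothetical helper that mimics "top N by count, then alpha-sort in PHP".
 */
function example_top_n_then_alpha(array $counts, $limit) {
  // Highest counts first, like facet.sort=count.
  arsort($counts, SORT_NUMERIC);
  // Drop everything past $limit, like a facet limit.
  $top = array_slice($counts, 0, $limit, TRUE);
  // Re-sort only the surviving values alphabetically for display.
  ksort($top, SORT_STRING);
  return array_keys($top);
}

// Made-up counts: Alphonse and Amadeus are rare but sort near the top by name.
$counts = array('Bart' => 9, 'Camila' => 8, 'Albert' => 7, 'Alphonse' => 1, 'Amadeus' => 1);
$shown = example_top_n_then_alpha($counts, 3);
```

Here $shown is array('Albert', 'Bart', 'Camila'): the list looks alphabetical, yet Alphonse and Amadeus, which belong between Albert and Bart, never appear.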
Comment #20
dreed47 commented: I think I understand now. So my solution would work for use cases where we know that we have a relatively small number of facets and we've configured all of the facets to be shown (which happens to be my use case).
Thanks. I'm not sure if I'll have time to work on the solution for this but if I do I'll be sure to post the patch here.
Comment #21
janusman commented
Comment #22
robertdouglass commented: I think we need to take clues from the Solr-level params that may be coming in from our modify_query hook:
facet.sort=index/false
See http://drupal.org/node/521828 for a case where this is desired. We could build the UI on top of that, but we should use facet.sort as the trigger mechanism for the actual sorting so that we come closer to expected Solr behavior. I'm also moving this feature request to the 6.2 branch.
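A sketch of using facet.sort as the trigger, under stated assumptions: the helper name and the ss_author field are hypothetical, and the result is just a PHP params array, not a call into the Solr PHP client. It relies on Solr's per-field override syntax (f.<field>.facet.sort) and maps the pre-1.4 values, where true meant count order and false meant index order, onto the Solr 1.4 values count and index.

```php
<?php
/**
 * Builds per-field Solr facet params (hypothetical helper).
 */
function example_facet_params($field, $sort, $limit) {
  // Pre-1.4 boolean values: 'true' = by count, 'false' = index order.
  $legacy = array('true' => 'count', 'false' => 'index');
  if (isset($legacy[$sort])) {
    $sort = $legacy[$sort];
  }
  // Anything other than 'index' falls back to frequency order.
  if ($sort !== 'index') {
    $sort = 'count';
  }
  return array(
    'facet' => 'true',
    'facet.field' => array($field),
    // Per-field overrides use the f.<fieldname>.<param> form.
    'f.' . $field . '.facet.sort' => $sort,
    'f.' . $field . '.facet.limit' => $limit,
  );
}
```

Note that index order is only meaningful when the indexed value is what users read; for fields that store ids (tids, uids) it sorts by id, which comment #23 discusses.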
Comment #23
janusman commented: I've been poking around this, and have these issues:
My idea (and Robert's above) was to get the sorted facet values from Solr by setting f.XXX.facet.sort = index in apachesolr_search_add_facet_params(). However, this only works for facet fields that are stored as strings, like CCK text fields. We can't ask for sorted facets this way for users (we store the uid), taxonomy (we store the tid), or node types (we store the machine name), because sorting those stored values gets us nowhere.
So, for taxonomy and other integer storage-based fields, we can sort with one of these two approaches:
1) Sort after the Solr request returns, but the only (?) applicable use case is what @der mentioned in #20: we have a small number of facets and they are all returned on each request; this way PHP can sort those facets and nothing is broken.
2) Change the underlying storage schema so we store something that's sortable by Solr and also gives us the needed IDs. For instance, for the term "apple" with tid 3, we could store the string "Apple|3", which is doable IMO. We should then think about allowing storage of localized versions in another field, like "Manzana|3" for the Spanish equivalent.
For taxonomy, imagine TS_VID_[vid]_NAMES_[language] fields that store strings like "Apple|3" but where Solr only indexes the "Apple" portion.
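A minimal sketch of the "Name|id" encoding from option 2, assuming the stored value is a plain string and that the id always follows the last "|". Both function names are hypothetical; this only shows that such values sort by name while the id stays recoverable.

```php
<?php
/**
 * Encodes a display name and an id as "Name|id" (hypothetical).
 */
function example_encode_facet_value($name, $id) {
  return $name . '|' . $id;
}

/**
 * Decodes "Name|id", splitting on the LAST '|' so names containing '|'
 * survive. Assumes the value is well-formed (contains at least one '|').
 */
function example_decode_facet_value($value) {
  $pos = strrpos($value, '|');
  return array(
    'name' => substr($value, 0, $pos),
    'id' => (int) substr($value, $pos + 1),
  );
}
```

Because the name comes first, a plain string sort of the stored values orders them by name, e.g. "Apple|3" before "Pear|1".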
So, in conclusion:
(a) For CCK fields, it's OK to allow alpha sorting (is it?), and we can use @der's code above.
(b) For taxonomy, user, and node types we could start out by using (1): admins can pick the post-fetch sort option, with the UI showing a warning message telling them that "enabling this option for indexes with more facet values than those configured to display in this block may produce unpredictable results". (It could even check how many distinct facet values there are for that field at block config time and include that information for the admin.)
Thoughts please?
Comment #24
robertdouglass commented: I think #23 is solving a different problem than what I understand the issue to be.
The issue at hand, as I see it, is to sort the facets that come back alphabetically, in PHP, for display purposes. Right now they're sorted by # of documents descending, but very often this makes it hard to find the facet that you're looking for. The approach I'd like to see is to add either a config option (global), or a javascript option (sort icon) per block that reorders the li items in the block either by doc count or by alphabet.
Comment #25
janusman commented: Perhaps this is just a matter of putting in the functionality and letting the admin decide =)
You propose: get just the most frequent facets (and then sort by occurrence or alpha).
I propose: get the first (alphabetically-speaking) facets (regardless of their occurrence) when showing alpha-sort, use the current behavior for occurrence-sort. (And the problems mentioned in #23 come up).
I think your proposal is simple (code-wise) but maybe not the best usability-wise (but, then again, we can leave it up to the admin). IMO it's easy for the end user to grok that we're returning the N most frequent facets when they're ordered by the number of results they'll bring in. I don't think it'll be as easy to figure out when we're showing them ordered alphabetically... Will a user realize why some facets have been left out of the display, and what he can do to get them?
My ideal behavior for all facet blocks would be to be able to scan the entire facet list (alpha- or occurrence-sorted) using paged AJAX requests, or on a separate page (faceted_search.module allows the latter).
Your call =)
Comment #26
robertdouglass commented: I think different solutions are needed for the problem of "too many facets to be visible in a block of links".
Like table sorting before it, alpha sorting a block of facets won't make it much easier to get to the buried facets in the "M" region.
The alpha sorting of the returned (highest frequency) facets that I propose has other advantages. For example, if using the new OR facets, it keeps the order of the facets between requests. In fact, my proposed version of alpha sorting facets should be the default behavior for OR facets.
For filters that generate more than ~20 facets we need different widgets. Perhaps a "total list alphasort plus pager" widget is one of them. Before we can really introduce more widgets, though, we need a better architecture for choosing the widgets.
Comment #27
pwolanin commented: Seems like alpha sorting of facets makes no sense if some of them are hidden by JS; the most common facet might then be hidden.
Comment #28
janusman commented: @pwolanin: I *think* your comment in #27 mirrors what has already been discussed above =)
That is, while it *is* possible for facets to be hidden by JS:
* The admin might not care and wants that sort anyway=)
* The admin knows that the number of facet values does not exceed X so none will ever be hidden =)
Perhaps we just need to add a warning message to the above patch?
Maybe a warning should go here...
Thoughts?
Comment #29
cfennell commented: @janusman "The only case where this would make sense is if we are sure that the amount of possible facet values for this facet is N or less (the same amount the admin has configured each block to show)."
Exactly: in cases where facet data is fairly homogeneous (e.g. all of your widgets fall into one of four categories), you don't tend to have a problem. The vast majority of the publication years in our publications database, for example, are fairly narrowly distributed.
And the most common use case for our year facet is a known-item query ("I know the thing I'm looking for happened in 2009"), for which an alphanumeric listing is more easily scanned. We do provide an advanced search option for this, but some users still rely on facets. Facet counts are good for giving you a high-level overview but not so good for these known-item situations (locating a specific author, year, whatever). But yeah, as the data gets more heterogeneous and grows in size, an alphanumeric sort does indeed become kind of pointless.
Bottom line, I support allowing an alphanumeric sort option in some form or other. I personally will need to implement it whatever the outcome of this patch.
@janusman "The "right" solution is to instruct Solr to return facets based on alpha sort instead of frequency using facet.sort"
Yeah, I thought it curious that we do ksort($items, SORT_STRING) after having previously set 'facet.sort' => 'true', which effectively overrides the Solr setting. I note here the API change in Solr 1.4: "Solr1.4 -- the true/false values have been deprecated starting with Solr 1.4, instead use count for sorting by count, and index for sorting by index order." http://wiki.apache.org/solr/SimpleFacetParameters#facet.sort
@robertDouglass "Perhaps a "total list alphasort plus pager" widget is one of them"
That would be fracking awesome. I would implement this tomorrow for several of our facets if it were available. One thought I had there - allow users to toggle between count and alphanumeric/pager widget thingy - that might allow us to have the best of both worlds.
Comment #30
robertdouglass commented: I took a stab at this and think I came up with something similar to der's. My patch doesn't modify the query to sort by 'index' or 'count'; it always queries with the facet sort set to 'count'. It then uses the facet text to sort alphanumerically if that has been requested. In this way we get a "happy" medium without adding too much complexity: we always get the facets with the most results, but they can be sorted in one of two ways.
Until we build all these blocks with Views, this is what we'll have to live with.
Committing to 6.2.
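Sorting by the facet text matters because, for taxonomy, the raw facet value is a term id, so ksort() on it orders by tid rather than by name. A sketch of sorting on a display label instead, assuming a hypothetical item structure keyed by the raw value with 'label' and 'count' entries; strnatcasecmp() gives a case-insensitive natural order, which is one reasonable reading of "alphanumeric".

```php
<?php
/**
 * Sorts facet items by display label, keeping the raw values as keys.
 * Hypothetical item shape: raw value => array('label' => ..., 'count' => ...).
 */
function example_sort_by_label(array $items) {
  uasort($items, function ($a, $b) {
    // Case-insensitive natural comparison, so "Item 2" sorts before "Item 10".
    return strnatcasecmp($a['label'], $b['label']);
  });
  return $items;
}

// Keys stand in for term ids; labels are what the user sees.
$items = array(
  3 => array('label' => 'Pear', 'count' => 5),
  10 => array('label' => 'apple', 'count' => 2),
  7 => array('label' => 'Banana', 'count' => 9),
);
$sorted = example_sort_by_label($items);
```

The keys of $sorted come out as 10, 7, 3 (apple, Banana, Pear), whereas ksort() on the same array would give 3, 7, 10, i.e. tid order.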
Comment #31
robertdouglass commented: Leaving open for a time for review and follow-up.
Comment #32
cfennell commented: Awesome, thanks for working on this, Robert.
I just tested the patch and it worked as described.
Sorry to be a pest, but could we also have a toggle for asc/desc? With letters you'd probably only want to sort ascending, so I can see why you chose that. But numbers are a little different: my publication year range, for example, makes more sense sorted with the newest (desc) year first.
Thanks again.
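An asc/desc toggle can be as small as choosing between ksort() and krsort(). A sketch assuming facet items keyed by value (value => count); the helper name and the 'asc'/'desc' strings are hypothetical. String comparison orders years correctly only because they all have four digits; mixed-length numbers would need SORT_NUMERIC or SORT_NATURAL.

```php
<?php
/**
 * Alphanumeric facet sort with a direction toggle (hypothetical helper).
 */
function example_sort_facets_direction(array $items, $direction = 'asc') {
  if ($direction === 'desc') {
    // Newest year (or last letter) first.
    krsort($items, SORT_STRING);
  }
  else {
    ksort($items, SORT_STRING);
  }
  return $items;
}

// Publication years => document counts (made-up numbers).
$years = array(2007 => 3, 2009 => 5, 2008 => 1);
$newest_first = example_sort_facets_direction($years, 'desc');
```

The keys of $newest_first are 2009, 2008, 2007.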
Comment #33
robertdouglass commented: @libsys, OK, but only cuz you're a nice guy. Any chance you could merge this patch and the previous one and repost? I'm committing this one to 6.2 as well.
Comment #34
cfennell commented: Heh, I guess I've always subscribed to the "catch more flies with honey than with vinegar" approach =).
The attached patch is just a simple combined diff -urp patch, as the earlier changes have already been committed to CVS on the 6.x-2.x-dev branch.
Comment #35
robertdouglass commented: This seems to be working nicely.
Comment #37
dpalmer commented: Hi guys, great work.
Quick question: do I still need to apply the patch, or does 6.x-2.0-alpha2 include this functionality? I have that version installed and selected the Alphanumeric sort asc option in the block configuration settings, yet it does not seem to be working.
Thanks,
Donovan
Comment #38
SergeyR commented: TIP!!
CUSTOM SORT OF FACET LINKS FOR A CCK LABEL FACET BLOCK
Put leading spaces before the labels of allowed values. They will be invisible in facet blocks, but link sorting will take those spaces into account.
+ turn alphabetical sorting ON
PS: Not applicable to the content type facet.
For taxonomy facets there is no alphabetical sorting at all ((
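Why the tip works, sketched in plain PHP with made-up labels: a space (0x20) sorts before any letter in a byte-wise string comparison, so more leading spaces push a label further up. The trim() call stands in for the fact that browsers do not visibly render those leading spaces; it is not what the module itself does.

```php
<?php
// Made-up allowed-value labels, padded with leading spaces as in the tip.
$labels = array('Low', '  High', ' Medium');

// Byte-wise string sort: two spaces < one space + letter < letter.
sort($labels, SORT_STRING);

// What the visitor effectively sees once the padding is not rendered.
$display = array_map('trim', $labels);
```

$display ends up as High, Medium, Low: a custom order that a plain alphabetical sort would not produce.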
Comment #39
SergeyR commented
Comment #40
SergeyR commented
Comment #41
dpalmer commented: No alphabetical sorting at all for taxonomy facets? That's a huge piece of missing functionality. I can't be the only one who needs this?
Comment #42
jpmckinney commented: Alphabetical sorting of taxonomy facets has been fixed in #723492: Facet Block Sorting Not Working as Intended. Please open new issues if there are any remaining issues with the sorting of facets.