OR facets

robertDouglass - August 18, 2009 - 08:23
Project: Apache Solr Search Integration
Version: 6.x-2.x-dev
Component: Code
Category: feature request
Priority: normal
Assigned: Unassigned
Status: active
Description

http://localhost:8983/solr/select?fl=nid%2Ctype&facet=true&facet.mincount=1&facet.field={!ex=type}type&fq={!tag=type}type%3Apage%20OR%20type%3Afoo

This will produce a list of documents that include pages and foos, and will list other types not selected and their counts:

[X] page (15)
[X] foo (8)
[ ] story (11)
[ ] forum (3)

Note that this doesn't work with q.alt.
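
For anyone who wants to experiment with the request above outside of Drupal, here is a minimal Python sketch that assembles the same parameters. The helper name or_facet_params is made up for illustration, and reusing the field name as the tag is just the convention from the URL above. Values are not escaped for Lucene special characters, so treat it as a sketch only.

```python
from urllib.parse import urlencode, parse_qs

def or_facet_params(field, selected, fl="nid,type"):
    """Build Solr parameters for a multi-select (OR) facet on one field.

    Hypothetical helper: the field name doubles as the tag name.
    """
    tag = field
    clause = " OR ".join(f"{field}:{value}" for value in selected)
    return [
        ("fl", fl),
        ("facet", "true"),
        ("facet.mincount", "1"),
        # {!ex=...} makes Solr ignore the tagged filter when counting this facet,
        # so unselected values keep their counts.
        ("facet.field", f"{{!ex={tag}}}{field}"),
        # {!tag=...} labels the filter so the facet above can exclude it.
        ("fq", f"{{!tag={tag}}}{clause}"),
    ]

params = or_facet_params("type", ["page", "foo"])
query_string = urlencode(params)
print("http://localhost:8983/solr/select?" + query_string)
```

The printed URL percent-encodes the braces and uses + for spaces, which Solr decodes to the same filter as the hand-written URL.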

#1

jp.stacey - August 18, 2009 - 10:27

This looks good if the syntax can be reliably decomposed/recomposed. Checkboxes / multiple selects would be a standard way of picking several inclusive items. CGI would normally encode them as separate instances of the parameter but I appreciate that fq= has to encode Lucene-syntax queries. That does mean you'd need an intermediate stage, where the checkboxes sent you to a URL of the format:

http://...?type=page&type=foo...

and that would then have to redirect to the proper Lucene syntax. Either that or get the ApacheSolr module to handle conversions between CGI and Lucene syntax and keep CGI-friendly URLs? I suppose it's a moot point if the initial search submission is HTTP POST: you'll have to redirect somewhere, as with standard non-Solr search.
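
To make the conversion idea concrete, here is a rough Python sketch of going from repeated CGI parameters to a Lucene-style OR filter and back. The function names cgi_to_fq and fq_to_cgi are invented for this example and are not part of the ApacheSolr module. It only handles simple single-word values; quoted phrases or values containing " OR " would need a real parser.

```python
from urllib.parse import parse_qs, urlencode

def cgi_to_fq(query_string, facet_fields):
    """Turn repeated CGI parameters into one OR filter per facet field."""
    submitted = parse_qs(query_string)
    filters = []
    for field in facet_fields:
        values = submitted.get(field, [])
        if values:
            filters.append(" OR ".join(f"{field}:{v}" for v in values))
    return filters

def fq_to_cgi(fq):
    """Inverse for simple filters: 'type:page OR type:foo' -> 'type=page&type=foo'."""
    pairs = [tuple(clause.split(":", 1)) for clause in fq.split(" OR ")]
    return urlencode(pairs)

filters = cgi_to_fq("type=page&type=foo&keys=solr", ["type"])
print(filters)
print(fq_to_cgi(filters[0]))
```

Parameters that are not listed as facet fields (keys in this example) are ignored by the conversion.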

As tweeted, we've done some work on OR queries, but we never exposed it to the user: it was for making complex initial-conditions facets for particular site searches (publications searches etc.) Alongside checkboxes and multiple selects you could maybe model using the widget Django uses for transferring users between groups: it's like two multiple selects with arrowed buttons in between, so you can pass terms back and forth. That's pretty cumbersome on the page but it does lessen the impact of accidentally not holding down CTRL during multiple select.

I'm trying to think if there's a non-form equivalent way of doing this. Radio buttons (AND searches) have their parallels in standard web links, which ApacheSolr AND facets already use to great effect to drill down. Is there any non-form parallel to the multiple non-exclusive select? Maybe Flickr-like edit-in-place, where (at least after you've made your initial multiple selection) you just have a set of items with Xs beside them to delete, and clicking on the link body turns into a set of checkboxes.

#2

robertDouglass - August 18, 2009 - 11:20

So, like you said, there are a few ways to go. Either try to encode a fair amount of Solr syntax in the URL ({!tag=type} for example), or have an intermediate step that parses to the right syntax. A third option would be to have application state logic (like a block configuration) that modifies the query. I don't like that option because it means one query could behave differently under different circumstances depending on configuration. So my current favorite is to prefix the filter with an underscore _ if it is to be an OR facet. This would make URLs like this:
http://localhost/drupal-6.13/search/apachesolr_search/?filters=tid%3A113%20_type%3Astory%20tid%3A335%20_type%3Apage
Not horrible looking, and it doesn't conflict with Lucene syntax. http://lucene.apache.org/java/2_3_2/queryparsersyntax.html
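
As a rough illustration of how the underscore convention could be decoded, here is a Python sketch that splits the filters value into plain (AND) filters and grouped OR filters. The function name build_fq is hypothetical, and wrapping each OR group in a {!tag=...} filter is an assumption about how this would map onto Solr, not a description of any actual patch. Splitting on whitespace also assumes no quoted phrases in the filter values.

```python
from urllib.parse import unquote

def build_fq(filters):
    """Split a filters string into AND filters and one tagged OR filter per field."""
    and_filters = []
    or_groups = {}  # field name -> list of selected values, in order of appearance
    for token in filters.split():
        name, _, value = token.partition(":")
        if name.startswith("_"):
            # Leading underscore marks an OR facet; strip it to get the field name.
            or_groups.setdefault(name[1:], []).append(value)
        else:
            and_filters.append(f"{name}:{value}")
    or_filters = [
        "{!tag=%s}%s" % (field, " OR ".join(f"{field}:{v}" for v in values))
        for field, values in or_groups.items()
    ]
    return and_filters + or_filters

raw = "tid%3A113%20_type%3Astory%20tid%3A335%20_type%3Apage"
print(build_fq(unquote(raw)))
```

Each returned string would be sent as its own fq parameter, so the tid filters still narrow the result set while the two type values widen it.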

#3

robertDouglass - September 13, 2009 - 21:01
Status: active » needs review

Please test! This is a big patch. I only tested on the apachesolr_module facets. Please help test on other modules' facets.

You set the operator for the facet on the block configuration page.

Attachment        Size
or_facets.patch   12.86 KB

#4

robertDouglass - September 15, 2009 - 11:23

Added a trim on the filters value: $queryvalues['filters'] = trim($queryvalues['filters']);

Attachment        Size
or_facets.patch   12.98 KB

#5

robertDouglass - September 15, 2009 - 13:27

CCK facet field block deltas are not the same as their Solr index field names.

Attachment        Size
or_facets.patch   13.24 KB

#6

robertDouglass - September 15, 2009 - 13:29
Status: needs review » fixed

#7

robertDouglass - September 17, 2009 - 11:33
Status: fixed » active

The numbers next to the facets in OR blocks are not accurate, and there is a question around what they should be. In AND filters the numbers show you how many documents would be in the result set if you click the link. To be consistent the OR facets would then have to show the same - how many results will be in the result set. I'm not sure how easy/hard this will be to calculate, but it will involve arithmetic on the currently selected facets within the same filter to find a delta to add to the current document set.
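
To show the arithmetic in question, here is a toy Python sketch using sets of document IDs. Solr does not hand back ID sets like this, so the data and the function name or_facet_count are purely illustrative. The point is only that the number for an unselected value is the current result count plus the matching documents not already in the result set.

```python
def or_facet_count(current_result_ids, docs_by_value, candidate):
    """Size of the result set if `candidate` were added to the OR selection."""
    # Only documents not already in the result set add to the total.
    delta = docs_by_value[candidate] - current_result_ids
    return len(current_result_ids) + len(delta)

# Made-up data for a multi-valued field: documents 3 and 4 carry two values each.
docs_by_value = {
    "page": {1, 2, 3},
    "foo": {3, 4},
    "story": {4, 5, 6},
}
current = docs_by_value["page"] | docs_by_value["foo"]  # page OR foo selected

print(or_facet_count(current, docs_by_value, "story"))
```

For a single-valued field such as type there is no overlap, so the delta is just the facet count of the unselected value. With multi-valued fields the overlap has to be subtracted, which is where it gets harder.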

#8

socki - November 6, 2009 - 17:54

One issue with the current OR facet search is that it appears to function fine for CCK fields, but not for taxonomy fields. The reason seems to be related to this section of code in the apachesolr_modify_query function, around line 1236:

      if (in_array($delta, $ors) || in_array($cck_delta, $ors)) {

In the case of taxonomy terms, the $ors array contains information on the individual vocabularies, but $delta is 'tid', so neither in_array() check matches. As a quick 'hack' I changed the line above to:

      if ($delta == "tid" || in_array($delta, $ors) || in_array($cck_delta, $ors)) {

which worked, but is probably not the ideal solution.

 
 
