Reflect hierarchical taxonomy vocabulary in facet

mkalkbrenner - March 13, 2009 - 19:32
Project:Apache Solr Search Integration
Version:6.x-1.x-dev
Component:User interface
Category:feature request
Priority:normal
Assigned:janusman
Status:needs review
Description

I wrote a patch that allows to reflect a hierarchical taxonomy vocabulary in a facet.

It adds a checkbox to the facet's corresponding block configuration. Using this checkbox you can decide if a facet flattens a hierarchical taxonomy vocabulary like it's currently implemented in beta5 or if the facet reflects the hierarchy.

If you choose to reflect the hierarchy you'll see only top level terms in the first step. If you click on such a top level term, the search result gets filtered and the child terms of that top level term having results will appear.

Note:
While working on this patch I ran into an existing bug in beta5 which was already posted here: #400882: Apache Solr does not always filter down
I fixed that one and decided to attach the fix to that bug report. So before trying this new feature you have to apply my patch that fixes that bug first.

AttachmentSize
apachesolr.hierarchical_facet.patch9.18 KB

#1

janusman - March 13, 2009 - 23:53

Very Nice!!!

Attached screenshot comparing before/after. SO much better =)

Two things:
* Could you please submit a patch against DRUPAL-6--1 (dev)
* It seems somehow the character you tried to use inside theme_apachesolr_facet_item_indent() didn't get through (My editor in UTF mode shows it as just "?"). Could you try another character or sending the patch as UTF?

function theme_apachesolr_facet_item_indent($indent = 0) {
  if ($indent > 0) {
    return str_repeat('  ', $indent) .' » '; // I replaced the ? char with »
  }
}

AttachmentSize
hierarchic_terms_facets.jpg 21.23 KB

#2

janusman - March 13, 2009 - 23:53
Status:needs review» needs work

#3

mkalkbrenner - March 14, 2009 - 12:27
Version:6.x-1.0-beta5» 6.x-1.x-dev
Status:needs work» needs review

Thank you for the screenshot!

It seems that special UTF-8 characters break somewhere during the upload process. The char I used was "∟".
Nevertheless I included your theme function and recreated the patch against DRUPAL-6--1 (dev).
Due to the fact that indention is realized using a theme function everyone could decide for himself how indention should look like.

AttachmentSize
apachesolr.hierarchical_facet.patch 9.19 KB

#4

mkalkbrenner - March 14, 2009 - 12:36

Something went wrong again: ' » '

I choose ' > ' now to be on the safe side.

AttachmentSize
apachesolr.hierarchical_facet.patch 9.19 KB

#5

janusman - March 19, 2009 - 18:27

Confirm patch works. However, it added/removed some blank lines; re-rolled against latest DEV.

Also, maybe could we do better than >? =)

AttachmentSize
apachesolr-401234.patch 9.01 KB

#6

janusman - March 19, 2009 - 19:18

Just noticed that this needs to be attacked in the "Current search" block too, as hierarchy isn't being reflected.

E.g. after drilling down thru "Fruit" > "Apples" > "Red delicious", the current search block doesn't show this hierarchy, and then you can remove parent facets (e.g. "Fruit") and end up with children facets active (e.g. "Red Delicious"). This also causes the new hierarchic view to go nuts also. See attached screenshot.

AttachmentSize
hierarchic_terms_facets_problem.jpg 33.75 KB

#7

janusman - March 19, 2009 - 19:43
Status:needs review» needs work

#8

mkalkbrenner - March 20, 2009 - 14:50
Status:needs work» needs review

@janusman:

You're right. I adjusted the patch to reflect the hierarchy in current search block as well.
But I decided to not visualize the hierarchy in current search block. Only unlink links are constructed the right way now if you decide to turn on hierarchy on a facet block.

BTW I fixed a small sorting bug in hierarchical facet blocks as well.

AttachmentSize
apachesolr.hierarchical_facet.patch 10.11 KB

#9

jarchowk - March 29, 2009 - 14:25

There is a bug in this where if you delete a taxonomy term after the tid is indexed it bombs out the screen.

Also the $vid is hardcoded at substring pos of 7. Mine needs to be 12 for whatever reason (older version?), is it possible to not rely on substring?

#10

mkalkbrenner - March 29, 2009 - 22:51

@jarchowk:
I didn't introduce "substring pos of 7", I just moved the according line in my patch.
I'll have a look at the code to see if it's robustness against deletion of tids could be improved.

But from my point of view both points a general issues for apachesolr module and not related to this patch.

#11

poka_dan - April 2, 2009 - 18:02

+1 for this to make it in the next apachesolr release
I ran into the small sorting bug mkalkbrenner was talking about myself (http://drupal.org/node/412332); one more reason for it to make it in the next rel

#12

coupet - April 14, 2009 - 17:22

See attached example. The metadata conditions can be easily explored by opening and closing the subtrees without selecting them.

AttachmentSize
fig4.jpg 142.64 KB

#13

mkalkbrenner - April 14, 2009 - 18:15

@coupet:
Could you please explain your posting more detailed? Did you find a better solution for hierarchical taxonomy in facets? Or do you want to have this patch to work like the screenshot you attached?

#14

coupet - April 14, 2009 - 23:39

@ mkalkbrenner:

Suggesting for the patch to work as shown in screenshot.

#15

mkalkbrenner - April 16, 2009 - 13:19

@ #9 jarchowk:
I found a bug regarding removed tids. Could you please test again?

The patch should be applied against snapshot 6.x-1.x-dev 2009-Apr-16 now.

AttachmentSize
apachesolr.hierarchical_facet.patch 10.71 KB

#16

Glenmoore - May 11, 2009 - 16:29

Hi, could you tell me if this patch is likely to work against the 6.11 release?
If so, can you confirm that it was written relative to the Modules directory NOT the root directory?
Many thanks

#17

mkalkbrenner - May 12, 2009 - 08:29

@Glenmoore:

The patch works with drupal 6.11. And it should be applied against apachesolr module directory.

BTW the patch fails with current dev snapshot. I'll fix that soon.

#18

Glenmoore - May 12, 2009 - 14:44

@mkalkbrenner
Hi I am running Beta 5 so applied your 'clone' patch for this version which seemed to change the file as required (though I received no 'successful' or 'failed' messages-but the code looked like it had changed as expected). However, when I checked that my Apache search was still working I received the following WSOD message.

Fatal error: Class 'Solr_Base_Query' not found in C:\wamp\www\drupal2\sites\all\Modules\apachesolr\apachesolr.module on line 856

Now, I also tried to apply a patch yesterday to ApacheSolr which also appeared to be successful but resulted in a different WSOD so I may be doing something wrong. Let me tell you what I did and maybe you can spot the error.

I uploaded the patch file into Wordpad and saved as a text doc (as I am using Windows)
I placed the patch file inside the apachesolr file, navigated to the apachesolr file and typed "patch -p0 < Solr_Base_Query.php_.clone_.patch" in Cygwin.

Reversing the patch ("patch -p0 -R < Solr_Base_Query.php_.clone_.patch") does not get rid of the problem. I have to replace the Solr_Base_Query.php file completely before the problem goes away.

Sorry, wasnt sure if this should be posted on the thread which started with the patch in question (but decided to post it here as my primary objective is to apply the 'hierarchical' patch) apologies if it should.

#19

pwolanin - May 12, 2009 - 17:58
Status:needs review» needs work

according to above comments, patch doesn't apply.

Also, looks to me like there is too much being hard coded here - does this really need to be in the main module, or can an add-on module achieve this output?

#20

mkalkbrenner - May 12, 2009 - 18:43

Glenmoores comment above doesn't indicate that the patch could not be applied. The problem there is that he mixed an old version of apachesolr module (beta5) another patch and this patch.

Here's the patch to be applied against apachesolr 6.x-1.x-dev 2009-May-07.

I'm of the opinion that this feature should get into apachesolr directly because flattened hierarchical vocabularies behave really strange and unpredictable for the user. See #412332: order change for facet item list.
The implementation is comparable to the "Include missing facets" feature. It's not to much code but it rearranges a bigger part of hook_block of apachesolr_search.module. For Example it moves up

$vid = substr($delta, 7);
if (is_numeric($vid)) {

which is more efficient and fixes small issues like trying to load a vocabulary with a non numeric vid.

If I move this feature into a separate module I'll have to copy and modify a lot of code which isn't best especially it's hard to keep it in sync with apachesolr.

AttachmentSize
apachesolr.hierarchical_facet.patch 10.62 KB

#21

mkalkbrenner - May 12, 2009 - 18:35
Status:needs work» needs review

#22

janusman - May 12, 2009 - 20:20

It works. But...

I'm thinking we should really use theme_item_list() specifically defining children of other items... e.g.:

theme('item_list', array('data' => 'fruit', 'children' => array('apples', 'oranges')));

This should also then work with date facets, for instance.

However, this is IMO, I think the patch could go on as is for now until we get a more elegant solution? Opinions?

AttachmentSize
apachesolr.hierarchical_facet_6.patch 10.91 KB

#23

mkalkbrenner - May 15, 2009 - 08:58
Status:needs review» reviewed & tested by the community

"However, this is IMO, I think the patch could go on as is for now until we get a more elegant solution?"

I would highly appreciate that because it's a lot of work to keep this patch in sync with the ongoing changes within the apachesolr module.
If this patch will be accepted I'm willed to keep improving it.

#24

pwolanin - May 15, 2009 - 12:27
Status:reviewed & tested by the community» needs work

as above, this needs to be done in a more generic way versus hard-coding it to taxonomy. I'd rather see more serious if necessary to support such a feature.

#25

janusman - May 15, 2009 - 16:32
Status:needs work» needs review

Here's a patch that builds an array of children term elements to go into theme_item_list; this does away with the "indent" functions.

Attached screenshot of how it looks.

BTW looked around and couldn't find a better solution than using eval() to build the array programatically (which feels like a hack); suggestions welcome.

AttachmentSize
401234_25.patch 8.89 KB
401234_25.png 4 KB

#26

pwolanin - May 15, 2009 - 18:18

Here's a patch (complete I hope) to do some clean-up of the facet code/theme functions: http://drupal.org/node/463900#comment-1593394

That might help simplfy the changes needed here. If not, please suggest additional improvements.

@janusman - why is the eval() needed at all?

#27

janusman - May 15, 2009 - 18:59

@pwolanin: the eval() helps build the nested array of items to later pass on to theme_item_list(). It's a fast way to add items to existing portions of the array at any depth; I can't find (yet) how PHP could handle passing pointers to array elements--I don't think it can.

I just thought that eval() might not be secure in this case if a facet value includes an apostrophe. So that would have to be fixed.

Is it ok to keep this approach (just adding the above security check) and just reissue a patch when #463900: facet theme function clean-up is done? Or, what's missing?

#28

pwolanin - May 16, 2009 - 14:00

@janusman - you can certainly have references to array elements, and maybe I'm misunderstanding what you are trying to do. For a use of array elemnt references in core see: http://api.drupal.org/api/function/menu_tree_collect_node_links/6

#29

janusman - May 21, 2009 - 23:06
Status:needs review» needs work

I am realizing that, yes, *I am not sneaky enough* =)

Will write or adapt a recursive function to do the job without eval().

Good starting point:
http://kevin.vanzonneveld.net/techblog/article/convert_anything_to_tree_...

#30

janusman - May 29, 2009 - 14:06
Status:needs work» needs review

Ok, new patch, does not use eval().

AttachmentSize
401234_30.patch 9.69 KB

#31

mkalkbrenner - June 2, 2009 - 13:59
Status:needs review» needs work

I confirm that janusman's patch from #30 works and solves indention more elegant than my original version.

What's left to do now is to combine it with pwolanin's cleanup patch: #463900: facet theme function clean-up

#32

janusman - June 3, 2009 - 20:59

@mkalkbrenner: I'd say this is RTBC... have to take this one patch at a time =)

#33

mkalkbrenner - June 5, 2009 - 14:16

@janusman:
Your latest patch contains a bug. It breaks hidden item functionality because theme_apachesolr_facet_list() doesn't deal with the new children element.
Attached you find your patch with a small adjustment for theme_apachesolr_facet_list():

@@ -1372,8 +1383,8 @@
     drupal_add_js(drupal_get_path('module', 'apachesolr') . '/apachesolr.js');
     // Split items array into displayed and hidden.
     $hidden_items = array_splice($items, $display_limit);
-    foreach ($hidden_items as $link) {
-      $items[] = array('data' => $link, 'class' => 'apachesolr-hidden-facet');
+    foreach ($hidden_items as $hidden_item) {
+      $items[] = $hidden_item + array('class' => 'apachesolr-hidden-facet');
     }
   }
   $admin_link = '';

AttachmentSize
apachesolr.hierarchical_facet.patch 10.24 KB

#34

mkalkbrenner - June 5, 2009 - 14:16
Status:needs work» needs review

#35

pwolanin - June 6, 2009 - 02:13

looks liek there are some code style issues in the changed code in terms of whitespace.

Also, I'm not sure I understand why we need the static cache and what this function is really doing:

  function apachesolr_get_unclick_link(&$query, $field_name, $field_value, $vid = FALSE) {

I'ts taxonomy specific?

#36

mkalkbrenner - June 6, 2009 - 11:57

@pwolanin:
For non hierarchical facet items you simply remove current term's tid from the url to get an unclick link. But in case of hierarchical vocobularies unclick links become more complicated. If you unclick a parent item you have to remove tids of all children, too. I decided to put this extra logic into a separate function because unclick links are build at different places.

The reason for the static caching of unklick links is that the same unclick links are used in different places, for example in a facet block based on a hierarchical vocabulary and in the current search block .

BTW I even noticed that:

function apachesolr_get_unclick_link(&$query, $field_name, $field_value, $vid = FALSE) {
  ...
  $path = 'search/' . arg(1) . '/' . $new_query->get_query_basic();
  ...
}

I think that the way to create the path changed in CVS since the first version of this patch. I'll adjust that ...

#37

pwolanin - June 6, 2009 - 14:44

I feel like I am missing something here,
can you explain and maybe illustrate the
desired final behavior.

#38

mkalkbrenner - June 8, 2009 - 09:55

I fixed path handling and added some comments to the patch to explain unclick link creation.

AttachmentSize
apachesolr.hierarchical_facet.patch 11.12 KB

#39

pwolanin - June 8, 2009 - 12:37

I still would like a clear summary of the goal - which links are supposed to appear when, how are unclick links supposed to work, etc. I can't really review the patch without that.

#40

mkalkbrenner - June 8, 2009 - 21:37

hierarchical vocabulary 1:

1
2
  3
  4
    5
6

non hierarchical vocabulary 2:

7
8
9

Now some examples about unclick links.

current search:
tid:1 tid:7 type:story
unclick Links:

remove tid:1 => tid:7 type:story
remove tid:7 => tid:1 type:story
remove type:story => tid:1 tid:7

current search:
tid:2 tid:3 tid:7 type:story
unclick Links:

remove tid:2 => tid:7 type:story
remove tid:3 => tid:2 tid:7 type:story
remove tid:7 => tid:2 tid:7 type:story
remove type:story => tid:2 tid:3 tid:7

3 is a child of 2. So if 2 is removed from query, 3 must be removed, too.

current search:
tid:2 tid:3 tid:4 tid:5
unclick Links:

remove tid:2 =>
remove tid:3 => tid:2 tid:4 tid:5
remove tid:4 => tid:2 tid:3
remove tid:5 => tid:2 tid:3 tid:4

3,4 and 5 are children of 2. So if 2 is removed from query, all children must be removed, too.
5 is a child of 4. So if 4 is removed from query, 5 must be removed, too.
But note that 3 is a sibling of 4 and does not affect 5.

#41

mkalkbrenner - June 9, 2009 - 09:55

issue / patch from comment #33 doesn't work for some cck fields. Attached you find a patch containing a safer version:

-    foreach ($hidden_items as $link) {
-      $items[] = array('data' => $link, 'class' => 'apachesolr-hidden-facet');

+    foreach ($hidden_items as $hidden_item) {
+      if (!is_array($hidden_item)) {
+        $hidden_item = array('data' => $hidden_item);
+      }
+      $items[] = $hidden_item + array('class' => 'apachesolr-hidden-facet');

AttachmentSize
apachesolr.hierarchical_facet.patch 11.22 KB

#42

mkalkbrenner - June 19, 2009 - 20:44

I adjusted the patch to work with apachesolr development snapshot 6.x-1.x-dev 2009-Jun-19.
In other words I applied the api changes from #463900: facet theme function clean-up.

Please note that this patch won't work until issue #496650: Unclick links don't work in 6.x-1.x-dev 2009-Jun-19 is fixed (simply apply the patch available there first).

BTW hierarchical facets are very suitable for filters by forum topic.

AttachmentSize
apachesolr.hierarchical_facet.patch 11.71 KB

#43

Onopoc - June 20, 2009 - 00:07

Great feature. Subcribing.

#44

robertDouglass - June 20, 2009 - 12:28

I don't want to derail this very nice feature but I do want people to join me in thinking about ways to make facet block display pluggable (as well as the logic behind it all). for example, I've come up with two alternate ways to display facets (tag clouds and bar graphs) that will sit nicely alongside this hierarchical display and the normal facets. It would be nice if we could generalize how the admin chooses displays and how the system handles them. Investigating this patch might be a source for inspiration to this end.

Nice work so far here. My read of the patch didn't uncover anything obvious, so I need to test it thoroughly before commenting further.

#45

pwolanin - June 20, 2009 - 15:03

I'm am unhappy with the use of this function for all facets:

+function apachesolr_search_get_unclick_link(&$query, $facet_text, $field_name, $field_value, $vid = FALSE) {

It seems to add unneeded overhead and complexity for the common case.

As Robert suggests above, we possibly need a way to add information for each facet block about a display handler. So, Basically the functions for display hierarchical vocab facets would substitute for the current ones.

This is starting to sounds a bit like a views block display handler, at least in concept.

For the short term, possibly we could add a link next to each enabled facet to take you to chooser?

#46

janusman - June 20, 2009 - 17:57

@mkalkbrenner: great you're looking into this, will test it soon!

Now, re: #44, #45...

Thinking in this more general framework, IMO this patch is about improving the current "facet block display handler" which turns out to be the one for "taxonomy".

I agree 100% with trying to build a flexible framework for facets (display handlers, hooks, etc); and yes, it starts to sound a bit like views =) In fact at DrupalCon DC I suggested to @afummy that maybe, ApacheSolr could just be the backend for Views, and Views UI could behave like faceted search (where all options shown to "filter down" indeed produce >0 search results)... just an idea =)

However, IMO @robertDouglass's #44 and @pwolanin's (and my) agreement on it is farther down the road; if not we will get this module to final 1.0 in another 12 months =)

Can I convince you to move this issue ahead and open a new one to look into more into the facet framework (?) for Apache Solr?

#47

robertDouglass - June 20, 2009 - 18:37

Like I said, I wasn't trying to derail this feature.

@janusman - apachesolr_views module makes Solr the backend for views. It works, pretty much like you describe.

#48

Onopoc - June 21, 2009 - 01:39

I'm want to contribute testing for the patch in comment #42. But the patch didn't work. Is it normal?

First I apply unclick.patch. Terminal returns

patch: **** Only garbage was found in the patch input.

Then I apply apachesolr.hierarchical_facet_10.patch. Terminal returns

patching file apachesolr.module
Hunk #3 succeeded at 1405 (offset -1 lines).
patching file apachesolr_search.module

Then when trying a Solr search at mywebsite.com/search/apachesolr_search/ Drupal returns the below error. It was working fine before the patch.

Fatal error: Call to undefined function apachesolr_search_localize_taxonomy_term() in /home/mydomainname/public_html/sites/all/modules/apachesolr/apachesolr_search.module on line 602

Things I tried: Rebuilding Solr index, run cron.php then wait 5 minutes before testing again.

I'm using Ubuntu Server 8.04.2 via SSH. Patches applied against current head apachesolr-6.x-1.x-dev.tar.gz 2009-Jun-19.

#49

Onopoc - June 21, 2009 - 04:09

I'm assuming that this patch indent the children facets. It's a great feature. It improves usability.

I suggest another improvement to the current patch. Adding an unlimited facet depth instead of the current limited depth. Find attached screenshot to clarify. The unlimited depth is use with the Faceted Search module. You can try it at: http://facetedsearch.davidlesieur.com
This demo uses unlimited facet depth so you'll still see only top level facet in the first step. If you click on it, the search result gets filtered and the child facet of that top level term having results will appear. Clicking the 'all' link brings you back to the top level facet. Notice the use of two colours: Grey for not clickable items such as arrows and (number) and red for the facets. Using two colours make the block easier to read and looks better.

The issue with patch #42 is that if the Solr block is 200 pixels width fixed. The depth is limited to 3 or 4 facets maximum. With 5 facets depth or more the fixed block might not be wide enough and the facets will start distorting the block and the website design/layout.

With suggested unlimited facet depth if the Solr block is 200 pixels width fixed the depth is unlimited. When the facet reach the right side of the block it simply continue on the next line and so on. Unlimited. Works for both fixed and expendable/fluid design. Find attached mockup to clarify. This is how the suggestion works.

AttachmentSize
proposed-presentation-unlimited-4.png 204.04 KB

#50

mkalkbrenner - June 21, 2009 - 08:39

@#48 Onopoc:
Sorry, the error you ran into was a remaining change from #436578: Add support for translated (localized) taxonomy facet blocks and #463886: localize apachesolr I'm working on. I attached a corrected patch against apachesolr development snapshot 6.x-1.x-dev 2009-Jun-20 which already includes #496650: Unclick links don't work in 6.x-1.x-dev 2009-Jun-19. Please try again.

@robertDouglass, janusman, pwolanin:
Introducing facet handlers or a plug in infrastructure sounds good to me. But I second janusman and suggest to apply this patch to CVS now and start working on the api changes afterwards.
For me it's pretty hard to keep this patch in sync with all the changes happening to apachesolr for three and a half months now.
We already use apachesolr in production right now and I'm forced to keep this patch in sync because normal taxonomy facets are unusable for our hierarchical vocabularies.
I'm willed to keep improving this feature in future (p.e. #12, #45, #49 or "missing facets" which are marked as TODO in cvs), but it will be much easier and cost less time if a first version is in CVS.

AttachmentSize
apachesolr.hierarchical_facet.patch 11.67 KB

#51

pwolanin - June 21, 2009 - 13:46
Status:needs review» needs work

I'll try to review in depth today. I see some obvious problems though, mostly style/readability. Like:

$link = apachesolr_search_get_unclick_link($query, $term->name, 'tid', $tid, $reflect_hierarchy['apachesolr_search'][$delta] ? $vid : FALSE);

apachesolr_search_get_unclick_link is only really operative for hierarchical taxonomy, right? The use of the ternary operator in the funciton call is hard to follow, and really it doesn't make sense to me to ever call this new function unless we are representing a hierarchy. So, I'd rather see code more like:

if ($reflect_hierarchy['apachesolr_search'][$delta]) {
  $link = apachesolr_search_get_hierarchical_unclick_link($query, $term->name, $tid, $vid);
}
else {
  $link = theme('apachesolr_unclick_link', $term->name, $new_query->get_path(), $options);
}

The current special case of $vid and 'tid' could then just be the code that always runs in apachesolr_search_get_hierarchical_unclick_link().

#52

mkalkbrenner - June 21, 2009 - 14:46

@#51 pwolanin:

I'm fine if you decide to modify the code this way.

My intention was to move redundant code to that function:

<?php
$new_query
= clone $query;
$new_query->remove_filter($field_name, $field_value);
...
$options['query'] = $new_query->get_url_querystring();
$unclick_links[$field_name][$field_value] = theme('apachesolr_unclick_link', $facet_text, $new_query->get_path(), $options);
?>

Additionally every unclick was build two times (including query cloning and sometimes taxonomy tree) if current search block is activated. That's why I introduced the cache within this function in general.

Another approach might be to keep this function and the cache but hand over an array containing an already build list of filed_name and field_value pairs to be removed from current query. Than we have a common function that others might use, too.

Than your example will look somehow like this:

<?php
$to_be_removed
= array(array('name' => 'tid', 'value' => $tid));

if (
$reflect_hierarchy['apachesolr_search'][$delta]) {
 
$childs = taxonomy_get_tree($vid, $tid);
  foreach (
$childs as $child) {
   
$to_be_removed[] = array('name' => 'tid', 'value' => $child->tid));
  }
}

$link = apachesolr_search_get_unclick_link($query, $to_be_removed);
?>

#53

pwolanin - June 21, 2009 - 15:08

This second option of a more generic function looks more useful/re-useable to me.

In terms of performance, I doubt this 2x cloning has measureable effect on page load, and also this might lead to problem if one wants to do for example, a federated search (e.g. separate user and ndoe searches shown on the same page). In the absence of compelling evidence that this optimization is needed, I think it should be omitted.

In terms of actual performance, I'd bed the Frankenstein use of page_query() is more of an issue, since I'd think this issues a couple actual SQL queries which do essentially nothing.

#54

mkalkbrenner - June 22, 2009 - 17:08
Status:needs work» needs review

I adjusted the patch according to my new approach from #52.

Caching is still included but could be removed if you like to.

AttachmentSize
apachesolr.hierarchical_facet.patch 11.58 KB

#55

pwolanin - June 23, 2009 - 01:52

Why not just do:

$remove[] = $field;

instead of:

$remove = array(array($field['#name'], $field['#value']));

Then you can reference the named keys instead of 0, 1

#56

mkalkbrenner - June 23, 2009 - 10:01

For current search block you're right. But for im_vid_ blocks we would have to create those $field arrays. Same for date facet block if we migrate it.

My intention was to find an easy way to hand over a list of key value pairs where same key might be present multiple times and that could be easily serialized to generate a cache key no matter in which order these key value pairs are passed.

#57

mkalkbrenner - July 3, 2009 - 09:58

Adjusted patch to latest api changes introduced by #502976: Other GET parameters ignored by Apache Solr Facet Blocks.
Patch is against apachesolr 6.x-1.x-dev 2009-Jul-03.

AttachmentSize
apachesolr.hierarchical_facet.patch 11.57 KB

#58

pwolanin - July 3, 2009 - 16:49

I still think we should be using named keys vs. 0, 1

#59

mkalkbrenner - July 6, 2009 - 12:41

Here's a reworked version that uses '#name' and '#value' as named array keys instead of 0 and 1.

AttachmentSize
apachesolr.hierarchical_facet.patch 11.68 KB

#60

janusman - July 6, 2009 - 20:10
Status:needs review» needs work

Patch applies cleanly to fresh checkout.

It works, however, if the user is directly limiting by a deep-levelled term (which will happen if "Use Apache Solr for taxonomy links" in admin/settings/apachesolr/settings is set), then the "Filter by" facet block fails to show the lineage (it just shows the top-level parent as available).

See attached image: on the left is the result of normal search + filtering; on the right is what would happen if I would click on the term "Economics" from inside a node's taxonomy term links, or if a user somehow has a direct link to that term.

AttachmentSize
401234_60.png 40.05 KB

#61

mkalkbrenner - July 7, 2009 - 10:03

I think the issue janusman reported was introduced by #472600: Replace taxonomy/term/XXX node listings with Apache Solr search results.
I'll work on it, but only if I get pwolanin's commitment to accept the patch. Otherwise it's just a waste of time.

#62

robertDouglass - July 7, 2009 - 10:25

mkalkbrenner this issues has been slated to still go into the 6.1 branch - you won't be wasting your time.

#63

janusman - July 7, 2009 - 13:33

@mkalkbrenner: need help? =) Contact me or leave a note here.

#64

mkalkbrenner - July 7, 2009 - 22:38

@janusman:
Thanks for your offer. My current problem is, that we're busy with two projects going live this week. So if you have some free time it would be nice if you take this task. Otherwise I'll do it next week.

I see two approaches, which should be discussed here:
1. Specific: Apply some additional logic when creating taxonomy links to also add parent tids in case of hierarchical vocabularies.
2. General: Detect missing tids in case of hierarchical vocabularies on search result page and send a redirect with a corrected url.

Do you see another approach?

#65

janusman - July 7, 2009 - 23:13
Assigned to:mkalkbrenner» janusman

@mkalkbrenner: Those pesky projects, eh? =) Good luck on that.

I prefer (1). I think the code should always try to build the lineage every term if it hasn't already.

E.g. Fruit (tid:1) > Apples(tid:2) > Granny smith(tid:3)

Case 1) filters=tid:3
Build the lineage for all parent terms of tid:3

Case 2) filters=tid:1 tid:3
Build lineage for tid:1
Build lineage for tid:3 (add tid:2 as son of tid:1, add tid:3 as son of tid:3)

Case 3) filters=tid:1 tid:2 tid:3
Build lineage for tid:1
Build lineage for tid:2 (add tid:2 as son of tid:1)
Build lineage for tid:3 (add tid:3 as son of tid:3)

I'm thinking this might be a tad expensive query-wise if we don't do it correctly. Perhaps we might even come up with a novel solution to do it all just using Solr =)

I think I have some time thursday + friday for this, Ill look into it.

#66

coupet - July 8, 2009 - 23:56

A paper describing Castanet, an algorithm for automatically generating hierarchical faceted metadata.

Automating Creation of Hierarchical Faceted Metadata Structures
http://biotext.berkeley.edu/papers/castanet.pdf

#67

robertDouglass - July 17, 2009 - 10:29

I'm backing off my statement that this can still go into 6.1. I want to get that branch locked down without introducing more moving parts. Changing to 6.2.

coupet - the article was quite interesting, but not relevant to this issue.

As to comment #64, I also think approach #1 is correct. A child taxonomy link, at least in the context of http://drupal.org/files/issues/401234_60.png , should include filters for its visible parents.

#68

robertDouglass - July 17, 2009 - 10:30
Version:6.x-1.x-dev» 6.x-2.x-dev

#69

janusman - July 22, 2009 - 21:43
Status:needs work» needs review

New patch built upon the previous one by @mkalkbrenner et al...

Overview:

  • Adds an option to each Filter by Taxonomy block in admin/build/block/configure/[module]/[delta] to show that vocabulary as hierarchical or as flat.
  • Shows thus-configured facets as hierarchical, both in the "Filter by.." block, and in the "Current search" block.
  • Allows the keyword search to be removed (taken from #473554: Add an "unclick" link to search keys)

It also now fixes the problem seen in #60.

Code-wise:

  • it moves the "current search" block code from apachesolr_search_block() into its own function for legibility. This is basically the reason it includes the patch from #473554: Add an "unclick" link to search keys since that other patch
  • it adds a new public function to Solr_Base_Query.php : remove_keys().

My feeling is that the code could still be optimized a bit, but it seems stable and handles entry to a deep-nested term correctly.

And... a question:

Is there a use case for admins wanting to show hierarchical facets as flat/unrelated?

My first impulse is that it would be very unlikely...

In fact I think *not* showing terms hierarchically (IMO) can introduce usability hurdles into a site. Because of this, I'm asking here (@robertDouglass??) to reconsider having this "fix" go into 1.x.

AttachmentSize
apachesolr-401234-69.patch 16.33 KB

#70

mkalkbrenner - July 30, 2009 - 14:38

I tested janusman's patch from #69 against 6.x-1.x-dev 2009-Jul-30 in our environments and it works.
But I adjusted the patch a little.

I removed again two obsolete lines of code I previously removed in my patches:

--- apachesolr_search.module (revision 30471)
+++ apachesolr_search.module (working copy)
@@ -562,78 +562,99 @@
         // Get information needed by the taxonomy blocks about limits.
         $initial_limits = variable_get('apachesolr_facet_query_initial_limits', array());
         $limit_default = variable_get('apachesolr_facet_query_initial_limit_default', 10);
+        $reflect_hierarchy = variable_get('apachesolr_facet_reflect_hierarchy', array());

         // Handle taxonomy vocabulary facets
         if ((strpos($delta, 'im_vid_') === 0) && module_exists('taxonomy')) {

           if (is_object($response->facet_counts->facet_fields->$delta)) {
-            $contains_active = FALSE;
-            $terms = array();

Like mentioned in comments #9 and #15 I found another line that might cause a fatal error if you change a taxonomy's hierarchy without updating the index:

+                  foreach ($parents as $term) {
+                    $count = 0;
+                    if (property_exists($response->facet_counts->facet_fields->$delta, $term->tid)) {
+                      $count = $response->facet_counts->facet_fields->$delta->{$term->tid};
+                    }

Attached you'll find janusman's patch from #69 with these two adjustments.

AttachmentSize
apachesolr.hierarchical_facet.patch 15.53 KB

#71

dakala - August 5, 2009 - 17:17

Can't get this patch to work. Anyone willing to share a working patched module, please? Kindly contact me directly if you wish to help. Thanks.

#72

robertDouglass - August 13, 2009 - 11:58
Status:needs review» needs work

I think this issue might allow us to simplify the patch slightly? #545094: Method to set Keys on Query

<?php
$reflect_hierarchy = variable_get('apachesolr_facet_reflect_hierarchy', array());
$form['apachesolr_facet_reflect_hierarchy'] = array(
+   
'#type' => 'checkbox',
+   
'#title' => t('Links reflect hierarchy'),
+   
'#description' => t('Facet will be sorted hierarchically if it bases on a hierarchical taxonomy vocabulary.'),
+   
'#default_value' => !empty($reflect_hierarchy[$module][$delta]),
+  );
?>

Is there any reason why we don't do this automatically for all hierarchical taxonomy vocabularies? This would certainly be more user friendly, IMO.

Please, no more _functions. "Private" functions need to be class based or they're not private.

<?php
+function _apachesolr_search_get_unclick_query_key($res, $val) {
+function
_apachesolr_search_nested_facet_items($input, $parent = '', $level = 0) {
?>

I think the doxygen standard that we use is to put the type and variable on one line and the short description on the next line.

<?php
/**
+ * Creates a link to trigger a new solr search after removing at least one of previously
+ * selected search filters. Therefor you have to pass the filters' field names and values.
+ *
+ * @param $query object current apachesolr query
+ * @param $facet_text string
+ * @param $remove array array(array('#name' => 'field name', '#value' => 'field value'), array('#name' => 'field name', '#value' => 'field value'))
+ * @return string themed unclick link
+ */
?>

Why is $query passed by reference?

<?php
+function apachesolr_search_get_unclick_link(&$query, $facet_text, $remove) {
?>

Coding style $a . $b

<?php
+function _apachesolr_search_get_unclick_query_key($res, $val) {
+  return
$res.$val['#name'].$val['#value'];
+}
?>

Full doxygen comments please:

<?php
/**
+ * Return nested array of nested facet values for use with theme_item_list()
+ */
+function _apachesolr_search_nested_facet_items($input, $parent = '', $level = 0) {
?>

Also, it can read: "Return nested array of facet values" (redundant "nested")

Either give a type hint in the function signature or cast $input to an array so we never get warnings on the foreach.

<?php
/**
+ * Return nested array of nested facet values for use with theme_item_list()
+ */
+function _apachesolr_search_nested_facet_items($input, $parent = '', $level = 0) {
+  if (
$level == 0) {
+   
ksort($input);
+  }
$result = array();
+  foreach (
$input as $key => $item) {
?>

Full doxygen comments please:

<?php
+/**
+ * Return the contents of the "Current search" block.
+ */
+function apachesolr_search_currentsearch_block($response, $query) {
?>

Extra commented line?

<?php
#$links[] = apachesolr_l($query->get_query_basic(), $query->get_path(), array('attributes' => array('class' => 'active')));
?>

#73

janusman - August 13, 2009 - 20:41
Status:needs work» needs review

Implemented changes @robertDouglass mentioned in #72.

AttachmentSize
apachesolr-401234-73.patch 15.42 KB

#74

robertDouglass - August 14, 2009 - 16:05
Status:needs review» needs work

Patch doesn't apply to DRUPAL-6--2?

#75

robertDouglass - August 14, 2009 - 16:06

patching file apachesolr_search.module
Hunk #2 FAILED at 653.
Hunk #3 succeeded at 784 (offset 5 lines).
Hunk #4 succeeded at 834 (offset 5 lines).
Hunk #5 succeeded at 1203 (offset 5 lines).

#76

janusman - August 14, 2009 - 21:42
Status:needs work» needs review

!

Maybe 2.x changed before this patch =) Rerolled.

Forgot to mention that now there's no manual setting needed, the "show hierarchy" setting automatically applies to all vocabularies that are not freetagging nor multi-parent hierarchical (as determined by taxonomy_check_vocabulary_hierarchy()).

AttachmentSize
apachesolr-401234-76.patch 15.7 KB

#77

ruxkor - August 17, 2009 - 13:07

Has anybody tried to use this in combination with content taxonomy fields? I am currently having a hard time trying to implement a solution. :-/

#78

robertDouglass - August 17, 2009 - 17:21

what exactly do you mean "content taxonomy fields"?

#79

ruxkor - August 18, 2009 - 00:34

let me explain: by using the http://drupal.org/project/content_taxonomy content taxonomy module, it is possible to create cck fields using a taxonomy. the storage for those content taxonomy fields differs from the "normal" vocabularies if the setting "add to core taxonomy" is disabled (which is the case for many content taxonomy fields I am using for a project).

It is currently not possible to use those cck fields as a hierarchical facet since their string doesnt match to strpos($delta, 'im_vid_') === 0.

An apparently correct but at second thought wrong solution could be altering a document prior uploading it to solr by parsing all given cck fields, checking if they are from the type "taxonomy" and set the values in analogy to the core taxonomy fields (eg similar to apachesolr_add_taxonomy_to_document) BUT: That would eliminate the information about our cck field! Imagine two content taxonomy fields using the same vocabulary. By adding all content taxonomy fields to the tid, im_vid_# and vid values it would be still not possible to filter for a specific cck field, but only for a specific vocabulary.

The result is that content taxonomy fields need their own sets of multivalues, eg im_tid_MYCCKFIELD and im_vid_MYCCKFIELD_vid_# etc. But this means that almost the entire code has be adapted. :-/ After several hours of work I got some results but I am not really convinced by it; I used hooks whenever possible or added some new hooks and rebuilt a big part of this patch in my own search module (instead of directly adding things to the module) since patching would become even more terrible if I would directly modify the code :-(

Any ideas? Is there somebody else who ran into the same problem?

#80

robertDouglass - August 18, 2009 - 05:39

ruxkor: One solution might be to write a module that provides facet blocks for content_taxonomy. The core apachesolr modules are unlikely to support this directly, and as you pointed out, the important bits are rather straightforward string comparisons. Please open a different issue as it's not actually directly related to this one. Thanks.

#81

janusman - August 18, 2009 - 18:59

@ruxkor: I am currently using content_taxonomy fields in our production site; we *do* have "Additionally store in the core taxonomy system" option checked for all these fields so Apache Solr works out of the box with those fields.

If you want to go this way, then try looking at this issue: #485328: Provide a way to import and migrate from traditional taxonomy field ... we had to re-import our field values into the CCK tables during our migration to D6.

#82

janusman - August 24, 2009 - 17:36

Patch didn't apply anymore, rolled a new one.

AttachmentSize
apachesolr-401234-82.patch 15.49 KB

#83

janusman - September 1, 2009 - 19:53

Patch failed to apply, bad line in Hunk #2... new patch.

AttachmentSize
apachesolr-401234-83.patch 15.48 KB

#84

kristyb2008 - October 5, 2009 - 19:35

I've tried the most recent patch #83 and it doesn't work. Any new patches available?

#85

pwolanin - October 12, 2009 - 21:47

We should probably get this in 1.x first if we want to reach 1.0?

#86

mkalkbrenner - October 13, 2009 - 08:17

"We should probably get this in 1.x first if we want to reach 1.0?"

I really appreciate that. Like I mentioned earlier (custom) forum searches without that feature simply make no sense. And this is a must have for a 1.0 release. We're running apachesolr 1.0-rc1 on three web sites now using the patches from #70 or #73 and everyone is happy with that.

#87

janusman - October 13, 2009 - 15:07

This is frustrating; since the patch touches a lot of code and the module itself (the 2.x branch anyways) has been changing frequently, it has to be reviewed quickly or else it won't apply when a reviewer comes along. =)

Perhaps 6.x-1.x is a slower-moving target?

If I can get confirmation from someone willing to review this, I can re-roll the patch (just tell me which branch I should aim for!) =)

#88

pwolanin - October 14, 2009 - 19:59
Version:6.x-2.x-dev» 6.x-1.x-dev

I've been trying to find time to review - if you can re-roll for the 1.x branch then ping me about it.

We all agreed this would go in the 1.x branch, so let's try to get the code right there and then port it.

#89

janusman - October 21, 2009 - 16:40

New patch, this is for 6.x-1.x-dev.

AttachmentSize
apachesolr-401234-89.patch 16.17 KB

#90

pwolanin - October 21, 2009 - 17:57
Status:needs review» needs work

looks like there are unrelated changes in the patch to Solr_Base_Query.php

#91

janusman - October 21, 2009 - 19:01
Status:needs work» needs review

Correct. New patch.

AttachmentSize
apachesolr-401234-91.patch 14.83 KB

#92

pwolanin - October 22, 2009 - 02:31

Looks like this ought to be using constants defined by taxonomy module?

if ($voc->hierarchy != 2 && $voc->tags != 1) {

#93

janusman - October 22, 2009 - 15:01

Uhm... there are no defined constants in the D6 core taxonomy module... same in D7 =)

To prove it, here's a bit of code inside taxonomy_help()... the rest of the code also compares tags and hierarchy to ">0", "==2", etc.:

<?php
   
case 'admin/content/taxonomy/%':
     
$vocabulary = taxonomy_vocabulary_load($arg[3]);
      if (
$vocabulary->tags) {
        return
'<p>'. t('%capital_name is a free-tagging vocabulary. To change the name or description of a term, click the <em>edit</em> link next to the term.', array('%capital_name' => drupal_ucfirst($vocabulary->name))) .'</p>';
      }
      switch (
$vocabulary->hierarchy) {
        case
0:
          return
'<p>'. t('%capital_name is a flat vocabulary. You may organize the terms in the %name vocabulary by using the handles on the left side of the table. To change the name or description of a term, click the <em>edit</em> link next to the term.', array('%capital_name' => drupal_ucfirst($vocabulary->name), '%name' => $vocabulary->name)) .'</p>';
        case
1:
          return
'<p>'. t('%capital_name is a single hierarchy vocabulary. You may organize the terms in the %name vocabulary by using the handles on the left side of the table. To change the name or description of a term, click the <em>edit</em> link next to the term.', array('%capital_name' => drupal_ucfirst($vocabulary->name), '%name' => $vocabulary->name)) .'</p>';
        case
2:
          return
'<p>'. t('%capital_name is a multiple hierarchy vocabulary. To change the name or description of a term, click the <em>edit</em> link next to the term. Drag and drop of multiple hierarchies is not supported, but you can re-enable drag and drop support by editing each term to include only a single parent.', array('%capital_name' => drupal_ucfirst($vocabulary->name))) .'</p>';
      }
?>

#94

pwolanin - October 22, 2009 - 15:04

Wow, taxonomy module sucks...

#95

fabdel - October 30, 2009 - 13:25

Hi,

Suscribe for the hierachy of taxonomy in apachesolr.
Sorry but the hierarchy doesn't work.
I've always a flat taxonomy.
Drupal 6.14. I've apply the apachesolr-401234-91.patch.
Tested with apachesolr 6.1.x DEV and the the 6.x-1.0-rc3.

Really don't understand.

#96

mkalkbrenner - October 30, 2009 - 18:56

@fabdel:
If you like, you can contact me to get an already patched version of 6.x-1.0-rc3.

#97

pwolanin - November 13, 2009 - 16:45
Status:needs review» needs work

Trying out the last patch - I see the same issue as fabdel - the hierarchy doesn't show.

#98

pwolanin - November 13, 2009 - 16:58

oh, wait - this is sort of working - but very counter-intuitive to me. The initial search result only shows the top level parent that's matched - no necessarily the actual term in the search results.

#99

pwolanin - November 13, 2009 - 17:00

Patch needs work no matter what, since it will WSOD if taxonomy module is disabled:

Fatal error: Call to undefined function taxonomy_get_vocabularies()

#100

janusman - November 13, 2009 - 19:59

Wow.. this is comment #100...

IMO, we *want* to actually display just the top level terms, since there potentially could be too many child terms, they might never be shown. Also, it's in the spirit of "drilling down" to find what you need (which is what faceted search is good for... making large, complex data sets) =)

Say you have terms for the diferent animal kingdoms/Pylums/classes/etc inside a taxonomy. If you do a text search for "wings" it would get messy to display ALL the matching species's terms in an apachesolr block. What you want is to narrow down, starting from the top Phylum then drill down .. until you get to the insects or birds, etc.

This is the way a lot of faceted search interfaces work. If you're unconvinced I can look up a few cases.

I'll fix the patch to check for the taxonomy dependency found in #99.

#101

mkalkbrenner - November 13, 2009 - 20:01

The usage of function taxonomy_get_vocabularies() was introduced with patch from comment #73.

Like in function apachesolr_search_apachesolr_facets() it should be wrapped by if (module_exists('taxonomy')) { ... } somehow.

But the problem is not only limited to this patch. Function _apachesolr_field_name_map($field_name) also calls function taxonomy_get_vocabularies() without any checks.

#102

janusman - November 13, 2009 - 20:33

New patch, fixing the comment from #99 if taxonomy module was not enabled.

AttachmentSize
apachesolr-401234-101.patch 13.92 KB

#103

janusman - November 13, 2009 - 20:34
Status:needs work» needs review

Forgot to set as needs review.

#104

pwolanin - November 14, 2009 - 03:08

regarding displaying the hierarchy - we currently display all the parents of any term on a node, so the number problem already exists.

Perhaps what's bothering me is the UI - in this case we kind of need something like the menu block have to indicate that a term has children.

Also, is it the right behavior to always reflec tthe hierarchy, or are there use cases where search results shoudl display a hierarchy flat as we have now?

#105

mkalkbrenner - November 14, 2009 - 09:53

"Also, is it the right behavior to always reflect the hierarchy, or are there use cases where search results shoudl display a hierarchy flat as we have now?"

My original patch provided a check box on the settings form to switch between hierarchical and flat view per vocabulary. I think it was Robert's decision to remove that feature.

BTW if you want to see a live demo of this feature have a look at our public beta of http://www.coupons-proben.de/
The navigation on the left is nothing else than a facets. "Kategorien" are hierarchical.

 
 

Drupal is a registered trademark of Dries Buytaert.