Other GET parameters ignored by Apache Solr Facet Blocks

robertDouglass - June 26, 2009 - 12:47
Project:Apache Solr Search Integration
Version:6.x-1.x-dev
Component:Code
Category:bug report
Priority:normal
Assigned:robertDouglass
Status:closed
Description

The facet links don't have any awareness of other GET parameters that might be in the URL other than the ones that AS puts there itself. In fact, anything in the query string should be preserved so that other modules can use it too.

AttachmentSize
apachesolr_get_params.patch1.7 KB

#1

pwolanin - June 27, 2009 - 20:49
Status:needs review» needs work

We worked hard to remove all references to $_GET from the class. Seems like it would make more sense to pass in $_GET to the constructor.

Also, drupal_query_string_encode() has a serious bug in that it does:

$key = drupal_urlencode($key);

I'll have to find the issue link.

#2

JacobSingh - June 28, 2009 - 06:12

I don't love the class taking in $_GET... I suppose it is okay, but for DX, classes that take anonymous arrays of "stuff" are usually deadly. Seems to me that parsing $_GET should be outside of the class, and perhaps making sense of some random variables could be in a class factory. But the constructor should, if possible, be clean of that kind of logic and ambiguity.

If we just shove $_GET in as a property of the class along with other well defined ones in the constructor, I'm okay with that, as long as the $_GET munging doesn't happen there.

#3

robertDouglass - June 29, 2009 - 10:16
Status:needs work» needs review

Well, this approach here is the most flexible, and keeps GET out of the class, but is crappy for DX. Please comment separately on the issue of the function signature, when/where/if we deal directly with GET, and on the static helper function that this introduces to work around the bug Peter mentioned.

AttachmentSize
solr_query.patch 1.88 KB

#4

robertDouglass - June 29, 2009 - 10:26

I thought about putting the static function in the apachesolr.module but decided against it. Even though the concept of static functions in Drupal is somewhat stupid, it's nice that our class is almost completely free of dependencies on the module, so I elected to keep it a static function to maintain that.

#5

robertDouglass - June 29, 2009 - 10:36

This one munges $_GET into a private static variable with a pattern reminiscent of static caching (using if empty()). It then automatically includes that array in the query building. Since there is an incoming $excludes array the developer can override any of this behavior.

AttachmentSize
solr_query.patch 2.2 KB

#6

robertDouglass - June 29, 2009 - 11:44

my original version filtered $_GET by check_plain. Peter, do you think this is necessary?

AttachmentSize
solr_query.patch 2.21 KB

#7

pwolanin - June 29, 2009 - 12:38

It's not clear to me whether we need to worry about it, but maybe we shoudl based on this function: http://api.drupal.org/api/function/filter_xss_bad_protocol/6

Actually, if we are doing it we shoudl probably do it when we output the querystring, rather than on input?

#8

robertDouglass - June 29, 2009 - 13:12

I think (again for DX) doing it once, and doing it in the class when the variable gets set, is the best option.

I'm seeing a possibly related issue (need to role back the patch to confirm); my URLs are coming back with spaces, not plus signs (+), and this is breaking the unclick links.

#9

pwolanin - June 29, 2009 - 13:20

@Robert - we are using rawurlencode() now which shoudl encode spaces as %20, not +.

Filter on output is the Drupal way - I think we shoudl stick to that pattern.

#10

pwolanin - June 29, 2009 - 23:47

@Robert - why would we use a static class variable for this? Seems to defealt the purpose of allowing overrides.

#11

Scott Reynolds - June 30, 2009 - 00:56
Status:needs review» needs work

this patch makes changes that should also be reflected in the Interface.

I also don't understand the need for this change. Scares me a lot, I really like the class to know nothing but the stuff that it did makes it easy to test, easy to understand. What is the intended bug to be fixed or feature to be added?

#12

pwolanin - June 30, 2009 - 01:17

Here's a start on a patch that's a more general cleanup. It's totally untested, but just so you can see what I'm thinking.

@Scott - I think the idea is that there might be extra GET params you care about, especially for custom search pages. I'm not sure what Robert's particualr motivation is in this case.

AttachmentSize
gettyy-up-502976-12.patch 7.75 KB

#13

robertDouglass - June 30, 2009 - 07:47

Peter, that looks like it will meet my needs. I'll give it a test. I might need to simplify the patch some and have it not refactor so much all at once.

Scott, it's exactly like Peter said. If you have a module (say the print module) that sticks extra GET params onto URLs for whatever reasons, you need the $query to be aware of them or they just disappear in facet links, etc.

I was trying to find a pattern that didn't make you remember to send $_GET into the search_execute function, but if everyone else is more comfortable with Peter's pattern I'll adjust.

#14

robertDouglass - June 30, 2009 - 10:50
Status:needs work» needs review

This patch updates the new method to be called set_extra_queryvalues since we're storing an array. It fixes the querystring building logic in get_url_querystring, adds, some PHP docs.

AttachmentSize
get_params.patch 13.56 KB

#15

robertDouglass - June 30, 2009 - 11:06

This one is the same as the last rerolled after I cleaned up whitespace in the module. Less noise in the patch.

AttachmentSize
get_params.patch 8.2 KB

#16

robertDouglass - June 30, 2009 - 16:15

This is a URL that I'm using in a custom search built on ApacheSolr: http://localhost/alz/s?xor=all&s=1&show=200

If I want to show facet links that retain the xor, s and show GET parameters, we need a mechanism like the one proposed. With the patch in #15 a typical facet link looks like this:
http://localhost/alz/s/?xor=all&s=1&show=200&filters=type%3Abook

Without #15 the same facet link looks like this:
http://localhost/alz/s/?filters=type%3Abook

#17

Scott Reynolds - June 30, 2009 - 16:46
Title:Other GET parameters ignored by ApacheSolr» Other GET parameters ignored by Apache Solr Facet Blocks
Status:needs review» needs work

Ok thanks for the explanation. Changing title to reflect the problem better and providing a demonstration patch. Unfort, I am stuck on another project, which has a solr version that is lower and I really don't have time to spend. But as this affects the Views implementation, and I feel pretty strongly about this patch, I created a mere demonstration patch. It does have bugs in it, namely it will repeat variables in some cases. But I believe the code is placed in the proper spot so it affects how we create links and doesn't affect how we create Query objects.

Probably doesn't apply to the latest dev, again don't have time, brain cycles to spare.

Works in ur example though just as you drive deeper repeats GETs.

Tested against Apache Solr Views.

AttachmentSize
query.patch 746 bytes

#18

pwolanin - June 30, 2009 - 18:20

Sure, we could roll something liek that - but I'm not sure how this avoid duplicating params like solrsort, and of replicating aprams you don't want like 'page'
$diff = array_diff_assoc($_GET, $options);

#19

pwolanin - June 30, 2009 - 18:30

We should be able to use this helper function I think
http://www.php.net/manual/en/function.http-build-query.php

#20

pwolanin - June 30, 2009 - 19:08

@Scott - the downside of doing it the way you suggest, is that there is no easy OO way to override the behavior. Not sure what the right trade-off is.

#21

pwolanin - June 30, 2009 - 19:24

Actually - thinking more about it - this also would potentially alleviate the $_GET hack I just committed in the taxonomy browsing pages.

#22

Scott Reynolds - June 30, 2009 - 19:43

is that there is no easy OO way to override the behavior

this is the point. why would the query object every want to override the behaviour. What would it do differently about things that it knows nothing about?

And as Robert stated in the opening of the ticket

The facet links don't have any awareness of other GET parameters that might be in the URL other than the ones that AS puts there itself

Its the facet links, not the object that should concern itself here.

#23

pwolanin - June 30, 2009 - 19:44

This is more like what Scott suggests - but I see again that it's problematic in as much as we will fail to retain any GET params when we submit the search form.

AttachmentSize
example-l-502976-22.patch 7.8 KB

#24

pwolanin - June 30, 2009 - 19:49

@Scott - the query object is providing you with a querystring (or possibly an array of query values). How it generates that does seem to be something you might want to override - e.g. if you want your url to look like:

?facets=tid:99&mysort=created desc

or like project module does, to add keys to a 'text' GET param

#25

pwolanin - June 30, 2009 - 20:13

This patch fixes the sort link for score and the retain filters behavior - though note that misc GET params are not retained except when we specify retain-filters.

I think this might be a bit of a flaw in this architecture.

AttachmentSize
gettyy-up-502976-25.patch 10.1 KB

#26

Scott Reynolds - June 30, 2009 - 21:24

the query object is providing you with a querystring (or possibly an array of query values). How it generates that does seem to be something you might want to override

Right but those are parameters it knows about. things like xor or print it doesn't know about. By making the class retain these GETs we have the class maintain things it doesn't know how to handle and thats a lil scary. The class itself should be a self contained object. How do ever generate a link that doesn't have those GETs in it now. At least with my approach you could use the object to generate the link.

Which is why I was searching for real tangible reason why this needs to happen. So we couldn't just say "Look project module does it" or look "Solr Views does it as well". An tangible example of where the current architecture fails, where one module fails to function in the optimum because this change isn't present. The Views object adds the exposed filters' identifiers to the query string, those can be arbitrary.

we will fail to retain any GET params when we submit the search form.

Doh! probably need to add the additional GET params as hidden values in the form. Same problem would occur in the Views object.

But in the end, I can make the Views object handle this, it already stores all the information it needs. It is small and trivial I guess.

#27

robertDouglass - June 30, 2009 - 21:45

Doh! probably need to add the additional GET params as hidden values in the form. Same problem would occur in the Views object.

This might be the best solution.

By the way, while I'm thinking about it, I just have to say that I hate the %menu_tail aspect of search.module and how we currently do things with keywords. They're nothing but a GET param but we treat them differently and there are assumptions about them throughout the module. In a next iteration I want us to consider just having ?key=foobar for the keywords. It would simplify a number of things.

#28

robertDouglass - July 1, 2009 - 09:45

Peter, maybe we can move fixing the sorting to a different issue?

#29

pwolanin - July 1, 2009 - 13:05

@Robert - sure, they were jsut touching some of the same code.

After thinking abotu this more I've wavering back towards Scott's approach.

To do that well, however, I think we want to return an array of query params so that we can more easily filter by keys.

#30

robertDouglass - July 1, 2009 - 13:14

I'm closing in on a patch with scotts approach now. Will post shortly.

#31

robertDouglass - July 1, 2009 - 14:19
Status:needs work» needs review

Based on Scott's approach. One question I have is whether more unsets are needed here:

<?php
// Retain GET parameters that ApacheSolr knows nothing about.
$diff = array_merge($_GET, $options['query']);
+  unset(
$diff['q']);
+  unset(
$diff['filters']);
+  foreach (
$diff as $key => $value) {
+   
$options['query'][$key] = $value;
+  }
?>

AttachmentSize
get_params.patch 8.68 KB

#32

robertDouglass - July 1, 2009 - 14:24

Oops, that version still handled encoding itself. This one lets it pass through to url().

AttachmentSize
get_params.patch 8.35 KB

#33

Scott Reynolds - July 1, 2009 - 16:25
Status:needs review» needs work

I think that if you flip the array_merge() you solve your problem

<?php
array_merge
($options['query'], $_GET);
?>

I know for a fact that this won't work with Solr Views because it doesn't use $_GET['filters']. So with the array_merge flip described above, that should work with Solr Views, and I think fix the need to unset filters.

I applied the patch and try to test on a site Im building, and got a preg_match() error from drupal_validate_utf8 when rendering the hidden apachesolr_search_querystring form element. (Taking a guess at the form element, by looking at the backtrace).

#34

pwolanin - July 1, 2009 - 19:31

Robert - why unset when use can use array_diff_key()?

Also - for example - when we are handling a form submission we don't want to carry over any filter param from $_GET since there may be no such key in the destination query values.

#35

pwolanin - July 2, 2009 - 06:36
Status:needs work» needs review

ussing array_diff_key() and retaining GET params on form submission too.

AttachmentSize
gettyy-up-502976-35.patch 10.35 KB

#36

JacobSingh - July 2, 2009 - 07:58

I read the patch, and although I haven't been involved in the issue, I think I get the gist. Turning the query into an array instead of a string seems to have not introduced any regressions.

Would be nice to see it go through our QA cycle, but AFAICT, no issues. I am not testing it with Views or anything like that.

-J

#37

robertDouglass - July 2, 2009 - 08:51
Status:needs review» fixed

Great. Thanks for the help, guys. Committed #35.

#38

mkalkbrenner - July 6, 2009 - 10:53

#39

Scott Reynolds - July 7, 2009 - 16:13

Thought I would point out as you guys rev up to end the 1.x branch, that this fix does not work with Solr Views. #508826: Currently Broken with latest apachesolr dev

And the reason is

<?php
$get
= array_diff_key($_GET, array('q' => 1, 'page' => 1, 'filters' => 1, 'solrsort' => 1, 'retain-filters' => 1));
?>

That happens to be very hardcoded for apachesolr_search.module.

I pointed this out here:
1.) http://drupal.org/node/502976#comment-1764658
2.) http://drupal.org/node/502976#comment-1761590

I will fix this, but I busy busy busy. Hopefully, tonight before I pass out.

#40

pwolanin - July 8, 2009 - 13:15

@Scott - sorry if I misunderstood your prior comments - you seemed to be arguing to not include this query parameter handling in the object.

Where do you suggest defining this filter list if it's going to be variable?

#41

System Message - July 22, 2009 - 13:20
Status:fixed» closed

Automatically closed -- issue fixed for 2 weeks with no activity.

#42

Scott Reynolds - July 24, 2009 - 06:07
Status:closed» needs review

This is now usable by Apache Solr Views. It makes no assumptions about what $_GET params are in the url

For instance, Apache Solr Views might have the search keys as $_GET['my_text'] and have node types exposed via a form like this $_GET['node_types']. And this will work by leveraging the Interface and the abstraction it provides

AttachmentSize
get.patch 1.22 KB

#43

robertDouglass - July 24, 2009 - 06:28

This line is to be ignored, right?

/**
  * Mark one node as needing re-indexing.
- */
+$get = array_diff_key */

#44

Scott Reynolds - July 24, 2009 - 07:22

I swear I look at the patch through less... sigh. Reroll

I have fixed Apache Solr Views to now work :-D. This patch carries the params through with standard solr and the Views query.

AttachmentSize
get.patch 1.82 KB

#45

Scott Reynolds - July 24, 2009 - 17:08

....

the patch is just supposed to change one line

-  $get = array_diff_key($_GET, array('q' => 1, 'page' => 1, 'filters' => 1, 'solrsort' => 1, 'retain-filters' => 1));
+  $get = array_diff_key($_GET, array('q' => 1, 'page' => 1), $options['query'], apachesolr_current_query()->get_url_queryvalues());

This would be why I stop committing code after 10pm. Brain refuses to function....

#46

Scott Reynolds - August 14, 2009 - 15:56

Bumping this as this functionality currently borken with Apache Solr Views. Just a one line change.

#47

robertDouglass - August 14, 2009 - 16:14
Status:needs review» patch (to be ported)

Committing this patch to 6.2.

AttachmentSize
getparams.patch 1003 bytes

#48

robertDouglass - August 14, 2009 - 16:26
Status:patch (to be ported)» fixed

And to DRUPAL-6--1

#49

System Message - August 28, 2009 - 16:30
Status:fixed» closed

Automatically closed -- issue fixed for 2 weeks with no activity.

 
 

Drupal is a registered trademark of Dries Buytaert.