Instead of those strange filters=field1:value%20field2:value&solrsort=stitle asc, we should use standard query strings like field1=value&field2=value&sort=title:asc.
Discussed that with pwolanin this night and he said:
I'm not claiming I love the current setup, but it works
but in any case, a lot of re-coding
| Comment | File | Size | Author |
|---|---|---|---|
| #29 | 370855-29.patch | 7.34 KB | nick_vh |
| #25 | 370855-25.patch | 4.78 KB | nick_vh |
| #24 | 370855-24.patch | 3.35 KB | nick_vh |
| #11 | 370855-standard-query-strings.patch | 12.97 KB | damien tournoud |
| #3 | 370855-standard-query-strings.patch | 12.96 KB | damien tournoud |
Comments
Comment #1
damien tournoud commentedHere is a first shot at this.
One issue on this is that apachesolr_current_query() uses the "fields" string as part of the key it uses for its static cache. This makes little sense to me, so I changed the signature of the function so that we now explicitly set the query instead of the current lazy loading mechanism.
Comment #2
damien tournoud commentedBy the way:
This patch tends to remove complexity in the module, not add some.
Comment #3
damien tournoud commentedOh, and we can remove filter_extract(), now ;)
Comment #4
pwolanin commentedThis is an interesting start, but I'm a little dubious
You basically pass in the entire $_GET as the filters. However, we have 'q' and 'solrsort' and pager elements as possible arrary keys.
More seriously, I don't see if/how you are handling multiple instance of the same field and maintaining the original order.
In the new version of:
public function get_url_querystring()
it seems to me that all filters on the same field will come out grouped together regardless of the input order. If you want to go down this router, you'll have to mimic Robert's method of finding the strpos() in $_SERVER['QUERY_STRING'].
In this case, it would be simpler to maintain all as mult-valued and not convert any back to scalar.
Comment #5
damien tournoud commentedYes, those are filtered out by the field detection mechanism.
True. All the different values of a fact will come up grouped together (this is really a limitation of the model where we delegate to PHP the task of parsing the query string into an array). Do we really want to keep the possibility of alternating facets? That seems overkill, and results in some very ugly code in the current implementation.
Other solution: express facets as part of the main URL:
search/apachesolr/<text>/field1=value1/field3=value3/field2=value2?sort=field3:descComment #6
pwolanin commentedthinking about this more - why not take this approach instead:
use a GET parameter like we have now, such as 'filters', but break it up so it's an array.
instead of:
you could have:
This would be way easier to keep a consistent order, I think, and would allow you to pull this out of $_GET very easily.
Comment #7
JacobSingh commentedI like peter's last suggestion best. In reality, we are passing multiple fq params to solr. Staying as close to that pattern as possible is to our advantage in terms of code simplicity.
I did a little survey of a few solr sites, in particular the commercial arm of solr:
http://www.lucidimagination.com/search/?q=filters#/p:droids,tika/s:email...
It's AJAX, so perhaps not the best example, but I could see us going in that direction too. It is certainly easier to read.
Volunteer DB:
http://www.volunteersolutions.org/volunteer/search-2?target=&keyword=foo...
Submits all filters.
Property listing:
http://www.enormo.com/search/loc/_USA%252FNew-York%252FNew-York%252FManh...
super long, but using the path. includes every possible filter. I assume it is using JS to build this array when you submit the form on the right.
Since all of our queries are ANDed together (ORs would go inside a single fq). It makes sense to me to do as peter is suggesting, or just simplify it and do something like he's suggesting, but a little nicer to look at:
q=keys&filters=(uid:2 OR uid:3);tid:5;created[000]
This isn't really any work to parse:
$params['fq'] = explode(';', $_GET['filters']);
Best,
Jacob
Comment #8
drunken monkeyIt's tempting to think so, but then a single semicolon in a search term could break the whole search. I think, the built-in functionality of building an array with "filters[]=..." would be a lot safer regarding odd characters, so I'd just go with the original suggestion. The cosmetic value of any alternative could be disputed anyways.
Comment #9
pwolanin commented@Jacob - that last suggestion is essentially already what we have - using spaces as the separator and various regex to match.
Comment #10
JacobSingh commentedI think the whole argument is about what is prettier / less code to maintain. I'm just trying to establish something which is easier to parse if that is the issue at hand. Personally, I think we should just find a delimiter that is more reliable than space (how about :/ or something)? We could even make it a variable in the variables table.
That means we have a 1 line parsing operation, and it is still readable. I think ultimately this is not a really valuable place to be throwing a ton of time, so I'll stop my 2bits, would like to avoid the mile long URL though if possible.
Best,
Jacob
Comment #11
damien tournoud commentedA new version, now ordering should be kept in every cases.
Comment #12
pwolanin commentedstill not sure I like this approach.
Also, looks like you have some issues with the patch - I don't think it's against the code in CVS.
Comment #13
robertdouglass commentedJust a note that we should revisit this when we open the 6--2-0 branch.
Comment #14
mkalkbrennerJust a comment from my point of view:
This change will break a lot of custom code in our existing projects. For me it's an API change that's too late because RC1 is already released. This issue should be postponed for the same reason as #494182: apachesolr and core search treat Drupal fields (Field API, CCK) differently.
Comment #15
robertdouglass commentedexactly why it must wait for the 6.2 branch.
Comment #16
pwolanin commentedEven then I have doubts.
Comment #17
robertdouglass commentedDamien, #11 looks interesting - if we can get less code, better query parameter names, and not break anything, I'd be happy. Now that 6.2 is open we can make changes like this.
Pwolanin please explain why you don't like the approach in #11. Thanks.
Comment #18
jpmckinney commentedComment #19
pwolanin commentedI still think filter parsing and a variety of code could be simplified or improved if we took in a array of filters in $_GET['f'] (or something similar) rather than one big string which we have to break apart. I think this could also give us a model for a more generic way to take in filters for D8 core (as part of the goal of having a facet API in core). I'm still not fond of trying to find fields in the query string - e.g. it means I can't filter on a field named "page" and also have a pager.
Comment #20
pwolanin commentedWe now use ?fq[]= and Facet APi uses ?f[]=
The remaining work here is potentially just to have a configuration for each search page as to whether added ?fq[]= params are used.
Comment #21
pwolanin commentedComment #22
nick_vhsince hook_apachesolr_page_alter is there now I assume this can happen in custom code?
Comment #23
nick_vhChanging title
Comment #24
nick_vhComment #25
nick_vhSome improvements
Comment #26
nick_vhComment #27
pwolanin commentedlooks the validation is applied to the wrong fields - see #1313698: Support for search of multiword content in facets/fields
#28
Posted by Nick_vh on December 2, 2011 at 9:56am
For the normal value we should use the drupal_validate_utf8($text)
For the field name we need preg_match('/^[a-zA-Z0-9_\x7f-\xff]+$/'
Comment #28
nick_vhComment #29
nick_vhAdded some more simpletests to verify the inner workings of the validateFilter function
Comment #30
nick_vhCommitted
Comment #31
nick_vhComment #32
nick_vhCommitted this exact patch to 6.x-3.x