Instead of those strange filters=field1:value%20field2:value&solrsort=stitle asc, we should use standard query strings like field1=value&field2=value&sort=title:asc.

Discussed that with pwolanin this night and he said:

I'm not claiming I love the current setup, but it works
but in any case, a lot of re-coding

Comments

damien tournoud’s picture

Status: Active » Needs review
StatusFileSize
new10.81 KB

Here is a first shot at this.

One issue on this is that apachesolr_current_query() uses the "fields" string as part of the key it uses for its static cache. This makes little sense to me, so I changed the signature of the function so that we now explicitly set the query instead of the current lazy loading mechanism.

damien tournoud’s picture

By the way:

 Solr_Base_Query.php      |  118 ++++++++++++++++++++++-------------------------
 apachesolr.module        |   40 +++++++--------
 apachesolr_search.module |    9 ++-
 3 files changed, 81 insertions(+), 86 deletions(-)

This patch tends to remove complexity in the module, not add some.

damien tournoud’s picture

StatusFileSize
new12.96 KB

Oh, and we can remove filter_extract(), now ;)

pwolanin’s picture

This is an interesting start, but I'm a little dubious

You basically pass in the entire $_GET as the filters. However, we have 'q' and 'solrsort' and pager elements as possible arrary keys.

More seriously, I don't see if/how you are handling multiple instance of the same field and maintaining the original order.

In the new version of:
public function get_url_querystring()

it seems to me that all filters on the same field will come out grouped together regardless of the input order. If you want to go down this router, you'll have to mimic Robert's method of finding the strpos() in $_SERVER['QUERY_STRING'].

In this case, it would be simpler to maintain all as mult-valued and not convert any back to scalar.

damien tournoud’s picture

You basically pass in the entire $_GET as the filters. However, we have 'q' and 'solrsort' and pager elements as possible arrary keys.

Yes, those are filtered out by the field detection mechanism.

it seems to me that all filters on the same field will come out grouped together regardless of the input order. If you want to go down this router, you'll have to mimic Robert's method of finding the strpos() in $_SERVER['QUERY_STRING'].

True. All the different values of a fact will come up grouped together (this is really a limitation of the model where we delegate to PHP the task of parsing the query string into an array). Do we really want to keep the possibility of alternating facets? That seems overkill, and results in some very ugly code in the current implementation.

Other solution: express facets as part of the main URL:

search/apachesolr/<text>/field1=value1/field3=value3/field2=value2?sort=field3:desc

pwolanin’s picture

thinking about this more - why not take this approach instead:

use a GET parameter like we have now, such as 'filters', but break it up so it's an array.

instead of:

filters=uid:2 tid:9 tid:55 (smfield_cck_foo:bar AND smfield_cck_foo:noway) created:[NOW/YEAR TO NOW]

you could have:

filters[]=uid:2&filters[]=tid:9&filters[]=tid:55&filters[]=(smfield_cck_foo:bar AND smfield_cck_foo:noway)&filters[]=created:[NOW/YEAR TO NOW]

This would be way easier to keep a consistent order, I think, and would allow you to pull this out of $_GET very easily.

JacobSingh’s picture

I like peter's last suggestion best. In reality, we are passing multiple fq params to solr. Staying as close to that pattern as possible is to our advantage in terms of code simplicity.

I did a little survey of a few solr sites, in particular the commercial arm of solr:

http://www.lucidimagination.com/search/?q=filters#/p:droids,tika/s:email...

It's AJAX, so perhaps not the best example, but I could see us going in that direction too. It is certainly easier to read.

Volunteer DB:
http://www.volunteersolutions.org/volunteer/search-2?target=&keyword=foo...

Submits all filters.

Property listing:

http://www.enormo.com/search/loc/_USA%252FNew-York%252FNew-York%252FManh...

super long, but using the path. includes every possible filter. I assume it is using JS to build this array when you submit the form on the right.

Since all of our queries are ANDed together (ORs would go inside a single fq). It makes sense to me to do as peter is suggesting, or just simplify it and do something like he's suggesting, but a little nicer to look at:

q=keys&filters=(uid:2 OR uid:3);tid:5;created[000]

This isn't really any work to parse:
$params['fq'] = explode(';', $_GET['filters']);

Best,
Jacob

drunken monkey’s picture

This isn't really any work to parse:
$params['fq'] = explode(';', $_GET['filters']);

It's tempting to think so, but then a single semicolon in a search term could break the whole search. I think, the built-in functionality of building an array with "filters[]=..." would be a lot safer regarding odd characters, so I'd just go with the original suggestion. The cosmetic value of any alternative could be disputed anyways.

pwolanin’s picture

@Jacob - that last suggestion is essentially already what we have - using spaces as the separator and various regex to match.

JacobSingh’s picture

I think the whole argument is about what is prettier / less code to maintain. I'm just trying to establish something which is easier to parse if that is the issue at hand. Personally, I think we should just find a delimiter that is more reliable than space (how about :/ or something)? We could even make it a variable in the variables table.

That means we have a 1 line parsing operation, and it is still readable. I think ultimately this is not a really valuable place to be throwing a ton of time, so I'll stop my 2bits, would like to avoid the mile long URL though if possible.

Best,
Jacob

damien tournoud’s picture

StatusFileSize
new12.97 KB

A new version, now ordering should be kept in every cases.

pwolanin’s picture

Status: Needs review » Needs work

still not sure I like this approach.

Also, looks like you have some issues with the patch - I don't think it's against the code in CVS.

robertdouglass’s picture

Just a note that we should revisit this when we open the 6--2-0 branch.

mkalkbrenner’s picture

Just a comment from my point of view:
This change will break a lot of custom code in our existing projects. For me it's an API change that's too late because RC1 is already released. This issue should be postponed for the same reason as #494182: apachesolr and core search treat Drupal fields (Field API, CCK) differently.

robertdouglass’s picture

exactly why it must wait for the 6.2 branch.

pwolanin’s picture

Even then I have doubts.

robertdouglass’s picture

Version: 6.x-1.x-dev » 6.x-2.x-dev

Damien, #11 looks interesting - if we can get less code, better query parameter names, and not break anything, I'd be happy. Now that 6.2 is open we can make changes like this.

Pwolanin please explain why you don't like the approach in #11. Thanks.

jpmckinney’s picture

Version: 6.x-2.x-dev » 7.x-1.x-dev
pwolanin’s picture

Title: Apachesolr should use standard query strings » Switch from ?filters= to ?f[]=

I still think filter parsing and a variety of code could be simplified or improved if we took in a array of filters in $_GET['f'] (or something similar) rather than one big string which we have to break apart. I think this could also give us a model for a more generic way to take in filters for D8 core (as part of the goal of having a facet API in core). I'm still not fond of trying to find fields in the query string - e.g. it means I can't filter on a field named "page" and also have a pager.

pwolanin’s picture

Status: Needs work » Active

We now use ?fq[]= and Facet APi uses ?f[]=

The remaining work here is potentially just to have a configuration for each search page as to whether added ?fq[]= params are used.

pwolanin’s picture

Title: Switch from ?filters= to ?f[]= » Configure per page use of ?fq[]=
nick_vh’s picture

since hook_apachesolr_page_alter is there now I assume this can happen in custom code?

nick_vh’s picture

Title: Configure per page use of ?fq[]= » Add configuration option to a search page if we allow user input using the url or not

Changing title

nick_vh’s picture

StatusFileSize
new3.35 KB
nick_vh’s picture

StatusFileSize
new4.78 KB

Some improvements

nick_vh’s picture

Status: Active » Needs review
pwolanin’s picture

looks the validation is applied to the wrong fields - see #1313698: Support for search of multiword content in facets/fields

#28
Posted by Nick_vh on December 2, 2011 at 9:56am

For the normal value we should use the drupal_validate_utf8($text)
For the field name we need preg_match('/^[a-zA-Z0-9_\x7f-\xff]+$/'

nick_vh’s picture

Status: Needs review » Needs work
nick_vh’s picture

StatusFileSize
new7.34 KB

Added some more simpletests to verify the inner workings of the validateFilter function

nick_vh’s picture

Status: Needs work » Fixed

Committed

nick_vh’s picture

Status: Fixed » Closed (fixed)
nick_vh’s picture

Version: 7.x-1.x-dev » 6.x-3.x-dev

Committed this exact patch to 6.x-3.x