Needs work
Project: Apache Solr Search
Version: 7.x-1.x-dev
Component: Code
Priority: Normal
Category: Feature request
Assigned: Unassigned
Reporter:
Created: 6 Aug 2010 at 13:56 UTC
Updated: 12 Jun 2018 at 09:56 UTC
Comments
Comment #1
tsphethean commented: The code didn't come through in the original post. The XML in the Solr config to test this would be:
Comment #2
jpmckinney commented: Could you provide a unified diff? (diff -u)
Comment #3
tsphethean commented: Unified diff attached.
Comment #4
tsphethean commented: Bug identified in the previous attachment. Slight amendment included in this version to cater for the suggestion object changing name.
Comment #5
tsphethean commented: This goes deeper than I initially thought once you start looking at the configuration options of the spelling suggestions component.
If
<str name="spellcheck.extendedResults">true</str>
then Solr returns an object of the suggestions, including a count of the number of results each suggestion will return. This final patch checks whether the suggestions are an object or not. If so, it reads the object and displays the count of results as well as the phrases; if not, it just returns the phrases from the array.
Sorry for multiple submissions so quickly!
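The object-versus-array check described above can be sketched roughly as follows. This is an illustration in Python, not the PHP from the attached patch. The sample entries and the field names `suggestion`, `word` and `freq` are assumptions about the response shape, and `normalize_suggestions` and `format_suggestion` are hypothetical helper names.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical sample entries for one misspelled term. The field names
# ("suggestion", "word", "freq") are assumptions made for illustration.
PLAIN_ENTRY: Dict[str, Any] = {"numFound": 2, "suggestion": ["drupal", "drupe"]}
EXTENDED_ENTRY: Dict[str, Any] = {
    "numFound": 2,
    "suggestion": [
        {"word": "drupal", "freq": 120},
        {"word": "drupe", "freq": 3},
    ],
}


def normalize_suggestions(entry: Dict[str, Any]) -> List[Tuple[str, Optional[int]]]:
    """Return (phrase, count) pairs; count is None when only phrases were sent."""
    pairs: List[Tuple[str, Optional[int]]] = []
    for item in entry.get("suggestion", []):
        if isinstance(item, dict):
            # Extended results: each suggestion carries its own count.
            pairs.append((item["word"], item.get("freq")))
        else:
            # Plain results: the suggestion is just the phrase.
            pairs.append((str(item), None))
    return pairs


def format_suggestion(phrase: str, freq: Optional[int]) -> str:
    """Show the count next to the phrase only when one is available."""
    return phrase if freq is None else f"{phrase} ({freq})"
```

Normalising both shapes into one list of pairs means the display code never needs to know which configuration produced the response.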
Comment #6
jpmckinney commented: How is this patch being created? It looks like a reverse patch. Please respect Drupal coding standards, especially when it comes to whitespace around control structures: http://drupal.org/coding-standards
Comment #7
tsphethean commented: Sorry, I've read up on the standards a bit more and attached a revised patch. Please let me know if there are any problems with this one.
Comment #8
jpmckinney commented: For the patch to apply, I had to get rid of the lines in the patch that remove newlines.
Comment #9
tsphethean commented: I've been doing some more investigation into Solr spelling suggestions, and have written more code to handle
within the spelling suggestions to handle more possibilities in the SOLR configuration files.
Would you like that provided as an update to this patch (i.e. against the original), or as a diff against the patched version? Should I create a new issue for this?
First time patcher so this is all a little new to me!
Comment #10
pwolanin commented: This looks like a feature request; reconfiguring solrconfig.xml is not something we necessarily support.
Since we are sending the spellcheck param, seems like we'd send the others as well in the request?
Comment #11
jpmckinney commented: An update to this patch would be enough. Uploading a diff between patches, too, would make reviewing the patch easier, but is not required :)
After seeing the patch, if it looks like the issue should be split, we can do that.
Thanks for addressing the unsupported spellcheck parameters!
Comment #12
jpmckinney commented: Didn't mean to update tags. I don't think this is about changing solrconfig.xml. It's about having our suggestions-munging code support all Solr spellcheck parameters.
Comment #13
pwolanin commented: @jpmckinney - just responding to the initial post that says the reporter changed their solrconfig.xml to change spellcheck behavior and then saw the problem.
Comment #14
jpmckinney commented: Ah, correct. I think we should support alternate configurations where possible.
Comment #15
tsphethean commented: OK, I've attached the updated version of my patch, which handles the different configuration options a user can set for Solr spelling suggestions. The reason for the patch is that if a user changes the Solr configuration files, the unpatched module displays errors.
The main change in this version of the patch is that when:
then SOLR passes back a more complex object than the array that was being handled. Extended results means that the frequency of the spelling suggestions occurring in the index can be displayed to the end user.
Other changes are to handle:
Collation looks at the phrases for which spelling suggestions are being provided, and puts together the most likely combination of phrases rather than a single phrase. For example, if I were to search for "Druapl slro" with collate = false, I would be given suggestions of "Drupal" and "solr" as individual phrases. With collate = true, Solr will return "Drupal solr" as a collated term. In this patch, I have put the collated term first in the list of suggestions, as this is likely to be most relevant.
There is a patch for Solr in the works that will enhance the collation functionality (https://issues.apache.org/jira/browse/SOLR-2010). If it makes it into a full Solr release, some refactoring of this patch may be required. The Solr patch will allow multiple collations to be returned, and ensure that only collations which will return search results are returned.
I've also attached my solrconfig.xml with these parameters in, for reference.
Let me know if any further changes or information are needed.
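To make the ordering described above concrete, here is a rough sketch in Python rather than the module's PHP. It assumes the collated phrase arrives under a `collation` key alongside the per-term entries, which is an assumption about the response shape. `ordered_suggestions` is a hypothetical name and the sample data simply mirrors the "Druapl slro" example.

```python
from typing import Any, Dict, List

# Hypothetical response fragment for the query "Druapl slro". Placing
# "collation" next to the per-term entries is an assumption.
SAMPLE_SUGGESTIONS: Dict[str, Any] = {
    "druapl": {"numFound": 1, "suggestion": ["Drupal"]},
    "slro": {"numFound": 1, "suggestion": ["solr"]},
    "collation": "Drupal solr",
}


def ordered_suggestions(suggestions: Dict[str, Any]) -> List[str]:
    """List the collated phrase first, then the individual corrections."""
    ordered: List[str] = []
    collation = suggestions.get("collation")
    if isinstance(collation, str) and collation:
        ordered.append(collation)
    for term, entry in suggestions.items():
        if not isinstance(entry, dict):
            # Skip non-term values such as the collation string itself.
            continue
        for phrase in entry.get("suggestion", []):
            if phrase not in ordered:
                ordered.append(phrase)
    return ordered
```

With collation switched off the `collation` key is simply absent, so the same function returns only the individual phrases.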
Comment #16
pwolanin commented: I'm not sure the patch is right - a quick read looks like it is only showing a spelling suggestion for the first word?
Comment #17
tsphethean commented: Yes, there's an interesting issue with multiple search phrases. Essentially, for each word in the query Solr will do a spell check. For each word which it thinks is spelt incorrectly, it will return an item in the search response object containing the number of suggestions specified in
<str name="spellcheck.count">5</str>
in solrconfig.xml.
With collation enabled, another aspect this patch addresses (
<str name="spellcheck.collate">true</str>
), it will return the combined suggestions for the whole phrase using the most likely suggestion for each word in the query.
What I'm not sure about is how best this should be displayed to the user. If a user searches for a query with 5 words in it, all spelt incorrectly, should we return 5 suggestions for every word, 5 x 5 = 25 in total? The approach I've taken with the patch is to return spelling suggestions for the first incorrectly spelt word in the phrase, and the collated spelling suggestion, so as not to overload the user. I'd be grateful for any suggestions.
As an aside, I'm using this module in conjunction with the Apache Solr Autocomplete module (which I am writing a similar patch for to enhance the spelling suggestions of the autocomplete) so the user gets instant feedback on their spelling mistakes which would mitigate the problem of only suggesting the first word... Any thoughts?
Comment #18
tsphethean commented: Are there any suggestions on how to proceed on this, or is the approach in the patch OK?
Comment #19
pwolanin commented
Comment #20
pwolanin commented: Is there an example of a search interface you know of that offers more than one spelling suggestion?
Comment #21
tsphethean commented: Having had a think, and experimented with a few different searches, one spelling suggestion seems to be the sensible way to approach this. If used in conjunction with the Apache Solr Autocomplete module and the spelling suggestion collation, that should give the best chance of returning something useful to the user.
The flip side is that since Solr offers the configuration option to return multiple results, maybe the module needs to detect whether collation is enabled and, if so, just display the collated correction and the original phrase. If collation is not enabled, then it should return however many corrections the Solr configuration specifies.
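The decision proposed here could look roughly like the sketch below. It is written in Python for illustration only and is not the module's code. It assumes that a `collation` key appears in the suggestions only when collation is enabled, and `suggestions_to_display` is a hypothetical name.

```python
from typing import Any, Dict, List


def suggestions_to_display(original_query: str,
                           suggestions: Dict[str, Any]) -> List[str]:
    """Pick what to show: the collation if present, else every correction."""
    collation = suggestions.get("collation")
    if isinstance(collation, str) and collation:
        # Collation detected: show only the collated correction and the
        # phrase the user originally typed.
        return [collation, original_query]
    corrections: List[str] = []
    for entry in suggestions.values():
        if isinstance(entry, dict):
            # No collation: pass through however many corrections were sent.
            corrections.extend(entry.get("suggestion", []))
    return corrections
```

Keying the check off the response rather than the configuration means the display code does not need to read solrconfig.xml.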
Comment #22
pwolanin commented: That sounds like a reasonable approach, but is it easy to detect?
Comment #23
tsphethean commented: Yes, in the Solr config, there is an entry:
When set to true, in the SOLR query response there is a collation element in the response object (see the patch in #15 - it does a check for whether collated result was returned).
Comment #24
nick_vh commented: #1340232: Did You Mean suggestions should be broken out from core search form was committed and should make it much easier to take a stab at this issue.
Comment #25
milesw commented: Just like #21, after some experimenting, I found that using collation and falling back to suggestions for individual terms delivers much better suggestions.
With Solr 1.4, collated suggestions are not guaranteed to produce results (unless you patch with SOLR-2010).
With Solr 3.x, collated suggestions are tested to be sure they lead to actual search results.
This patch handles collated suggestions from the Solr response and falls back to normal search suggestions. It does not enable multiple suggestions. I still don't see a clean way to do that when there are multiple search terms.
You can test the patch without changing solrconfig.xml by using the following hook. This is the configuration I found to produce good results:
The more testing the better! I tested with both Solr 1.4.1 and Solr 3.4. See SOLR-2585 for explanation of onlyMorePopular with collation.
Comment #26
nick_vh commented
Comment #27
nick_vh commented: I tried this patch and it looks OK code-wise, but I was not able to replicate the behavior with a standard set of words. However, I think this patch needs some additional documentation and maybe an example in the apachesolr.api that incorporates comment #25?
Thanks for all the work!
Comment #28
milesw commented: @Nick_vh: Thanks for testing. What do you mean when you say you were not able to replicate the behavior? Could you confirm that collated suggestions were being displayed? An example for the API docs is a good idea; I'll post something soon.
Comment #29
nick_vh commented: Somehow they did not show. Maybe I only had 1 spelling suggestion, so it would be very useful (also for future testing) if you could provide us with some testing content that we can index (say, 3 node body content sets) and test this functionality with.
Comment #30
nironan commented: Re-rolled against the latest dev version.