I've attached a proposed patch for the addition of multiple spelling suggestions to the search form.

In the Solr config, if spellcheck.count is set to greater than one (e.g. 5), only one suggestion is ever displayed.

This patch checks the full array of spelling suggestions, removes any which match the term being searched for and then displays all the suggestions as links to that search.
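A minimal PHP sketch of the filtering step described above, assuming the suggestions arrive as a plain array of strings. The function names and the `/search/site/` base path are hypothetical; a real Drupal implementation would build links with `l()` or `url()` rather than by hand.

```php
<?php
// Hypothetical helper: drop suggestions that merely repeat the searched
// term (compared case-insensitively), preserving Solr's ordering.
function filter_suggestions(array $suggestions, string $term): array {
  $kept = [];
  foreach ($suggestions as $suggestion) {
    if (strtolower($suggestion) !== strtolower($term)) {
      $kept[] = $suggestion;
    }
  }
  return $kept;
}

// Render each remaining suggestion as a link that runs a new search.
// The base path is an assumption for illustration only.
function suggestion_links(array $suggestions, string $base = '/search/site/'): array {
  $links = [];
  foreach ($suggestions as $suggestion) {
    $links[] = '<a href="' . $base . rawurlencode($suggestion) . '">'
      . htmlspecialchars($suggestion, ENT_QUOTES) . '</a>';
  }
  return $links;
}

$links = suggestion_links(filter_suggestions(['Druapl', 'drupal', 'drupa'], 'druapl'));
echo implode(', ', $links), "\n";
```

Filtering before rendering keeps the "did you mean" list from offering the user the exact term they just searched for.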

Comments

tsphethean

The code didn't come through in the original post. The XML in the Solr config to test this would be:

<str name="spellcheck">true</str>
<str name="spellcheck.onlyMorePopular">true</str>
<str name="spellcheck.extendedResults">true</str>
<!--  The number of suggestions to return -->
<str name="spellcheck.count">5</str>
jpmckinney

Status: Active » Needs work

Could you provide a unified diff? (diff -u)

tsphethean

Attachment (new): 2.23 KB

Unified diff attached.

tsphethean

Attachment (new): 2.44 KB

I found a bug in the previous attachment. This version includes a slight amendment to cater for the suggestion object changing name.

tsphethean

Attachment (new): 3.17 KB

This goes deeper than I initially thought once you start looking at the configuration options of the spelling suggestions component.

If <str name="spellcheck.extendedResults">true</str> is set, Solr returns an object of suggestions that includes the number of results each suggestion will return. This final patch checks whether the suggestions are an object. If so, it reads the object and displays the result count alongside each phrase; if not, it just returns the phrases from the array.
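As a rough sketch of the object-versus-array check, the function below normalizes one term's suggestion list into word/frequency pairs. It assumes that with `spellcheck.extendedResults` off each entry is a plain string, and with it on each entry carries `word` and `freq` fields (as an array or a `stdClass` object, depending on how the JSON was decoded). The function names are hypothetical.

```php
<?php
// Normalize one term's suggestions into a list of ['word' => ..., 'freq' => ...].
// Assumed input shapes:
//   extendedResults=false: ['drupal', 'drupa']
//   extendedResults=true:  [['word' => 'drupal', 'freq' => 12], ...] or stdClass objects
function normalize_suggestions(array $entries): array {
  $out = [];
  foreach ($entries as $entry) {
    if (is_object($entry)) {
      // json_decode() without TRUE yields stdClass; treat it like an array.
      $entry = (array) $entry;
    }
    if (is_array($entry) && isset($entry['word'])) {
      $freq = isset($entry['freq']) ? (int) $entry['freq'] : NULL;
      $out[] = ['word' => (string) $entry['word'], 'freq' => $freq];
    }
    else {
      // Plain string: no frequency information is available.
      $out[] = ['word' => (string) $entry, 'freq' => NULL];
    }
  }
  return $out;
}

// Show the result count next to the phrase when Solr supplied one.
function format_suggestion(array $item): string {
  return $item['freq'] === NULL
    ? $item['word']
    : sprintf('%s (%d)', $item['word'], $item['freq']);
}

$plain = normalize_suggestions(['drupal', 'drupa']);
$extended = normalize_suggestions([(object) ['word' => 'drupal', 'freq' => 12]]);
echo format_suggestion($plain[0]), "\n";
echo format_suggestion($extended[0]), "\n";
```

Normalizing early means the rendering code only ever deals with one shape, whichever way solrconfig.xml is set.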

Sorry for multiple submissions so quickly!

jpmckinney

How is this patch being created? It looks like a reverse patch. Please respect Drupal coding standards, especially when it comes to whitespace around control structures: http://drupal.org/coding-standards

tsphethean

Sorry, I've read up on the standards a bit more and attached a revised patch. Please let me know if there are any problems with this one.

jpmckinney

Status: Needs work » Needs review
Attachment (new): 2.97 KB

For the patch to apply, I had to get rid of the lines in the patch that remove newlines.

tsphethean

I've been doing some more investigation into SOLR spelling suggestions, and have written more code to handle

<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.collate">true</str>

in the spelling suggestions, so that more of the possible Solr configurations are handled.

Would you like that provided as an update to this patch (i.e. against the original), or as a diff against the patched version? Should I create a new issue for this?

First time patcher so this is all a little new to me!

pwolanin

Version: 6.x-1.1 » 6.x-1.x-dev
Category: bug » feature

This looks like a feature request - reconfiguring the solrconfig is not something we necessarily support.

Since we are already sending the spellcheck param, shouldn't we send the others in the request as well?

jpmckinney

Version: 6.x-1.x-dev » 6.x-1.1
Category: feature » bug

An update to this patch would be enough. Uploading a diff between patches, too, would make reviewing the patch easier, but is not required :)

After seeing the patch, if it looks like the issue should be split, we can do that.

Thanks for addressing the unsupported spellcheck parameters!

jpmckinney

Version: 6.x-1.1 » 6.x-1.x-dev
Category: bug » feature

Didn't mean to update tags. I don't think this is about changing solrconfig.xml. It's about having our suggestions-munging code support all Solr spellcheck parameters.

pwolanin

@jpmckinney - just responding to the initial post that says the reporter changed their solrconfig.xml to change spellcheck behavior and then saw the problem.

jpmckinney

Ah, correct. I think we should support alternate configurations where possible.

tsphethean

Ok, I've attached the updated version of my patch, which handles the different configuration options a user can set for Solr spelling suggestions. The reason for the patch is that, without it, changing these options in the Solr configuration files caused the module to display errors.

The main change in this version of the patch is that when:

<str name="spellcheck.extendedResults">true</str>

then Solr passes back a more complex object than the array the module was handling. With extended results, the frequency of each spelling suggestion in the index can be displayed to the end user.

Other changes are to handle:

<str name="spellcheck.collate">true</str>

Collation looks at all the terms for which spelling suggestions are being provided and combines the most likely correction for each into a single phrase, rather than offering each correction on its own. For example, if I search for "Druapl slro" with collate = false, I am given the suggestions "Drupal" and "solr" as individual phrases. With collate = true, Solr returns "Drupal solr" as a collated term. In this patch, I have put the collated term first in the list of suggestions, as it is likely to be the most relevant.
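A small sketch of the ordering described above: the collated phrase, when present, goes first, followed by the individual suggestions with duplicates removed. The function name is hypothetical and both inputs are assumed to be plain strings already extracted from the response.

```php
<?php
// Hypothetical helper: build the list of suggestions to display, putting the
// collated phrase (if Solr returned one) ahead of the per-term suggestions.
function order_suggestions(?string $collation, array $per_term): array {
  $ordered = [];
  if ($collation !== NULL && $collation !== '') {
    $ordered[] = $collation;
  }
  foreach ($per_term as $suggestion) {
    // Skip anything already listed, including a term equal to the collation.
    if (!in_array($suggestion, $ordered, TRUE)) {
      $ordered[] = $suggestion;
    }
  }
  return $ordered;
}

print_r(order_suggestions('Drupal solr', ['Drupal', 'solr']));
print_r(order_suggestions(NULL, ['Drupal', 'solr']));
```

With collation off, the function degrades to the plain per-term list, so the same display code works for both settings.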

There is a Solr patch in the works that will enhance the collation functionality (https://issues.apache.org/jira/browse/SOLR-2010). If it makes it into a full Solr release, some refactoring of this patch may be required. The Solr patch will allow multiple collations to be returned and ensure that only collations which return search results are offered.

I've also attached my solrconfig.xml with these parameters in, for reference.

Let me know if any further changes or information are needed.

pwolanin

I'm not sure the patch is right - a quick read looks like it is only showing a spelling suggestion for the first word?

tsphethean

Yes, there's an interesting issue with multiple search phrases. Essentially, for each word in the query SOLR will do a spell check. For each word which it thinks is spelt incorrectly it will return an item in the search response object containing the number of suggestions specified in <str name="spellcheck.count">5</str> in solrconfig.xml.

With collation enabled (<str name="spellcheck.collate">true</str>, which this patch also handles), Solr additionally returns a combined suggestion for the whole phrase, built from the most likely suggestion for each word in the query.

What I'm not sure about is how best to display this to the user. If a user searches for a query with 5 words in it, all spelt incorrectly, should we return 5 suggestions for every word, i.e. 5 x 5 = 25 spelling suggestions? The approach I've taken in the patch is to return spelling suggestions for the first incorrectly spelt word in the phrase, plus the collated spelling suggestion, so as not to overload the user. I'd be grateful for any suggestions.
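To make the trade-off concrete, the sketch below keeps only the first misspelled term's suggestions plus the collation, instead of up to 25 links. It assumes the `suggestions` section has been decoded into an associative array keyed by misspelled term, with `collation` and `correctlySpelled` as sibling keys (the layout assumed for Solr 1.4/3.x with `json.nl=map`). The function and constant names are hypothetical.

```php
<?php
// Keys assumed to sit alongside the per-term entries without being terms.
const NON_TERM_KEYS = ['collation', 'correctlySpelled'];

// Hypothetical helper: pick the first misspelled term's suggestions and the
// collated phrase instead of every suggestion for every term.
function first_term_and_collation(array $suggestions): array {
  $result = ['term' => NULL, 'suggestions' => [], 'collation' => NULL];
  foreach ($suggestions as $key => $value) {
    if (in_array($key, NON_TERM_KEYS, TRUE)) {
      continue;
    }
    $result['term'] = (string) $key;
    $result['suggestions'] = $value['suggestion'] ?? [];
    // Only the first misspelled term is used.
    break;
  }
  if (isset($suggestions['collation']) && is_string($suggestions['collation'])) {
    $result['collation'] = $suggestions['collation'];
  }
  return $result;
}

$picked = first_term_and_collation([
  'druapl' => ['numFound' => 2, 'suggestion' => ['drupal', 'drupa']],
  'slro' => ['numFound' => 1, 'suggestion' => ['solr']],
  'collation' => 'drupal solr',
]);
print_r($picked);
```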

As an aside, I'm using this module in conjunction with the Apache Solr Autocomplete module (for which I'm writing a similar patch to enhance the autocomplete's spelling suggestions), so the user gets instant feedback on their spelling mistakes, which would mitigate the problem of only suggesting the first word. Any thoughts?

tsphethean

Are there any suggestions on how to proceed on this, or is the approach in the patch ok?

pwolanin

Version: 6.x-1.x-dev » 7.x-1.x-dev
Status: Needs review » Needs work

pwolanin

Is there an example of a search interface you know of that offers more than one spelling suggestion?

tsphethean

Having thought about it and experimented with a few different searches, one spelling suggestion seems to be the sensible way to approach this. Used in conjunction with the Apache Solr Autocomplete module and spelling suggestion collation, that should give the best chance of returning something useful to the user.

The flip side is that, since Solr offers a configuration option to return multiple results, maybe the module needs to detect whether collation is enabled. If it is, the module should display just the collated correction and the original phrase; if not, it should return however many corrections the Solr configuration specifies.

pwolanin

that sounds like a reasonable approach - but is it easy to detect?

tsphethean

Yes. In the Solr config there is an entry:

<str name="spellcheck.collate">true</str>

When it is set to true, the Solr query response contains a collation element (see the patch in #15, which checks whether a collated result was returned).
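A sketch of that check in PHP, assuming the `suggestions` section of the response has been decoded into an associative array with an optional `collation` key. It also accepts the structure assumed for `spellcheck.collateExtendedResults=true` (Solr 3.x), where the phrase sits in a `collationQuery` field. Both function names are hypothetical; `suggestions_to_display()` follows the approach proposed in the thread, returning only the collated correction when there is one and every correction otherwise (the original phrase is assumed to be rendered separately).

```php
<?php
// Hypothetical helper: return the collated phrase from a decoded Solr
// "suggestions" array, or NULL when collation is disabled or absent.
function get_collation(array $suggestions): ?string {
  if (!isset($suggestions['collation'])) {
    return NULL;
  }
  $collation = $suggestions['collation'];
  if (is_string($collation)) {
    return $collation;
  }
  // Assumed shape when spellcheck.collateExtendedResults=true.
  if (is_array($collation) && isset($collation['collationQuery'])) {
    return (string) $collation['collationQuery'];
  }
  return NULL;
}

// With collation: show only the collated correction.
// Without collation: show every correction Solr returned.
function suggestions_to_display(array $suggestions, array $all_corrections): array {
  $collation = get_collation($suggestions);
  return $collation !== NULL ? [$collation] : $all_corrections;
}
```

Because the check is on the response rather than on solrconfig.xml, it also picks up collation requested per query.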

nick_vh

#1340232: Did You Mean suggestions should be broken out from core search form was committed and should make it much easier to take a stab at this issue.

milesw

Attachment (new): 1.28 KB

Just like #21, after some experimenting, I found that using collation and falling back to suggestions for individual terms delivers much better suggestions.

With Solr 1.4, collated suggestions are not guaranteed to produce results (unless you patch with SOLR-2010).

With Solr 3.x, collated suggestions are tested to be sure they lead to actual search results.

This patch handles collated suggestions from the Solr response and falls back to normal search suggestions. It does not enable multiple suggestions. I still don't see a clean way to do that when there are multiple search terms.
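A sketch of "collation first, then fall back" under stated assumptions: the `suggestions` section is a decoded associative array keyed by misspelled term, a plain-string `collation` key is present when collation is on, and the fallback rebuilds the query by replacing each misspelled word with its top suggestion. That fallback strategy and the function name `did_you_mean()` are illustrative assumptions, not necessarily what the attached patch does.

```php
<?php
// Hypothetical helper: prefer Solr's collated phrase; otherwise rebuild the
// query by replacing each misspelled word with its top-ranked suggestion.
function did_you_mean(string $keys, array $suggestions): ?string {
  // 1. Collation present as a plain string (spellcheck.collate=true).
  if (isset($suggestions['collation']) && is_string($suggestions['collation'])) {
    return $suggestions['collation'];
  }
  // 2. Fallback: map each misspelled term to its first suggestion.
  $replacements = [];
  foreach ($suggestions as $term => $info) {
    // Skip non-term entries such as "correctlySpelled" (not arrays).
    if (!is_array($info) || empty($info['suggestion'])) {
      continue;
    }
    $top = reset($info['suggestion']);
    // With extendedResults=true the entry is assumed to be ['word' => ..., 'freq' => ...].
    if (is_array($top)) {
      $top = $top['word'] ?? '';
    }
    if ($top !== '') {
      $replacements[strtolower((string) $term)] = (string) $top;
    }
  }
  // Swap whole words only, comparing case-insensitively.
  $words = preg_split('/\s+/', trim($keys));
  $changed = FALSE;
  foreach ($words as $i => $word) {
    $lower = strtolower($word);
    if (isset($replacements[$lower])) {
      $words[$i] = $replacements[$lower];
      $changed = TRUE;
    }
  }
  return $changed ? implode(' ', $words) : NULL;
}
```

Returning NULL when nothing changed lets the caller skip the "Did you mean" block entirely.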

You can test the patch without changing solrconfig.xml by using the following hook. This is the configuration I found to produce good results:

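// Implements hook_apachesolr_query_alter().
function modulename_apachesolr_query_alter($query) {
  // Return up to five suggestions per misspelled term.
  $query->addParam('spellcheck.count', 5);
  // Ask Solr to build a collated, whole-query suggestion.
  $query->addParam('spellcheck.collate', 'true');
  // See SOLR-2585 for why onlyMorePopular is disabled with collation.
  $query->addParam('spellcheck.onlyMorePopular', 'false');
  // Solr 3.x: test up to 100 candidate collations against the index.
  $query->addParam('spellcheck.maxCollationTries', 100);
}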

The more testing the better! I tested with both Solr 1.4.1 and Solr 3.4. See SOLR-2585 for explanation of onlyMorePopular with collation.

nick_vh

Status: Needs work » Needs review

nick_vh

Status: Needs review » Needs work

I tried this patch and it looks ok codewise but I was not able to replicate the behavior with some standard set of words. However, I think this patch needs some additional documentation and maybe an example in the apachesolr.api that incorporates comment #25?

Thanks for all the work!

milesw

Title: Configuring SOLR to return multiple suggestions not displayed on search suggestions » Support collation for Did You Mean spelling suggestions, and support multiple suggestions

@Nick_vh: Thanks for testing. What do you mean when you say you were not able to replicate the behavior? Could you confirm that collated suggestions were being displayed? An example for the api docs is a good idea, I'll post something soon.

nick_vh

Somehow they did not show. Maybe I only had one spelling suggestion, so it would be very useful (also for future testing) if you could provide some test content that we can index (say, 3 node body content sets) to test this functionality.

nironan

Issue summary: updated
Attachment (new): 1.37 KB

Re-rolled against the latest dev version.