Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
Problem/Motivation
In some cases, the module needs to generate a ReRankQuery using user entered keywords. If one of these keywords contains a single quote. The rerank query fails because its payload is not properly escaped.
Example:
- the user search for "Pilote d'avion"
- the generated ReRank payload is:
+{!payload_score f=boost_term v="Pilote" func=max}+{!payload_score f=boost_term v="d'avion" func=max}
- the payload is added to the ReRankQuery within single quotes, forming the query parameter:
rq={!rerank reRankQuery='+{!payload_score f=boost_term v="Pilote" func=max}+{!payload_score f=boost_term v="d'avion" func=max}'}
- SolR fails miserably at interpreting this malformed request giving a 400 error which msg is:
org.apache.solr.search.SyntaxError: Expected identifier at pos 112 str='{!rerank reRankQuery='+{!payload_score f=boost_term v=\"Pilote\" func=max}+{!payload_score f=boost_term v=\"d'avion\" func=max}'}'
(the code highlighting above gives you some hint about the issue)
Proposed resolution
Escape de payload properly.
Comment | File | Size | Author |
---|---|---|---|
#9 | 3076111_payload_rerank_9.patch | 6.82 KB | mkalkbrenner |
Comments
Comment #2
DuaelFrI had a look for some test coverage but it seems that the
\Drupal\search_api_solr\Plugin\search_api\backend\SearchApiSolrBackend::search()
method is not currently covered.Comment #3
mkalkbrennerI think we already solved that issue differently: https://github.com/solariumphp/solarium/pull/688
There're more situations or special quotes that need to be taken into account and which should be solved in solarium 5.1.
Unfortunately we can't switch to solarium 5.1 until #2876675: Allow symfony/event-dispatcher 4+ to be installed in Drupal 8 is solved. That core issue requires some pushing to get committed to 8.8 and backported to 8.7.
You could help us by applying the patch proposed in #2876675: Allow symfony/event-dispatcher 4+ to be installed in Drupal 8 and upgrade to solarium 5.1.0-rc.1.
I assume that your issue will be solved afterwards. Please report the result.
Comment #4
DuaelFrI'm not sure I understand how it could solve the issue without changing a line of Search API Solr.
We have a string that can contain any character, including simple quotes, and we wrap this string into simple quotes without any escaping. Even with a placeholder system, it's kind of a bad practice, isn't it?
Comment #5
mkalkbrennerI see. But I assume that the correct patch would be to remove the single quotes around the query for solarium 5.1.0.
Comment #6
mkalkbrennerIn theory this would be the correct patch.
Comment #7
mkalkbrennerAny feedback?
Comment #8
mkalkbrennerI added a test case based on the Search API HtmlFilter processor. This processor boosts terms that are encapsulated by configured tags like
<b>
or<h1>
.Like I expected, updating solarium to 5.1.0 solved the rerank query issue caused by "strange" characters.
The test also includes an explicit check for
d'avion
!BTW thanks to this issue I found some small bugs in the term boost implementation which could be considered to be edge cases. In these cases the didn't cause exceptions but the boosting simply didn't work.
Since searching for the "wrong" characters with Search API HTML tag boost enabled can lead to exceptions, I raise the priority to critical.
Unfortunately we depend on the commit of the no-brainer #2876675: Allow symfony/event-dispatcher 4+ to be installed in Drupal 8.
Comment #9
mkalkbrennerJust another small fix.
Comment #10
DuaelFrHi @mkalkbrenner!
Thanks for the follow up on this issue.
Sadly I'm at the end of a project so I have no remaining time to review your changes right now. I'll try to find a few minutes today but I'm not sure I'll be able to.
Comment #12
mkalkbrennerComment #13
DuaelFrThanks a lot @mkalkbrenner and well done for this patch!
Comment #15
valgibson CreditAttribution: valgibson at Finalist commentedReopening this issue as I found out one of our clients had a problem when certain bots inserted a 'pipe' symbol using this format:
code:fifa2023|
When boosting fields this resulted in a 500 error (java.lang.NumberFormatException).
When I remove the pipe symbol everything works as expected.
How can we escape the pipe character?