Problem/Motivation

In some cases, the module needs to generate a ReRankQuery using user entered keywords. If one of these keywords contains a single quote. The rerank query fails because its payload is not properly escaped.

Example:

  1. the user search for "Pilote d'avion"
  2. the generated ReRank payload is:
    +{!payload_score f=boost_term v="Pilote" func=max}+{!payload_score f=boost_term v="d'avion" func=max}
  3. the payload is added to the ReRankQuery within single quotes, forming the query parameter:
    rq={!rerank reRankQuery='+{!payload_score f=boost_term v="Pilote" func=max}+{!payload_score f=boost_term v="d'avion" func=max}'}
  4. SolR fails miserably at interpreting this malformed request giving a 400 error which msg is:
    org.apache.solr.search.SyntaxError: Expected identifier at pos 112 str='{!rerank reRankQuery='+{!payload_score f=boost_term v=\"Pilote\" func=max}+{!payload_score f=boost_term v=\"d'avion\" func=max}'}'

(the code highlighting above gives you some hint about the issue)

Proposed resolution

Escape de payload properly.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

DuaelFr created an issue. See original summary.

DuaelFr’s picture

Status: Active » Needs review
FileSize
759 bytes

I had a look for some test coverage but it seems that the \Drupal\search_api_solr\Plugin\search_api\backend\SearchApiSolrBackend::search() method is not currently covered.

mkalkbrenner’s picture

I think we already solved that issue differently: https://github.com/solariumphp/solarium/pull/688

There're more situations or special quotes that need to be taken into account and which should be solved in solarium 5.1.
Unfortunately we can't switch to solarium 5.1 until #2876675: Allow symfony/event-dispatcher 4+ to be installed in Drupal 8 is solved. That core issue requires some pushing to get committed to 8.8 and backported to 8.7.

You could help us by applying the patch proposed in #2876675: Allow symfony/event-dispatcher 4+ to be installed in Drupal 8 and upgrade to solarium 5.1.0-rc.1.
I assume that your issue will be solved afterwards. Please report the result.

DuaelFr’s picture

I'm not sure I understand how it could solve the issue without changing a line of Search API Solr.
We have a string that can contain any character, including simple quotes, and we wrap this string into simple quotes without any escaping. Even with a placeholder system, it's kind of a bad practice, isn't it?

mkalkbrenner’s picture

I see. But I assume that the correct patch would be to remove the single quotes around the query for solarium 5.1.0.

mkalkbrenner’s picture

FileSize
1.12 KB

In theory this would be the correct patch.

mkalkbrenner’s picture

Any feedback?

mkalkbrenner’s picture

Priority: Normal » Critical
FileSize
6.82 KB

I added a test case based on the Search API HtmlFilter processor. This processor boosts terms that are encapsulated by configured tags like <b> or <h1>.
Like I expected, updating solarium to 5.1.0 solved the rerank query issue caused by "strange" characters.
The test also includes an explicit check for d'avion!
BTW thanks to this issue I found some small bugs in the term boost implementation which could be considered to be edge cases. In these cases the didn't cause exceptions but the boosting simply didn't work.

Since searching for the "wrong" characters with Search API HTML tag boost enabled can lead to exceptions, I raise the priority to critical.
Unfortunately we depend on the commit of the no-brainer #2876675: Allow symfony/event-dispatcher 4+ to be installed in Drupal 8.

mkalkbrenner’s picture

FileSize
724 bytes
6.82 KB

Just another small fix.

DuaelFr’s picture

Hi @mkalkbrenner!
Thanks for the follow up on this issue.
Sadly I'm at the end of a project so I have no remaining time to review your changes right now. I'll try to find a few minutes today but I'm not sure I'll be able to.

  • mkalkbrenner committed 1c84c9f on 8.x-3.x
    Issue #3076111 by DuaelFr, mkalkbrenner: Rerank query payload need to be...
  • mkalkbrenner committed bcd48be on 8.x-3.x
    Issue #3076111 by DuaelFr, mkalkbrenner: Rerank query payload need to be...
mkalkbrenner’s picture

Status: Needs review » Fixed
DuaelFr’s picture

Thanks a lot @mkalkbrenner and well done for this patch!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

valgibson’s picture

Reopening this issue as I found out one of our clients had a problem when certain bots inserted a 'pipe' symbol using this format:

code:fifa2023|

When boosting fields this resulted in a 500 error (java.lang.NumberFormatException).
When I remove the pipe symbol everything works as expected.
How can we escape the pipe character?