In SearchApiQuery::parseKeys drupal string functions are not used consequently. As a result when a quoted term contains multibyte characters, a closing quote character is not detected properly. This patch fixes the problem by converting all string manipulation functions within parseKeys to their drupal_* equivalents.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

drunken monkey’s picture

Title: [PATCH] Prevent incomplete quoting by using drupal string functions » Fix omission of Drupal multibyte wrapper functions
Status: Active » Needs review
FileSize
25.3 KB

Thanks for reporting this, should have fixed this a long time ago … Fact is I often forget to use the proper wrapper functions and that's the result. However, while we're at it, let's fix this in the whole module.
The attached patch does just that.

znerol’s picture

Sorry, no. It does not work that way. You frequently use something like $value[drupal_strlen($value) - 1] in order to get the last character of a string. However you may not use the result of drupal_strlen as an index into a string. For multibyte strings the value will be too low. Rather you should use drupal_substr($value, -1) in order to get the last character in a multibyte as demonstrated in my patch.

znerol’s picture

Assigned: Unassigned » znerol

I'll post an updated patch soon.

znerol’s picture

Title: Fix omission of Drupal multibyte wrapper functions » Fix erroneous use of Drupal multibyte wrapper functions
FileSize
1.69 KB

Ok, I reviewed the docs and the code. The API docs for drupal_substr says that:

for cutting off a string at a known character/substring location, the usage of PHP's normal strpos/substr is safe and much faster.

Therefore it is safe to use the plain PHP functions strpos and strlen in most situations. The issue at hand however is that drupal_strlen is used in query.inc where plain strlen would be appropriate.

Using grep and coder-review I analyzed all occurences of drupal-prefixed string functions vs non-prefixed string functions and IMHO the only problematic combination is the one in query.inc which caused me problems with multibyte search keys when they were enclosed in quotes.

Therefore I propose to fix the problem by removing the drupal-prefixed functions in favor of the plain strlen functions and leaving all the rest as it is now. A patch is attached which also adds an appropriate test to checkQueryParseKeys test case.

Where the drupal-prefixed string functions were used

contrib/search_api_views/includes/handler_argument.inc:99:      $filter = $this->query->createFilter(drupal_strtoupper($this->operator));

IMHO there is no need to use drupal_strtolower here. Operator is not expected to hold multibyte data.

contrib/search_api_views/includes/handler_filter_options.inc:140:        if (drupal_strlen($values) > 20) {

Ok.

includes/processor_ignore_case.inc:9:    $value = drupal_strtolower($value);

Ok.

includes/query.inc:471:            if ($v[drupal_strlen($v)-1] == '"') {
includes/query.inc:483:            $len = drupal_strlen($v);

Not ok. Output of drupal_strlen is number of characters and not number of bytes.

search_api.install:212:    'machine_name' => preg_replace('/[^a-z0-9]+/', '_', drupal_strtolower($name)),
search_api.install:367:      $base = $name = drupal_strtolower(preg_replace('/[^a-z0-9]+/i', '_', $server->name));
search_api.install:396:      $base = $name = drupal_strtolower(preg_replace('/[^a-z0-9]+/i', '_', $index->name));

IMHO there is no need to use drupal_strtolower here. Multibyte characters will be stripped anyway.

search_api.test:478:    $processed = drupal_strtolower($orig);

Ok.

Where coder-review suggests drupal-prefixed string functions

sites/all/modules/search_api/search_api.module:
 +463: [minor] in most cases, replace the string function with the drupal_ equivalent string functions
 +464: [minor] in most cases, replace the string function with the drupal_ equivalent string functions
 +1432: [minor] in most cases, replace the string function with the drupal_ equivalent string functions
 +1448: [minor] in most cases, replace the string function with the drupal_ equivalent string functions
 +1473: [minor] in most cases, replace the string function with the drupal_ equivalent string functions
 +1492: [minor] in most cases, replace the string function with the drupal_ equivalent string functions

We deal with controlled/static strings in all of the instances in this file. No action required.

sites/all/modules/search_api/includes/processor_html_filter.inc:
 +108: [minor] in most cases, replace the string function with the drupal_ equivalent string functions
 +112: [minor] in most cases, replace the string function with the drupal_ equivalent string functions
 +114: [minor] in most cases, replace the string function with the drupal_ equivalent string functions

I'm not sure about that. This is probably a separate issue. Test case: Try to boost img-tags. Then examine score of content having multibyte string in an attribute of the img-tag (e.g. <img title="Schöne Grüsse>)

sites/all/modules/search_api/includes/query.inc:
 +472: [minor] in most cases, replace the string function with the drupal_ equivalent string functions
 +485: [minor] in most cases, replace the string function with the drupal_ equivalent string functions
 +488: [minor] in most cases, replace the string function with the drupal_ equivalent string functions
 +626: [minor] in most cases, replace the string function with the drupal_ equivalent string functions
 +1005: [minor] in most cases, replace the string function with the drupal_ equivalent string functions

We deal with controlled/static strings in all of the instances in this file (sort order and sql operators). No action required. Off-by one error is fixed by converting the drupal-prefixed strlen functions to the plain versions.

sites/all/modules/search_api/includes/processor.inc:
 +242: [minor] in most cases, replace the string function with the drupal_ equivalent string functions
 +243: [minor] in most cases, replace the string function with the drupal_ equivalent string functions

We deal with controlled/static strings in all of the instances in this file (field names may not contain multibyte characters). No action required.

sites/all/modules/search_api/includes/callback_add_aggregation.inc:
 +292: [minor] in most cases, replace the string function with the drupal_ equivalent string functions

We deal with controlled/static strings in all of the instances in this file (field names may not contain multibyte characters). No action required.

sites/all/modules/search_api/includes/index_entity.inc:
 +726: [minor] in most cases, replace the string unction with the drupal_ equivalent string functions
 +734: [minor] in most cases, replace the string function with the drupal_ equivalent string functions

We deal with controlled/static strings in all of the instances in this file (field names may not contain multibyte characters). No action required.

sites/all/modules/search_api/contrib/search_api_facetapi/plugins/facetapi/query_type_date.inc:
 +94: [minor] in most cases, replace the string function with the drupal_ equivalent string functions
 +100: [minor] in most cases, replace the string function with the drupal_ equivalent string functions
 +103: [minor] in most cases, replace the string function with the drupal_ equivalent string functions
 +104: [minor] in most cases, replace the string function with the drupal_ equivalent string functions
 +123: [minor] in most cases, replace the string function with the drupal_ equivalent string functions
 +124: [minor] in most cases, replace the string function with the drupal_ equivalent string functions

Most probably there are no problems here because the non-prefixed string functions are used consequently here, i.e. no intermixing of drupal_strlen with index-access.

sites/all/modules/search_api/contrib/search_api_facetapi/plugins/facetapi/query_type_term.inc:
 +74: [minor] in most cases, replace the string function with the drupal_ equivalent string functions
 +75: [minor] in most cases, replace the string function with the drupal_ equivalent string functions
 +132: [minor] in most cases, replace the string function with the drupal_ equivalent string functions
 +136: [minor] in most cases, replace the string function with the drupal_ equivalent string functions
 +139: [minor] in most cases, replace the string function with the drupal_ equivalent string functions

Most probably there are no problems here because the non-prefixed string functions are used consequently here, i.e. no intermixing of drupal_strlen with index-access.

sites/all/modules/search_api/contrib/search_api_views/includes/query.inc:
 +84: [minor] in most cases, replace the string function with the drupal_ equivalent string functions
 +85: [minor] in most cases, replace the string function with the drupal_ equivalent string functions
 +401: [minor] in most cases, replace the string function with the drupal_ equivalent string functions

These operate on table names and field names. The drupal-string functions are definitely not necessary here.

sites/all/modules/search_api/contrib/search_api_views/includes/handler_argument_fulltext.inc:
 +23: [minor] in most cases, replace the string function with the drupal_ equivalent string functions

This instance only operates on a table name. No problems.

sites/all/modules/search_api/contrib/search_api_views/includes/display_facet_block.inc:
 +32: [minor] in most cases, replace the string function with the drupal_ equivalent string functions
 +73: [minor] in most cases, replace the string function with the drupal_ equivalent string functions
 +119: [minor] in most cases, replace the string function with the drupal_ equivalent string functions
 +155: [minor] in most cases, replace the string function with the drupal_ equivalent string functions
 +201: [minor] in most cases, replace the string function with the drupal_ equivalent string functions
 +205: [minor] in most cases, replace the string function with the drupal_ equivalent string functions
 +208: [minor] in most cases, replace the string function with the drupal_ equivalent string functions

Most probably there are no problems here because the non-prefixed string functions are used consequently here, i.e. no intermixing of drupal_strlen with index-access.

sites/all/modules/search_api/contrib/search_api_views/includes/handler_argument_more_like_this.inc:
 +26: [minor] in most cases, replace the string function with the drupal_ equivalent string functions

This instance only operates on a table name. No problems.

sites/all/modules/search_api/contrib/search_api_views/includes/handler_filter_fulltext.inc:
 +121: [minor] in most cases, replace the string function with the drupal_ equivalent string functions

This instance only operates on a table name. No problems.

sites/all/modules/search_api/contrib/search_api_views/search_api_views.install:
 +15: [minor] in most cases, replace the string function with the drupal_ equivalent string functions

This instance only operates on a table name. No problems.

sites/all/modules/search_api/search_api.admin.inc:
 +1410: [minor] in most cases, replace the string function with the drupal_ equivalent string functions
 +1624: [minor] in most cases, replace the string function with the drupal_ equivalent string functions

We deal with controlled/static strings in all of the instances in this file (field names may not contain multibyte characters). No action required.

drunken monkey’s picture

Status: Needs review » Fixed

Wow, thanks for your efforts! You even added a test case – exemplary! ;)

The patch works fine, the test checks out … committed! Thanks again!

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.