search.module's query extraction regexps don't handle option values that contain spaces. This means we can't write queries like term:"red dwarf" and are limited to space-free values such as tid:4711. Here is the function in question:

    /**
     * Extract a module-specific search option from a search query. e.g. 'type:book'
     */
    function search_query_extract($keys, $option) {
      if (preg_match('/(^| )'. $option .':([^ ]*)( |$)/i', $keys, $matches)) {
        return $matches[2];
      }
    }
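To make the limitation concrete, here is a minimal standalone run of the function above, copied verbatim so the snippet executes on its own (the sample queries are made up for illustration). A space-free value comes back intact, but a quoted phrase is cut at the first space and keeps its opening quote:

```php
<?php
// Copy of the search.module function quoted above, so this snippet runs standalone.
function search_query_extract($keys, $option) {
  if (preg_match('/(^| )'. $option .':([^ ]*)( |$)/i', $keys, $matches)) {
    return $matches[2];
  }
}

// A space-free value is extracted intact: prints 4711
echo search_query_extract('foo tid:4711 bar', 'tid'), "\n";
// A quoted phrase is cut at the first space: prints "red (with a stray opening quote)
echo search_query_extract('foo term:"red dwarf" bar', 'term'), "\n";
```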

Here is code I wrote for ApacheSolr that offers one potential solution to the problem. The solution involves quoting any phrases (and escaping quotes inside phrases \").

    /**
     * This is copied from search module. The search module implementation doesn't
     * handle quoted terms correctly (bug), and this function is copied here until
     * I have the bugfix perfected, at which point a patch will be submitted to search
     * module with the goal of removing the function here.
     *
     * Extract a module-specific search option from a search query, e.g. 'type:book'.
     * Returns an array of all matching values.
     */
    static function query_extract($keys, $option) {
      // Escape any regex metacharacters in the option name.
      $option = preg_quote($option, '/');
      // First look for quoted values: option:"some phrase".
      $pattern = '/(^| )'. $option .':"([^"]*)"/i';
      preg_match_all($pattern, $keys, $matches);
      if (!empty($matches[2])) {
        // $matches[2] holds each phrase without its surrounding quotation marks.
        return $matches[2];
      }
      // Otherwise fall back to unquoted, space-delimited values.
      $pattern = '/(^| )'. $option .':([^ ]*)/i';
      if (preg_match_all($pattern, $keys, $matches) && !empty($matches[2])) {
        return $matches[2];
      }
    }

Comments

robertdouglass

I described this differently on another issue:

The purpose of search_query_extract is to look at the incoming search query and find key:value pairs such as uid:1 or tid:42. This is very useful and portable to other search implementations (such as ApacheSolr), but the regexps have limitations in their current form: they don't handle keys or values with spaces. Keys with spaces are easy to avoid, so I'm not focusing on them. Values with spaces do come up in the context of faceted search in the ApacheSolr module, however, and I think they will come up in other search solutions as well. The goal of this issue, then, is to expand the syntax to handle cases like term:"foo bar" and name:"Dries \"Cluebat\" Buytaert". These should parse to $term => 'foo bar' and $name => 'Dries "Cluebat" Buytaert' respectively.
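A minimal sketch of the proposed syntax, assuming backslash escaping inside double-quoted values (\" for a literal quote, \\ for a literal backslash). The function name search_query_extract_quoted is hypothetical and not part of search.module; like the core function, it returns only the first match, trying the quoted form before the unquoted one:

```php
<?php
// Hypothetical extension of search_query_extract(): accepts option:value and
// option:"value with spaces", where \" and \\ are escapes inside the quotes.
function search_query_extract_quoted($keys, $option) {
  $option = preg_quote($option, '/');
  // Quoted body: any char except " or \, or a backslash followed by any char.
  $quoted = '/(?:^| )' . $option . ':"((?:[^"\\\\]|\\\\.)*)"(?= |$)/i';
  if (preg_match($quoted, $keys, $matches)) {
    // Undo the escaping: \" becomes " and \\ becomes \.
    return preg_replace('/\\\\(.)/', '$1', $matches[1]);
  }
  // Unquoted form, same as the search.module version.
  if (preg_match('/(?:^| )' . $option . ':([^ ]*)(?= |$)/i', $keys, $matches)) {
    return $matches[1];
  }
  return NULL;
}

echo search_query_extract_quoted('term:"foo bar" type:page', 'term'), "\n";
echo search_query_extract_quoted('name:"Dries \"Cluebat\" Buytaert"', 'name'), "\n";
```

Trying the quoted form first keeps a well-formed term:"foo bar" away from the space-delimited pattern, which would otherwise return '"foo'.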

aptereket

It's an old issue; why hasn't it been fixed?

jhodgdon

It has not been fixed because no one has submitted a patch file and tested to see whether the fix works, etc.

jhodgdon

Status: Active » Closed (works as designed)

Can't the module that wants to use the insert/extract functions just encode its information without spaces? I think that's a better solution, since all of the search expression logic uses spaces as separators, and supporting quoted values would make it considerably more complex.
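As a sketch of this workaround, a module could percent-encode its values before putting them into the query, so the existing space-delimited extraction keeps working unchanged. The two helper functions below are hypothetical; rawurlencode() and rawurldecode() are standard PHP functions, and search_query_extract() is copied from the top of this issue:

```php
<?php
// Copy of the search.module extraction function quoted at the top of this issue.
function search_query_extract($keys, $option) {
  if (preg_match('/(^| )'. $option .':([^ ]*)( |$)/i', $keys, $matches)) {
    return $matches[2];
  }
}

// Hypothetical helper: append option:value with the value percent-encoded,
// so it contains no spaces (rawurlencode turns ' ' into %20).
function example_query_insert_encoded($keys, $option, $value) {
  return trim($keys . ' ' . $option . ':' . rawurlencode($value));
}

// Hypothetical helper: extract the raw value with the core function, then decode it.
function example_query_extract_decoded($keys, $option) {
  $raw = search_query_extract($keys, $option);
  return $raw === NULL ? NULL : rawurldecode($raw);
}

$keys = example_query_insert_encoded('spaceship', 'term', 'red dwarf');
echo $keys, "\n";                                         // spaceship term:red%20dwarf
echo example_query_extract_decoded($keys, 'term'), "\n";  // red dwarf
```

The trade-off is readability: anyone typing such a query by hand would have to write red%20dwarf rather than "red dwarf".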