SQL Search (Trip Search) and stopwords

adminfor@inforo... - January 22, 2007 - 12:31
Project: SQL Search (Trip Search)
Version: 4.7.x-1.1
Component: Code
Category: feature request
Priority: normal
Assigned: joel_guesclin
Status: active
Description

Another nice-to-have that I think could be useful; feel free to give your opinions.

I look at watchdog regularly, and I've seen that many searches return no results because of stopwords.

ft_stopword_file is the MySQL system variable that names the file of stop words used in full-text search. In my case, I'm running Drupal with Spanish content, and I can't set this file and restart the server. In shared environments it may not be easy to set either.

The proposed feature is to add a field to this module's options where the administrator can set these stopwords, and:

1) if a search returns no results, add a new message after "Your search yielded no results.", like this: "Consider searching ............." with the stopwords trimmed from the keys.
For example, if the user searches for "what is the meaning of stopwords", the proposed new search would be:

Consider searching <a href="/trip_search/?keys=meaning stopwords">meaning stopwords</a>

2) or, add a parameter to trim stopwords before any search, without asking the user.
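
A minimal sketch of option 1, assuming the stopwords come from a module setting. The function names example_strip_stopwords() and example_suggestion() and the English stopword list are hypothetical, not part of SQL Search; the /trip_search/?keys= path is the one used in the example above.

```php
<?php
// Hypothetical helper: drop stopwords from a search string, word by word.
function example_strip_stopwords($keys, $stopwords) {
  $kept = array();
  // Split on any run of whitespace; PREG_SPLIT_NO_EMPTY skips blank pieces.
  foreach (preg_split('/\s+/', $keys, -1, PREG_SPLIT_NO_EMPTY) as $word) {
    // strtolower() only lowercases ASCII letters; accented capitals
    // would need extra handling.
    if (!in_array(strtolower($word), $stopwords, true)) {
      $kept[] = $word;
    }
  }
  return implode(' ', $kept);
}

// Hypothetical helper: build the "Consider searching ..." message.
function example_suggestion($keys, $stopwords) {
  $normalized = example_strip_stopwords($keys, array());
  $trimmed = example_strip_stopwords($keys, $stopwords);
  // Nothing to suggest if no stopword was removed or nothing is left.
  if ($trimmed == '' || $trimmed == $normalized) {
    return '';
  }
  return 'Consider searching <a href="/trip_search/?keys=' . rawurlencode($trimmed) . '">'
    . htmlspecialchars($trimmed) . '</a>';
}

// Assumed stopword list, for illustration only.
$stopwords = array('what', 'is', 'the', 'of');
echo example_suggestion('what is the meaning of stopwords', $stopwords);
```

Option 2 would call only the first helper on the keys before running the search, with no message shown.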

Also, I added a block with code borrowed from http://drupal.org/node/17970 that encourages users arriving at my site from crawlers to keep searching the crawler keywords internally. Sometimes a crawler indexes a result while it is at "whatever?page=3", and by the time the user arrives the indexed node may have moved to page=4, for example, because of recent additions. The user then doesn't see what they were searching for on the current page and leaves the site.

If anyone is interested, I've included this code (with trimming of Spanish stopwords) at the end of this post.

Regards,
Gustavo

------------------------------
Block "more options"; it is only shown if the referer matches one of the patterns in $searchengines below.

<?php
$searchtitle = "Continue searching: ";
$output = '';
$searchstr = '';

// Referer pattern => name of the query-string parameter that holds the keywords.
$searchengines = array(
  '#^http://www\.google\.#i' => 'q',
  '#^http://search\.yahoo\.com#i' => 'p',
  '#^http://ar\.search\.yahoo\.com#i' => 'p',
);

$referer = (string) getenv("HTTP_REFERER");
foreach ($searchengines as $regexp => $qsitem) {
  // preg_match() and foreach replace eregi() and each(), which newer PHP
  // versions no longer provide.
  if (preg_match($regexp, $referer)) {
    $url = parse_url($referer);
    $querystring = isset($url['query']) ? explode("&", $url['query']) : array();
    foreach ($querystring as $value) {
      $item = explode("=", $value);
      if ($item[0] == $qsitem && isset($item[1]) && trim($item[1]) != '') {
        $searchstr .= urldecode($item[1]);
      }
    }
  }
}

if ($searchstr) {
  $simbols = array(".", ",", ";", ":", "-", "=", "?", "¿", "_");
  // Pad with spaces so stopwords at the start or end of the string also match.
  $nsearchstr = " " . str_replace($simbols, " ", $searchstr) . " ";
  $stopwords = array(" a ", " ante ", " bajo ", " con ", " de ", " desde ", " durante ",
    " en ", " entre ", " excepto ", " hacia ", " hasta ", " mediante ", " para ", " por ",
    " salvo ", " segun ", " según ", " sin ", " sobre ", " y ", " o ", " u ", " tras ",
    " el ", " la ", " lo ", " los ", " las ", " un ", " una ", " unos ", " unas ");
  $nsearchstr = str_replace($stopwords, " ", $nsearchstr);
  // Collapse the leftover runs of spaces.
  $nsearchstr = trim(preg_replace('/\s+/', ' ', $nsearchstr));

  if ($nsearchstr != '') {
    // Escape the keywords: they come from the referer, which the visitor controls.
    $output .= "<p>" . $searchtitle . '<a href="/trip_search/?keys=' . rawurlencode($nsearchstr) . '">'
      . htmlspecialchars($nsearchstr) . '</a></p>';
    echo $output;
  }
}
?>

If you are using the standard Drupal search, replace

<a href="/trip_search/?keys=

with
<a href="/search/node/

You should also change the words in $stopwords to suit your language.

#1

joel_guesclin - January 23, 2007 - 09:06
Assigned to:Anonymous» joel_guesclin

This looks very useful, indeed important. I will try to take this on board. In fact, there ought to be some way of allowing stopwords to be defined for any language (perhaps in a PO file?). I have the same problem with MySQL: I am running multiple languages off the same server and can't change the stopwords.
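
One possible shape for per-language stopwords, as a rough sketch only. It keeps the lists in a plain PHP array keyed by language code rather than in a PO file, and every name in it (example_stopwords_for(), example_trim_keys(), the short word lists) is hypothetical.

```php
<?php
// Hypothetical per-language stopword lists, keyed by language code.
// The lists are deliberately short; real lists would be much longer.
$example_stopwords = array(
  'en' => array('a', 'an', 'of', 'the'),
  'es' => array('de', 'el', 'la', 'para'),
);

// Hypothetical helper: return the list for $lang, or an empty list
// if no stopwords are configured for that language.
function example_stopwords_for($lang, $lists) {
  return isset($lists[$lang]) ? $lists[$lang] : array();
}

// Hypothetical helper: remove one language's stopwords from a search string.
// The comparison is case-sensitive, so the lists are assumed to be lowercase.
function example_trim_keys($keys, $lang, $lists) {
  $words = preg_split('/\s+/', $keys, -1, PREG_SPLIT_NO_EMPTY);
  $kept = array_diff($words, example_stopwords_for($lang, $lists));
  return implode(' ', $kept);
}

echo example_trim_keys('el significado de la palabra', 'es', $example_stopwords), "\n";
echo example_trim_keys('the meaning of the word', 'en', $example_stopwords), "\n";
```

Because the lists live in the application rather than in ft_stopword_file, several languages can share one MySQL server without a restart.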

 
 
