The search.module utilizes 3 tables (that I am aware of)

search_index si
search_dataset sd
search_totals. st.

si and sd provide a "type" field currently set to 'node' in my db.

I have custom node types that have been indexed and their search terms are stored in the si table.

All data appears normally for default searches.

My client, however needs a custom search restricted to a specific node type, and a specific search screen for searching these types (not the default tab provided by implementing hook_search)

I would like to use the search_form($action='',$keys='',$type='ld',$prompt='') hook in the search.module to restrict the search to this type.

However, when I manually update the type field to the type I need ('ld' in my project) in both the si and sd tables, and set the $type='ld' parameter, the search fails. removing the $type parameter and resetting the fields to 'node' lets me get my search results.

Anybody had any similar experience in this?

thanks in advance.

t

Comments

prgrcmpny’s picture

Okey dokey, moving along, here is where I am at now,

I am going to keep this a running commentary as I develop this thingy, hopefully it will help others and for me it will be keep my thoughts organized as the day goes by.

First, to recap my project requirements.

1. My client needs a search of only a specific custom node type.
2. The search of the custom node type will return a sortable/pageable table
3. The search page for this node type must be its own page.

My first instinct is to leverage the existing API's to accomplish this, but after studying the search.module code (thank you open source!) I have come to the following (possibly incorrect) conclusions:

Requirement 3 somewhat rules out implementing ld_search (where 'ld' is the name of my custom module; ld.module) Implementing this places a tab on the default user search screen. I managed to get that tab to not appear by ommitting the case 'name' in the switch statement.

Requirement 1. kicked my butt yesterday and was the subject of the previous post in this thread..for the life of me, I cannot get search.modules do_search to consistently return my module types even when I make live hacks to the search_index.word field. there were several problems:

when I changed an indexed word from "leader" to "dude" and left the type as 'node' in both the si and sd tables, the engine would return a result for that node.

when I changed the type from 'node' to 'ld' in both tables, I got no results...

when I changed back to 'node' from 'ld' on both tables, I got no results...(bear in mind, I was a bit batty by the end of the day, so my thinking could have been loopy)

I spent too much time trying to identify the source of the above problem and I have to move on.

since I need custom output from the search that will be displayed in a sortable table with headers I believe that the

During the above contortions, it dawned on me that even if I got the correct node types consistently returned in searches, I would not be able to present the data in the format they required (hook_search_item seems to only allow customizing the output on a per-item basis, not to direct the output of the entire search result set--again, I could be wrong about this.)

So, I decided to move and implement my own custom callbacks, utilizing the parts of the api that I can leverage.

The following posts will be a running commentary on my efforts.

t.

prgrcmpny’s picture

when the user clicks a link to search for my custom node type, provide a callback to display a custom search form...

$items[] = array('path' => 'ld/ld_search_form_dev'
,'title' => t('Join the Leader Directory')
,'access' => user_access('access ld')
,'type' => MENU_CALLBACK
,'callback' => 'ld_search_form_dev'
);

prgrcmpny’s picture

Here, i leverage the search.module's form hook to display a custom search form, thereby meeting one of the project requirements...

return search_form('ld/ld_do_search','keys','ld','Search the Leader Directory');

the arguments to the search_form are $action, $keys, $type and $prompt.

so, I am telling the search form to redirect to the callback ld/ld_do_search .

Next, I see if I can see the redirect and the values of the arguments in my custom page.

prgrcmpny’s picture

I believe that I have correctly leveraged the search.module ->do_search functions and implemented the ld_update_index() hook so that I my module is integrated with the built in drupal search.

The fact that the like '%% %s %%' AND like '%% % %%' is beyond my immediate control leads me to leave the 'failures' of my search as they are for several reasons.

  • I only have a few custom nodes. maybe the problem will resolve itself as the number of nodes increase
  • I am VERY reluctant to change a module almost every module uses in Drupal
  • The taxonomy system exists...more on this below

I am very curious if and/or how the search.module interacts with the taxonomy system.

For example, with the categories module, we could 'tag' our custom node type as being a member or non-member of a subset. We could then pull from either subset based on this tagging.

Then, we faced the problem of integrating my custom node information/data with other node types...i.e. a nested configuration of nodes. The tool for that seems to be the views module, which looks like a custom query tool for pulling and presenting data.

Well, the problem would then be, making my data available to views and letting views handle the filtering based on taxonomy--i.e. my custom search implemented without me having to hack at do_search.

I am too new to drupal to answer these questions, but after reading about taxonomies and seeing how easy it is for a web-designer (as opposed to coder) to create data layouts and filter based on taxonomies, the wiser approach is probably there.

Well see.

Hope this running commentary helps somebody--it has helped me by being a reference during my coding.

t.

prgrcmpny’s picture

Ok, I left off yesterday with a good result, my custom callback was called by the form
and I managed to get it to print out the values of the $keys argument.

Here is my code in ld_do_search:

		print 'DO a custom leader directory search here.<br>';
		print 'remember, output must be against that tablular ouptut thing.<br>';
		print 'your keys are here: <br>';
		print $_POST['edit']['keys'] .'<br>';

and the output was...

DO a custom leadership directory search here.
remember, output must be against that tablular ouptut thing.
your keys are here:
these should be my keys

ok, that is good news, I have a custom form, a redirect to a place where I can control my own search and output and I have access to the search terms the user entered.

progress.

Now, I still want to leverage the search.module where possible.

From my study of the node.module and search.module, the search.modules do_search() function is the guts of the thing.

I now want to understand exactly what is going on, so I am going to hack at the do_search module returning data/sql at various stages of its progression.

prgrcmpny’s picture

in my ld_do_search

I am going to trap the output of the do_search module at various stages.

here is my "trapper" code

		print 'DO a custom leadership directory search here.<br>';
		print 'remember, output must be against that tablular ouptut thing.<br>';
		print 'your keys are here: <br>';
		print $_POST['edit']['keys'] .'<br>';
		print 'do search <br>';
		$find = 	do_search($_POST['edit']['keys'],'ld');
    print $find .'<br>';
prgrcmpny’s picture

Here is the search.modules do_search first few lines of code...

function do_search($keywords, $type, $join1 = '', $where1 = '1', $arguments1 = array(), $select2 = 'i.relevance AS score', $join2 = '', $arguments2 = array(), $sort_parameters = 'ORDER BY score DESC') {
  $query = search_parse_query($keywords);

  if ($query[2] == '') {
    form_set_error('keys', t('You must include at least one positive keyword with %count characters or more.', array('%count' => variable_get('minimum_word_size', 3))));
  }
  if ($query === NULL || $query[0] == '' || $query[2] == '') {
    return array();
  }

Note that the only two requied arguments for the do_search function are $keywords and $type.

I put in a return statement right after $query= search_parse_query($keywords);

here is what I get

DO a custom leadership directory search here.
remember, output must be against that tablular ouptut thing.
your keys are here:
these should be my keys
do search
Array
item:d.data LIKE '%% %s %%' AND d.data LIKE '%% %s %%' AND d.data LIKE '%% %s %%' AND d.data LIKE '%% %s %%' AND d.data LIKE '%% %s %%'
item:Array
item:i.word = '%s' OR i.word = '%s' OR i.word = '%s'
item:Array
item:3

frankly, I have no idea what this is, yet, but lets move on.

prgrcmpny’s picture

after the $query output, the do_search then loads a $conditions variable.
here is the code, with my return statement.

 // First pass: select all possible matching sids, doing a simple index-based OR matching on the keywords.
  // 'matches' is used to reject those items that cannot possibly match the query.
  $conditions = $where1 .' AND ('. $query[2] .") AND i.type = '%s'";
	return $conditions;

and here is the initial output.

DO a custom leadership directory search here.
remember, output must be against that tablular ouptut thing.
your keys are here:
these should be my keys
do search
1 AND (i.word = '%s' OR i.word = '%s' OR i.word = '%s') AND i.type = '%s'

ok, this looks like SQL, I like SQL!

note that the leading 1 at the beginning of the output, it is the default value for the$where1 argument.

what it means, I don't have a clue yet.

next....

prgrcmpny’s picture

Next up from the do_search module is the following lines of code:

 $arguments = array_merge($arguments1, $query[3], array($type, $query[4]));
  $result = db_query_temporary("SELECT i.type, i.sid, SUM(i.score * t.count) AS relevance, COUNT(*) AS matches FROM {search_index} i INNER JOIN {search_total} t ON i.word = t.word $join1 WHERE $conditions GROUP BY i.type, i.sid HAVING COUNT(*) >= %d", $arguments, 'temp_search_sids');

I learned that do_search loads data into a temp table and does two passes on it to return a valid result set. I am assuming that 'temp_search_sids' is the name of the table.

What I want to know is what the SQL looks like so I put a return value before the db hit, hoping to get the sql string...

here is my hack

 $arguments = array_merge($arguments1, $query[3], array($type, $query[4]));

	return "SELECT i.type, i.sid, SUM(i.score * t.count) AS relevance, COUNT(*) AS matches FROM {search_index} i INNER JOIN {search_total} t ON i.word = t.word $join1 WHERE $conditions GROUP BY i.type, i.sid HAVING COUNT(*) >= %d" .  $arguments . 'temp_search_sids';

  $result = db_query_temporary("SELECT i.type, i.sid, SUM(i.score * t.count) AS relevance, COUNT(*) AS matches FROM {search_index} i INNER JOIN {search_total} t ON i.word = t.word $join1 WHERE $conditions GROUP BY i.type, i.sid HAVING COUNT(*) >= %d", $arguments, 'temp_search_sids');

and the output is:

SELECT i.type,
i.sid,
SUM(i.score * t.count) AS relevance,
COUNT(*) AS matches
FROM {search_index} i
INNER JOIN {search_total} t ON i.word = t.word
WHERE 1 AND (i.word = '%s') AND i.type = '%s' GROUP BY i.type, i.sid HAVING COUNT(*) >= %d Array temp_search_sids

so my initial huch that it looks in the si table (search_index) is correct and it looks by type as well.
that GROUP BY has me a bit worried.....but GROUP BY's always get me worried.

ok, progresss...

next up..

prgrcmpny’s picture

Ok here are the next few lines in the do_search function


  // Calculate maximum relevance, to normalize it
  $normalize = db_result(db_query('SELECT MAX(relevance) FROM temp_search_sids'));
	
	
  if (!$normalize) {
    return array();
  }

it looks like 'temp_search_ids is a temp table.

lets take a look at normalize for my given query. I put a return statement right after the db hit

return $normalize;

and this is the output...

DO a custom leadership directory search here.
remember, output must be against that tablular ouptut thing.
your keys are here:
these should be my keys
do search

so it looks like $normalize does not exist and so the function will return an empty array.

lets check that..

I move the return statement to within the if(!$normalize) block...

$normalize = db_result(db_query('SELECT MAX(relevance) FROM temp_search_sids'));
  if (!$normalize) {
  	return 'like, dude, its not normalize';
    return array();
  }

here is the output.

DO a custom leadership directory search here.
remember, output must be against that tablular ouptut thing.
your keys are here:
these should be my keys
do search
like, dude, its not normalize

ok, not good, let me dig out some custom search terms I know exist...

select * from search_index where type = 'ld';

gives me some search terms to try...

| location              | 122 | ld   |       0 | NULL     |    16 |
| united                | 122 | ld   |       0 | NULL     |     1 |
| states                | 122 | ld   |       0 | NULL     |     1 |
| see                   | 122 | ld   |       0 | NULL     |     1 |
| map                   | 122 | ld   |       0 | NULL     |     1 |
| google                | 122 | ld   |       0 | NULL     |    11 |
| maps                  | 122 | ld   |       0 | NULL     |    11 |
| dude                  | 120 | ld   |       0 | NULL     |     1 |

lets go do that.

prgrcmpny’s picture

ok, I back track and add some existing search terms from my last comment.

here is the search page with the new search terms

DO a custom leadership directory search here.
remember, output must be against that tablular ouptut thing.
your keys are here:
location united states see map google maps dude
do search
like, dude, its not normalize

so my keys are there, and its not doing squat.

let me change the 'type' in the tables back to node and see if that fixes it.

mysql> update search_index set type = 'node' where type = 'ld';
Query OK, 91 rows affected (0.45 sec)
Rows matched: 91  Changed: 91  Warnings: 0

mysql> update search_dataset set type = 'node' where type = 'ld';
Query OK, 5 rows affected (0.26 sec)
Rows matched: 5  Changed: 5  Warnings: 0


now I rerun the output and get...

DO a custom leadership directory search here.
remember, output must be against that tablular ouptut thing.
your keys are here:
location united states see map google maps dude
do search
like, dude, its not normalize

now I am irritated.

hmmmm.

ok. I can scrap this approach and write my own search or try to grok this and leverage it.

I choose grok. if not grokked by today, then I write my own search on my own tables tomorrow.

bleah.

prgrcmpny’s picture

Ok, lets go back to the original sql...

SELECT i.type,
i.sid,
SUM(i.score * t.count) AS relevance,
COUNT(*) AS matches
FROM {search_index} i
INNER JOIN {search_total} t ON i.word = t.word
WHERE 1 AND (i.word = '%s') AND i.type = '%s' GROUP BY i.type, i.sid HAVING COUNT(*) >= %d Array temp_search_sids

all my types are back to 'node' in the 'i' table.
so lets hack this a bit and try to get results

I know I have a keyword of 'dude' in the search_index, lets see if I can pull it.

SELECT i.type, 
       i.sid, 
       SUM(i.score * t.count) AS relevance, 
       COUNT(*) AS matches 
  FROM {search_index} i 
  INNER JOIN {search_total} t ON i.word = t.word 
 WHERE 1 AND (i.word = 'dude') AND i.type = 'node'

result is...

heh, gotta remove the {} around the table names, back to our regular program, we get...

ERROR 1140 (42000): Mixing of GROUP columns (MIN(),MAX(),COUNT(),...) with no GROUP columns is illegal if there is no GROUP BY clause

makes sense,

next take

SELECT i.type, 
       i.sid, 
       SUM(i.score * t.count) AS relevance, 
       COUNT(*) AS matches 
  FROM search_index i 
  INNER JOIN search_total t ON i.word = t.word 
 WHERE 1 AND (i.word = 'dude') AND i.type = 'node'
 GROUP BY i.type, i.sid HAVING COUNT(*) >= %d

and I replace the %d with a 1

gets me...

mysql> SELECT i.type, 
    ->        i.sid, 
    ->        SUM(i.score * t.count) AS relevance, 
    ->        COUNT(*) AS matches 
    ->   FROM search_index i 
    ->   INNER JOIN search_total t ON i.word = t.word 
    ->  WHERE 1 AND (i.word = 'dude') AND i.type = 'node'
    ->  GROUP BY i.type, i.sid HAVING COUNT(*) >= 1;
+------+-----+------------------+---------+
| type | sid | relevance        | matches |
+------+-----+------------------+---------+
| node | 120 | 0.30103000998497 |       1 |
+------+-----+------------------+---------+
1 row in set (0.11 sec)

woo hoo!

ok, lets change the type to ld....

mysql->update search_index set type = 'ld' where sid = 120;
Query OK, 2 rows affected (0.07 sec)
Rows matched: 2  Changed: 2  Warnings: 0

and rerun the original sql with the node type changed.

mysql> SELECT i.type,         i.sid,         SUM(i.score * t.count) AS relevance,         COUNT(*) AS matches    FROM search_index i    INNER JOIN search_total t ON i.word = t.word   WHERE 1 AND (i.word = 'dude') AND i.type = 'ld'  GROUP BY i.type, i.sid HAVING COUNT(*) >= 1;
+------+-----+------------------+---------+
| type | sid | relevance        | matches |
+------+-----+------------------+---------+
| ld   | 120 | 0.30103000998497 |       1 |
+------+-----+------------------+---------+
1 row in set (0.06 sec)


ok, with the above assumptions, I should get a result.

lets check my assumptions.

prgrcmpny’s picture

Ok, from the original sql before $normalize, I need the contents of the array being passed to the db query

SELECT i.type,
i.sid,
SUM(i.score * t.count) AS relevance,
COUNT(*) AS matches
FROM {search_index} i
INNER JOIN {search_total} t ON i.word = t.word
WHERE 1 AND (i.word = '%s') AND i.type = '%s' GROUP BY i.type, i.sid HAVING COUNT(*) >= %d Array temp_search_sids

lets go get it...

first, the Array in question is the $arguments variable, lets see if I can get it out on a page..

gotta hit the php manual for it, so next comment.

prgrcmpny’s picture

ok,

I add a return to get the thing.

$arguments = array_merge($arguments1, $query[3], array($type, $query[4]));
return $arguments;
<code>

and some code to get the value of it in ld_do_search

<code>
		print 'DO a custom leadership directory search here.<br>';
		print 'remember, output must be against that tablular ouptut thing.<br>';
		print 'your keys are here: <br>';
		print $_POST['edit']['keys'] .'<br>';
		print 'do search <br>';
		$find = 	do_search($_POST['edit']['keys'],'ld');
    print 'find is ' . $find .'<br>';
		foreach ($find as $item){
		 print 'item:'. $item.'<br>';
			foreach($item as $duh){
			 print 'duh:'. $duh.'<br>';
		  }
		}

and the output is ...

DO a custom leadership directory search here.
remember, output must be against that tablular ouptut thing.
your keys are here:
dude
do search
find is Array
item:dude
item:ld
item:1

which would give me a correct where clause of

SELECT i.type,
i.sid,
SUM(i.score * t.count) AS relevance,
COUNT(*) AS matches
FROM search_index i
INNER JOIN search_total t ON i.word = t.word
WHERE 1 AND (i.word = 'dude') AND i.type = 'ld' GROUP BY i.type, i.sid HAVING COUNT(*) >= 1

(I removed the {}'s )

which gives me...

mysql> SELECT i.type,
    -> i.sid,
    -> SUM(i.score * t.count) AS relevance,
    -> COUNT(*) AS matches
    -> FROM search_index i
    -> INNER JOIN search_total t ON i.word = t.word
    -> WHERE 1 AND (i.word = 'dude') AND i.type = 'ld' GROUP BY i.type, i.sid HAVING COUNT(*) >= 1;
+------+-----+------------------+---------+
| type | sid | relevance        | matches |
+------+-----+------------------+---------+
| ld   | 120 | 0.30103000998497 |       1 |
+------+-----+------------------+---------+
1 row in set (0.10 sec)

mysql> 

hmmm something else is up....

prgrcmpny’s picture

I don't have access to the temp table, so I am going to hack into another table and run that sql

CREATE TABLE `ld_hack` (
  `type` varchar(64),
  `sid` int(10) unsigned,
  `relevance` int(10) unsigned,
  `matches` int(10) unsigned
) 

now I modify the search module to select into my ld_hack table....

I think I will try it as straight sql first...

mysql> INSERT INTO ld_hack  SELECT i.type,         i.sid,         SUM(i.score * t.count) AS relevance,         COUNT(*) AS matches    FROM search_index i    INNER JOIN search_total t ON i.word = t.word   WHERE 1 AND (i.word = 'dude') AND i.type = 'ld'  GROUP BY i.type, i.sid HAVING COUNT(*) >= 1;
Query OK, 1 row affected (0.19 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> select * from ld_hack;
+------+------+-----------+---------+
| type | sid  | relevance | matches |
+------+------+-----------+---------+
| ld   |  120 |         0 |       1 |
+------+------+-----------+---------+
1 row in set (0.06 sec)

mysql> 


so that works, that relevance = 0 is worrisome.....

lets try with the drupal first I empty the table delete from ld_hack
and modify the code.

$result = db_query("INSERT INTO ld_hack SELECT i.type, i.sid, SUM(i.score * t.count) AS relevance, COUNT(*) AS matches FROM {search_index} i INNER JOIN {search_total} t ON i.word = t.word $join1 WHERE $conditions GROUP BY i.type, i.sid HAVING COUNT(*) >= %d", $arguments);

I check if my hack worked and...voila

mysql> select * from ld_hack;
+------+------+-----------+---------+
| type | sid  | relevance | matches |
+------+------+-----------+---------+
| ld   |  120 |         0 |       1 |
+------+------+-----------+---------+
1 row in set (0.04 sec)

So , it is getting my keyword on my type, but it is ignoring them.

why. that is the next question.

prgrcmpny’s picture

Ok,

first, I am assuming that relevance is of type int10 unsiged....

lets check that assumption against the actual search tables for a clue.

 mysql> describe search_total;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| word  | varchar(50) |      | PRI |         |       |
| count | float       | YES  |     | NULL    |       |
+-------+-------------+------+-----+---------+-------+

ok, int/float interaction could be bombing the result in ld_hack, lets recreate it and rerun to see if there is a non-zero value

CREATE TABLE `ld_hack` (
  `type` varchar(64),
  `sid` int(10) unsigned,
  `relevance` float,
  `matches` int(10) unsigned,
) 

and rerun my hack
and look in the table..

mysql> select * from ld_hack;
+------+------+-----------+---------+
| type | sid  | relevance | matches |
+------+------+-----------+---------+
| ld   |  120 |   0.30103 |       1 |
+------+------+-----------+---------+
1 row in set (0.05 sec)


whew, dodged a bullet there.

next up...

prgrcmpny’s picture

but why are they being ignored....

lets look at the next step in the do_search function of the search.module.


$normalize = db_result(db_query('SELECT MAX(relevance) FROM temp_search_sids'))

SO I run the sql against my ld_hack table

mysql> SELECT MAX(relevance) FROM ld_hack;
+------------------+
| MAX(relevance)   |
+------------------+
| 0.30103000998497 |
+------------------+
1 row in set (0.05 sec)

mysql> 

know, I am still making the assumption that the value in the temp table is the same as the one in my ld_hack table.

lets check MAX against a zero...

mysql> update ld_hack set relevance = 0;
Query OK, 1 row affected (0.04 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> SELECT MAX(relevance) FROM ld_hack;
+----------------+
| MAX(relevance) |
+----------------+
|              0 |
+----------------+
1 row in set (0.04 sec)

mysql> 

ok, so max will return a row...whats up with the db result

lets try another hack...lets hack $normalize.....

hnmmm

prgrcmpny’s picture

ok, we are back to where we started to backtrack in this whole thing.

here is the hack in the do_search function

  // Calculate maximum relevance, to normalize it
  $normalize = db_result(db_query('SELECT MAX(relevance) FROM temp_search_sids'));
  $normalize = db_result(db_query('SELECT MAX(relevance) FROM ld_hack'));
	
  if (!$normalize) {
    return 'like, dude, its not normalize';
    return array();
  }
return 'I need a new career';

first , I delete from ld_hack in mysql

then I run it.

with the following code on the ld_do_search page.

function ld_do_search(){
		print 'DO a custom leadership directory search here.<br>';
		print 'remember, output must be against that tablular ouptut thing.<br>';
		print 'your keys are here: <br>';
		print $_POST['edit']['keys'] .'<br>';
		print 'do search <br>';
		$find = 	do_search($_POST['edit']['keys'],'ld');
    print 'find is ' . $find .'<br>';
		foreach ($find as $item){
		 print 'item:'. $item.'<br>';
			foreach($item as $duh){
			 print 'duh:'. $duh.'<br>';
		  }
		}

and we get...

DO a custom leadership directory search here.
remember, output must be against that tablular ouptut thing.
your keys are here:
dude
do search
find is i need a new career

so, I basically backtracked for nothing...ok, lets move on...

but wait, lets check some assumptions.

I am assuming that the $normalize from my hit on ld_hack is the same as the $normalize against the temp table.

lets clean up the db, deactivate my hit and run against the temp table and see if we get the same result.

// Calculate maximum relevance, to normalize it
  $normalize = db_result(db_query('SELECT MAX(relevance) FROM temp_search_sids'));
	//  $normalize = db_result(db_query('SELECT MAX(relevance) FROM ld_hack'));
	
  if (!$normalize) {
		 return 'like, dude, its not normalize';
    return array();
  }
	return 'i need a new career';
	

output is...

DO a custom leadership directory search here.
remember, output must be against that tablular ouptut thing.
your keys are here:
dude
do search
find is i need a new career

ok, assumption was ok.

I need to reread my thread here to see why $normalize originally did not work and the return of 'like dude its not normalize' got triggered.

back after these messages from our sponsor.

t.

prgrcmpny’s picture

Ok, it looks like the 'type' field in the search_index table was blowing it up.

first, lets look at the data in the tables

mysql> select type from search_index where sid = 120;
+------+
| type |
+------+
| ld   |
| ld   |
+------+
2 rows in set (0.05 sec)

mysql> select type from search_dataset where sid = 120;
+------+
| type |
+------+
| node |
| node |
| node |
+------+
3 rows in set (0.04 sec)

mysql> update search_index set type = 'node' where sid = 120;
Query OK, 2 rows affected (0.19 sec)
Rows matched: 2  Changed: 2  Warnings: 0

mysql> 

before I run the code, note that the discrepency (my bad) in the search_dataset.type and search_index.type had no effect on the result set so far....

running the code gives me

DO a custom leadership directory search here.
remember, output must be against that tablular ouptut thing.
your keys are here:
dude
do search
find is like, dude, its not normalize

ok, reproducable results.

now, I am changing search_index.type and search_dataset.type to 'ld' for node 120 and I should get some results back..

mysql> update search_index set type = 'ld' where sid = 120;
Query OK, 2 rows affected (0.09 sec)
Rows matched: 2  Changed: 2  Warnings: 0

mysql> update search_dataset set type = 'ld' where sid = 120;
Query OK, 3 rows affected (0.08 sec)
Rows matched: 3  Changed: 3  Warnings: 0

mysql> 

and the result is

DO a custom leadership directory search here.
remember, output must be against that tablular ouptut thing.
your keys are here:
dude
do search
find is i need a new career

ok, things are sane again, lets move on.

prgrcmpny’s picture

the $select2 variable is built next..

 $select2 = str_replace('i.relevance', '('. (1.0 / $normalize) .' * i.relevance)', $select2);
  return $select2;

modify my trap code

function ld_do_search(){
		print 'DO a custom leadership directory search here.<br>';
		print 'remember, output must be against that tablular ouptut thing.<br>';
		print 'your keys are here: <br>';
		print $_POST['edit']['keys'] .'<br>';
		print 'do search <br>';
		$find = 	do_search($_POST['edit']['keys'],'ld');
    print 'select2 is ' . $find .'<br>';

gives me

DO a custom leadership directory search here.
remember, output must be against that tablular ouptut thing.
your keys are here:
dude
do search
select2 is (3.32192793685 * i.relevance) AS score

ok, moving on.

prgrcmpny’s picture

OK, we are sane up to here.

here is the next block in the do_search function of the search.module that I need to grok.
I have already added a return to get the $condtions variable.

 // Second pass: only keep items that match the complicated keywords conditions (phrase search, negative keywords, ...)
  $conditions = '('. $query[0] .')';
	return $conditions;
  $arguments = array_merge($arguments2, $query[1]);
  $result = db_query_temporary("SELECT i.type, i.sid, $select2 FROM temp_search_sids i INNER JOIN {search_dataset} d ON i.sid = d.sid AND i.type = d.type $join2 WHERE $conditions $sort_parameters", $arguments, 'temp_search_results');
  if (($count = db_result(db_query('SELECT COUNT(*) FROM temp_search_results'))) == 0) {
    return array();
  }
  $count_query = "SELECT $count";

here is the output

DO a custom leadership directory search here.
remember, output must be against that tablular ouptut thing.
your keys are here:
dude
do search
conditions is (d.data LIKE '%% %s %%')

ok, some observations.

what table is 'd' and where is d.data.
it looks like the %%%s%% is escaping the % and filling in for quotes and for the % thingy for the d.data LIKE '%s' where s would presumably be 'dude', but lets hold off.

lets move on to the next stage.

prgrcmpny’s picture

progressing to the next variable.

 $arguments = array_merge($arguments2, $query[1]);
	return $arguments;

since arguments is an array, I modify my trapper code on ld_do_search

function ld_do_search(){
		print 'DO a custom leadership directory search here.<br>';
		print 'remember, output must be against that tablular ouptut thing.<br>';
		print 'your keys are here: <br>';
		print $_POST['edit']['keys'] .'<br>';
		print 'do search <br>';
		$find = 	do_search($_POST['edit']['keys'],'ld');
    print 'argument   is ' . $find .'<br>';

		foreach ($find as $item){
		 print 'item:'. $item.'<br>';
			foreach($item as $duh){
			 print 'duh:'. $duh.'<br>';
		  }
		}

and get

DO a custom leadership directory search here.
remember, output must be against that tablular ouptut thing.
your keys are here:
dude
do search
argument is Array
item:dude

so my argument is my keyword...

moving on...

prgrcmpny’s picture

next line is

$result = db_query_temporary("SELECT i.type, i.sid, $select2 FROM temp_search_sids i INNER JOIN {search_dataset} d ON i.sid = d.sid AND i.type = d.type $join2 WHERE $conditions $sort_parameters", $arguments, 'temp_search_results');

this selects into a different temp table, so I create another hack table for storing the result..

CREATE TABLE `ld_hack_2` (
  `type` varchar(64),
  `sid` int(10) unsigned,
  `score` float
) 

and modify the sql

  $result = db_query("INSERT INTO ld_hack_2 SELECT i.type, i.sid, $select2 FROM temp_search_sids i INNER JOIN {search_dataset} d ON i.sid = d.sid AND i.type = d.type $join2 WHERE $conditions $sort_parameters", $arguments);

  $result = db_query_temporary("SELECT i.type, i.sid, $select2 FROM temp_search_sids i INNER JOIN {search_dataset} d ON i.sid = d.sid AND i.type = d.type $join2 WHERE $conditions $sort_parameters", $arguments, 'temp_search_results');

with results

mysql> select * from ld_hack;
+------+------+-----------+---------+
| type | sid | relevance | matches |
+------+------+-----------+---------+
| ld | 120 | 0.30103 | 1 |
+------+------+-----------+---------+
1 row in set (0.05 sec)

mysql> select * from ld_hack_2;
+------+------+-------+
| type | sid | score |
+------+------+-------+
| ld | 120 | 1 |
| ld | 120 | 1 |
+------+------+-------+
2 rows in set (0.04 sec)

ok, we are getting data to date.

next section

prgrcmpny’s picture

I am assuming that the temp_search_results table matches my ld_hack_2 table.

the next line in the search.module will check that assumption.

if (($count = db_result(db_query('SELECT COUNT(*) FROM temp_search_results'))) == 0) {
			return 'zero count in temp_search_results';
    return array();
  }


I modify my ld_do_search code to trap a string result and we get...

DO a custom leadership directory search here.
remember, output must be against that tablular ouptut thing.
your keys are here:
dude
do search
count is Array

so the temp table had data. next section.

prgrcmpny’s picture

what table is d? is in the sql.

Its a good thing I syced up the types in search_dataset and search_index or the query would have bombed.

prgrcmpny’s picture

the final section of the search.module do_search function reads...

 $count_query = "SELECT $count";

  // Do actual search query
  $result = pager_query("SELECT * FROM temp_search_results", 10, 0, $count_query);
  $results = array();
  while ($item = db_fetch_object($result)) {
    $results[] = $item;
  }
  return $results;

now lets see what to do next.

lets go a step at a time.

lets look at teh pager_query arguments to understand what we are seeing.
from the api docs

Parameters

$query The SQL query that needs paging.

$limit The number of query results to display per page.

$element An optional integer to distinguish between multiple pagers on one page.

$count_query An SQL query used to count matching records.

... A variable number of arguments which are substituted into the query (and the count query) using printf() syntax. Instead of a variable number of query arguments, you may also pass a single array containing the query arguments.

so, this seems to be saying, select from the temp_search_results table limit to 10 results per page with a select count as the max number in the temp_search_results table.

lets see if I get any results by sticking a return statement in the WHILE loop

  while ($item = db_fetch_object($result)) {
			return 'in the while loop';
    $results[] = $item;
  }

I modify my trapping code

function ld_do_search(){
		print 'DO a custom leadership directory search here.<br>';
		print 'remember, output must be against that tablular ouptut thing.<br>';
		print 'your keys are here: <br>';
		print $_POST['edit']['keys'] .'<br>';
		print 'do search <br>';
		$find = 	do_search($_POST['edit']['keys'],'ld');
    print 'while loop entered?   is ' . $find .'<br>';

and I get...

DO a custom leadership directory search here.
remember, output must be against that tablular ouptut thing.
your keys are here:
dude
do search
while loop entered? is in the while loop

prgrcmpny’s picture

Ok.

here is the final line in the search.module do_search function.

 return $results;

because of this discussion, we know that $results is a array of arrays from the db hit against temp_search_results which looks like my ld_hack_2 table

mysql> describe ld_hack_2;
+-------+------------------+------+-----+---------+-------+
| Field | Type             | Null | Key | Default | Extra |
+-------+------------------+------+-----+---------+-------+
| type  | varchar(64)      | YES  |     | NULL    |       |
| sid   | int(10) unsigned | YES  |     | NULL    |       |
| score | float            | YES  |     | NULL    |       |
+-------+------------------+------+-----+---------+-------+
3 rows in set (0.07 sec)

mysql> select * from ld_hack_2
    -> ;
+------+------+-------+
| type | sid  | score |
+------+------+-------+
| ld   |  120 |     1 |
| ld   |  120 |     1 |
+------+------+-------+

so, this is getting really cool.

We now know that the do search function will return an array of *pointers* to node content of my custom type that I can now use to go get hit the db with and return my custom output.

This is a good stage to go grab a salami sandwich, but first, some random thoughts.

the node.module, where it invokes do_search has the following code...

 foreach ($find as $item) {
        $node = node_load($item->sid);

        // Get node output (filtered and with module-specific fields).
        if (node_hook($node, 'view')) {
          node_invoke($node, 'view', false, false);
        }
        else {
          $node = node_prepare($node, false);
        }
        // Allow modules to change $node->body before viewing.
        node_invoke_nodeapi($node, 'view', false, false);

        // Fetch comments for snippet
        $node->body .= module_invoke('comment', 'nodeapi', $node, 'update index');
        // Fetch terms for snippet
        $node->body .= module_invoke('taxonomy', 'nodeapi', $node, 'update index');

        $extra = node_invoke_nodeapi($node, 'search result');
        $results[] = array('link' => url('node/'. $item->sid),
                           'type' => node_get_name($node),
                           'title' => $node->title,
                           'user' => theme('username', $node),
                           'date' => $node->changed,
                           'node' => $node,
                           'extra' => $extra,
                           'snippet' => search_excerpt($keys, $node->body));
      }

ignoring the nodeapi calls, we see that there are no other db calls involved (unless the node_load function has one...lets check that.

well, well, it does here is the code

  // Retrieve the node.
  // No db_rewrite_sql is applied so as to get complete indexing for search.
  if ($revision) {
    array_unshift($arguments, $revision);
    $node = db_fetch_object(db_query('SELECT n.nid, r.vid, n.type, n.status, n.created, n.changed, n.comment, n.promote, n.moderate, n.sticky, r.timestamp AS revision_timestamp, r.title, r.body, r.teaser, r.log, r.format, u.uid, u.name, u.picture, u.data FROM {node} n INNER JOIN {users} u ON u.uid = n.uid INNER JOIN {node_revisions} r ON r.nid = n.nid AND r.vid = %d WHERE '. $cond, $arguments));
  }
  else {
    $node = db_fetch_object(db_query('SELECT n.nid, n.vid, n.type, n.status, n.created, n.changed, n.comment, n.promote, n.moderate, n.sticky, r.timestamp AS revision_timestamp, r.title, r.body, r.teaser, r.log, r.format, u.uid, u.name, u.picture, u.data FROM {node} n INNER JOIN {users} u ON u.uid = n.uid INNER JOIN {node_revisions} r ON r.vid = n.vid WHERE '. $cond, $arguments));
  }

and that, my friends is the answer to this problem.

I will invoke the do_search function to get a list of pointers to nodes of my custom type.

I will then semi-duplicate the node_load function with something I think up after lunch.

During that node_load semi_duplication, I will do my db calls with the pager_query to get my output .

I am feeling better.

I will summarize in the next comment. after my salami sandwich.

prgrcmpny’s picture

ok, I *know* that I will be getting an arrray of arrays that contain data like in my ld_hack_2 table.

mysql> select * from ld_hack_2;
+------+------+-------+
| type | sid  | score |
+------+------+-------+
| ld   |  120 |     1 |
| ld   |  120 |     1 |
+------+------+-------+
2 rows in set (0.04 sec)

now, my php array handling skills really stink, so let me scrounge up a way to get the data from there and verify my result...

ok, don't panic, lets try this.

function ld_do_search(){
		print 'DO a custom leadership directory search here.<br>';
		print 'remember, output must be against that tablular ouptut thing.<br>';
		print 'your keys are here: <br>';
		print $_POST['edit']['keys'] .'<br>';
		print 'do search <br>';
		$find = 	do_search($_POST['edit']['keys'],'ld');
    print 'looping throught result set ' . $find .'<br>';
		foreach ($find as $item){
		 print 'item:'. $item.'<br>';
			foreach($item as $duh){
			 print 'duh:'. $duh.'<br>';
		  }
		}

I get the following result

DO a custom leadership directory search here.
remember, output must be against that tablular ouptut thing.
your keys are here:
dude
do search
looping throught result set Array
item:Object
duh:ld
duh:120
duh:0.999999999999307
item:Object
duh:ld
duh:120
duh:0.999999999999307

so, I my assumption is correct, I have an array of arrays containing the type,sid (aka nid) and relevance or score for custom node types containing the search term 'dude'

DUUUUUDE!

thats a starting point for the next section...getting my search results. displayed on a sortable table output thingy.

prgrcmpny’s picture

Ok, lets take a breath and recap where we are...

  • we create a custom search page utilizing the search.module's search_form() function with arguments telling it to only search for nodes of a certain type.
  • the form redirects to a callback page where I do a call to do_search on the search.module
  • the do_search returns an array of *pointers* to relevant nodes of the correct type that I can
    no use to go extract my custom content.

so, to continue, I need to

  • write a private function to pull back the information I want to display
  • write the sql that does exactly that
  • call page_query ?? to get my data returned in a format I want
  • output themed results

One final thing to keep in mind....

  • I am not sure that my nodes stored with type = 'ld' on an ld_insert or if I ran a custom sql update for development purposes. I may need to revisit this after I get the results going
  • if not, ld_update_index or ld_nodeapi('update index') could be utilized to git r done
prgrcmpny’s picture

I am at home, now drinking a few brews reading my post and noticing I can't spell for dirt.

sorry for any confusion that may result.

t.

prgrcmpny’s picture

ok, lets clean up the ld_do_search page to do a call to build some sql or something....

hmmm, let me think about this.

ok, I am not going to give away the structure of my client's database, but here is the structure of the ld_s

function ld_do_search(){
  $strSQL = '';

  $find = 	do_search($_POST['edit']['keys'],'ld');
  $strSQL = _build_ld_search_results_sql($find);
 
  print $strSQL;
}

function _build_ld_search_results_sql($arg){
                //for a bad arg return sql that always returns an empty set
		if(!$arg)
		  return 'SELECT * FROM node WHERE 2 = 1';  
  $inclause = '(';
  foreach ($arg as $item) 
    $inclause .= $item->sid . ",";

  $inclause = trim($inclause,',');
  $inclause .=  ')';
	
  $strSQL = 'SELECT FIELDS YOU WANT TO DISPLAY '.
	     			' FROM relevant tables             '.
			      ' WHERE nid IN '. $inclause ;
	
  return $strSQL;
	
}

gives me the following result

SELECT FIELDS YOU WANT TO DISPLAY FROM relevant tables WHERE nid IN (120,120)

I am NOT going to worry about the duplicate node ids at this point, I can clean that up later.
I am going to build my sql and figure out the pager_query stuff or whatever I need to display a sortable, columnized table.

I may have to pick up posting tomorrow because the sql may be a bit tricky to develop.

thanks for listening !

this has helped me, I hope it helps somebody out there.

tullnet’s picture

I think this is what I've been looking for! Keep up the good work!

prgrcmpny’s picture

I still have an issues with multiple keys in the search, but I am able to display *some* search results to my client.

I will be focusing on the presentation side for a bit, but will come back to this thread, soon.

thanks again for the compliment.

t.

prgrcmpny’s picture

Short answer, my bad sql/data in a development db with some crufty data.

Here is how I found out.

My custom search hook starts out looking like this...

function ld_do_search($arg = NULL){
  $strSQL = '';
  $s;
  if(!$arg)
    $find = do_search($_POST['edit']['keys'],'ld');  
  else
    $find = do_search($arg,'ld');
  
  $strSQL = _ld_build_search_results_sql($find,'IN');
  $result = pager_query(db_rewrite_sql($strSQL),1);
  $s .= '<table> ';

My _ld_build_search_results _sql() fcn takes the keys,etc returned from the do_search function and returns sql that will return my customer's data.

On a hunch, I wanted to verify that the $find array only had one element, so I modified my code as follows:

function ld_do_search($arg = NULL){
  $strSQL = '';
  $s;
  if(!$arg)
    $find = do_search($_POST['edit']['keys'],'ld'); 
  else
    $find = do_search($arg,'ld');

 foreach ($find as $key){
     print 'key:'. $key.'<br>';
       foreach($key as $value){
         print 'value:'. $value.'<br>';
       }
    }
  $strSQL = _ld_build_search_results_sql($find,'IN');
  $result = pager_query(db_rewrite_sql($strSQL),1);
  $s .= '<table> .....';

and I got.

key:Object
value:ld
value:121
value:1.00000000000053
key:Object
value:ld
value:122
value:1.00000000000053

Which tells me the core drupal search functionality is working exclusivelly against my custom node type (WOO HOO!) and that the problem is with crufty data in my custom tables for my custom node type.

The data portion of this project has been solved.

now onto the presentation.

prgrcmpny’s picture

In order for my new custom nodes content to be indexed by the search engine, I implement hook_update_index.

The pseudo-code for this is basically.

  • get a list of all nodes of my custom type that have not been indexed
  • retrieve all the data I want indexed
  • Wrap that data in tags with important terms wrapped in header tags per the docs in search.module
  • invoke the search_index hook on the search.module to get my terms indexed
function ld_update_index() {
 $q = "SELECT nid FROM {node} WHERE type = 'ld' AND nid NOT IN (SELECT sid FROM {search_index} WHERE type = 'ld')";  
	
$r = db_query($q);
while ($outer_rs = db_fetch_object($r)){
  $strSQL = _ld_build_search_update_index_sql($outer_rs->nid);
   $result = db_query($strSQL);
   while($rs = db_fetch_object($result)){
     $s = _ld_build_search_update_index_html($rs);
	if('' == $s)
	  continue;
	else
	  search_index($rs->nid, 'ld', $s);
    }
  }
}

It works.

prgrcmpny’s picture

I have to debug an issue, here.

I have a couple of issues going on to resolve.

Here is one of them.

When I run a search against one term in one node (that has been indexed, correctly) I get correct results.

When I use two terms spanning indexed nodes (my custom node types) I get no results.

	if(!$arg){
	 $find = do_search($_POST['edit']['keys'],'ld');
	 print 'not arg, keys are:';
  	 print $_POST['edit']['keys'] .'<br>';
        foreach ($find as $item){
          print 'item:'. $item.'<br>';
          foreach($item as $duh){
            print 'duh:'. $duh.'<br>';
      }

When I search for one value, I get

not arg, keys are:programming
item:Object
duh:ld
duh:149
duh:1.00000000000008
SELECT l....below above code, I print out the final search sql...

which shows a $find array as discussed earlier.

However with two terms...

I get..

not arg, keys are:exper programming
SELECT 1 FROM leadership_directory WHERE 1 = 2

where the $find array is null and the select statement is arg checking code that returns sql that returns no results.

the big question is why is search.module.do_search no finding the two keys which I have visually identifies as being indexed in the search.index...

mysql> select * from search_index where type='ld' and word in ('exper','programming');
+-------------+-----+------+---------+----------+-------+
| word        | sid | type | fromsid | fromtype | score |
+-------------+-----+------+---------+----------+-------+
| exper       | 157 | ld   |       0 | NULL     |    19 |
| programming | 149 | ld   |       0 | NULL     |    26 |
+-------------+-----+------+---------+----------+-------+
2 rows in set (0.04 sec)

I gotta run for a few hours, but will update as I debug.

t

prgrcmpny’s picture

Ok,

In my earlier work, in this comment: http://drupal.org/node/124963#comment-207924 on this topic, I glossed over the $query=search_parse_query($keywords) function.

Here I pick it up.

First my 'trapper code' in the ld_do_search fcn

function ld_do_search($arg = NULL){
  $strSQL = '';
  $s;

	if(!$arg){
		$find = do_search($_POST['edit']['keys'],'ld');   //do_search is a fcn  in search.module that returns sids and scores
		print '<br>if <br>keys are:';
  	print $_POST['edit']['keys'] .'<br>';
    foreach ($find as $item){
      print 'item:'. $item.'<br>';
      foreach($item as $duh){
        print 'duh:'. $duh.'<br>';
      }
    }			
	}else{
  	$find = do_search($arg,'ld');
	  print 'else <br>';
    foreach ($find as $item){
      print 'item:'. $item.'<br>';
      foreach($item as $duh){
        print 'duh:'. $duh.'<br>';
      }
    }					

Here is I have written some debugging code in the search modules do_search function...

function do_search($keywords, $type,...yada yada yada.... DESC') {
  $query = search_parse_query($keywords);
	print '<br>query is: ' .$query;
	foreach($query as $item){
  	print '<br>item->'.$item;
  	foreach($item as $value)
  	  print '<br>value->'.$value;
  }		

Here is the output:

query is: Array
item->d.data LIKE '%% %s %%' AND d.data LIKE '%% %s %%'
item->Array
value->exper
value->programming
item->i.word = '%s' OR i.word = '%s'
item->Array
value->exper
value->programming
item->2
if
keys are:exper programming

the next portion of the search.module.do_search is

 if ($query[2] == '') {
    form_set_error('keys', t('You must include at least one positive keywo........
  if ($query === NULL || $query[0] == '' || $query[2] == '') {
    return array();
  }

we see that $query[2] contains keywords (I will check it anyways in next comment.

I don't think the other if will happen either, but lets check it.

on to the next post.

prgrcmpny’s picture

As stated in previous comment, I am verifying the search.module.do_search function does not kick me out here:

if ($query[2] == '') {
   print '<br> not hunkey';
    form_set_error('keys', t('You must includ........, 3))));
  }
  if ($query === NULL || $query[0] == '' || $query[2] == '') {
   print '<br> not dorey';
    return array();
  }

and the output is.

query is: Array
item->d.data LIKE '%% %s %%' AND d.data LIKE '%% %s %%'
item->Array
value->exper
value->programming
item->i.word = '%s' OR i.word = '%s'
item->Array
value->exper
value->programming
item->2
if
keys are:exper programming

we see that niether 'not hunky' nor 'not dory' printed so ipso-facto things are hunky-dory!

continuing....

prgrcmpny’s picture

This is getting reminiscent of my earlier posts, so I am going to speed it up a but...

Next up are the $conditions, $arguments variables, the ld_hack temp table and the $normalize variable as discussed in previous comments on this thread.

here is the code in search.module->do_search with my print statements in their to track it.

 $conditions = $where1 .' AND ('. $query[2] .") AND i.type = '%s'";
  print '<br>$conditions =' . $conditions;
  
$arguments = array_merge($arguments1, $query[3], array($type, $query[4]));
  print '<br>$arguments =' . $arguments;
 foreach($arguments as $item)
   print '<br>arguments->'.$item;
	
	
  $result = db_query("INSERT INTO ld_hack SELECT i.type, i.sid, SUM(i.score * t.count) AS relevance, COUNT(*) AS matches FROM {search_index} i INNER JOIN {search_total} t ON i.word = t.word $join1 WHERE $conditions GROUP BY i.type, i.sid HAVING COUNT(*) >= %d", $arguments);
  $result = db_query_temporary("SELECT i.type, i.sid, SUM(i.score * t.count) AS relevance, COUNT(*) AS matches FROM {search_index} i INNER JOIN {search_total} t ON i.word = t.word $join1 WHERE $conditions GROUP BY i.type, i.sid HAVING COUNT(*) >= %d", $arguments, 'temp_search_sids');

  // Calculate maximum relevance, to normalize it
  $normalize = db_result(db_query('SELECT MAX(relevance) FROM temp_search_sids'));
$normalize = db_result(db_query('SELECT MAX(relevance) FROM ld_hack'));
  print '<br>$normalize =' . $normalize;
	
  if (!$normalize) {
     print '<br>things are not normal'
    return array();
  }
	

here is the output:

query is: Array
item->d.data LIKE '%% %s %%' AND d.data LIKE '%% %s %%'
item->Array
value->exper
value->programming
item->i.word = '%s' OR i.word = '%s'
item->Array
value->exper
value->programming
item->2
$conditions =1 AND (i.word = '%s' OR i.word = '%s') AND i.type = '%s'
$arguments =Array
arguments->exper
arguments->programming
arguments->ld
arguments->2
$normalize =
things are not normal
if
keys are:exper programming

and here is the rows in ld_hack

mysql> select * from ld_hack;
Empty set (0.06 sec)

So we see that an empty array is being returned because the ld_hack table is not being populated.

that points to the SQL.

onword.....

prgrcmpny’s picture

Here is the ld_hack insert sql (with brackets removed from table names)

INSERT INTO ld_hack 
  SELECT i.type
         ,i.sid
         ,SUM(i.score * t.count) AS relevance
         ,COUNT(*) AS matches 
   FROM search_index i INNER JOIN search_total t ON i.word = t.word 
   $join1 
 WHERE $conditions GROUP BY i.type, i.sid HAVING COUNT(*) >= %d", $arguments);

I first check that the inner join 'should' work with mysql prompt.

mysql> select * from search_total where word in ('exper','programming');
+-------------+-----------+
| word        | count     |
+-------------+-----------+
| exper       | 0.0222764 |
| programming | 0.0163904 |
+-------------+-----------+
2 rows in set (0.05 sec)

mysql> select * from search_index where  word in ('exper','programming');
+-------------+-----+------+---------+----------+-------+
| word        | sid | type | fromsid | fromtype | score |
+-------------+-----+------+---------+----------+-------+
| exper       | 157 | ld   |       0 | NULL     |    19 |
| programming | 149 | ld   |       0 | NULL     |    26 |
+-------------+-----+------+---------+----------+-------+

so that join should work..

hmmmj,

lets see if I can trap join1 for any wierdness

prgrcmpny’s picture

I really think the culprit is that 'HAVING COUNT(*) >= %d' part, but lets check join1 for completeness.

the print code is:

foreach($arguments as $item)
  	print '<br>arguments->'.$item;

	print '<br>join1 is:' .$join1;
      $result = db_query("INSERT INTO ld_hack...

result is:

query is: Array
item->d.data LIKE '%% %s %%' AND d.data LIKE '%% %s %%'
item->Array
value->exper
value->programming
item->i.word = '%s' OR i.word = '%s'
item->Array
value->exper
value->programming
item->2
$arguments =Array
arguments->exper
arguments->programming
arguments->ld
arguments->2
join1 is:
$normalize =
things are not normal
if
keys are:exper programming

so the $join1 is probably and empty string and not effecting the SQL.

back to the SQL.

prgrcmpny’s picture

Here is the resultant SQL when the $conditions and $arguments variables are factored into the db_query

INSERT INTO ld_hack 
  SELECT i.type
         ,i.sid
         ,SUM(i.score * t.count) AS relevance
         ,COUNT(*) AS matches 
   FROM search_index i INNER JOIN search_total t ON i.word = t.word 
   WHERE 1 AND (i.word = 'exper' OR i.word = 'programming') AND i.type = 'ld'  GROUP BY i.type, i.sid   HAVING COUNT(*) >=2 

running just the SELECT portion of this gives me:

mysql>  SELECT i.type
    ->          ,i.sid
    ->          ,SUM(i.score * t.count) AS relevance
    ->          ,COUNT(*) AS matches 
    ->    FROM search_index i INNER JOIN search_total t ON i.word = t.word 
    ->    WHERE 1 AND (i.word = 'exper' OR i.word = 'programming') AND i.type = 'ld'  GROUP BY i.type, i.sid HAVING COUNT(*) >= 2;
Empty set (0.08 sec)

I change the 2 to a 1 in the HAVING COUNT..section like this:



running just the SELECT portion of this gives me:

<code>
mysql>  SELECT i.type
    ->          ,i.sid
    ->          ,SUM(i.score * t.count) AS relevance
    ->          ,COUNT(*) AS matches 
    ->    FROM search_index i INNER JOIN search_total t ON i.word = t.word 
    ->    WHERE 1 AND (i.word = 'exper' OR i.word = 'programming') AND i.type = 'ld'  GROUP BY i.type, i.sid HAVING COUNT(*) >= 1;
Empty set (0.08 sec)

and I get

mysql>  SELECT i.type
    ->          ,i.sid
    ->          ,SUM(i.score * t.count) AS relevance
    ->          ,COUNT(*) AS matches 
    ->    FROM search_index i INNER JOIN search_total t ON i.word = t.word 
    ->    WHERE 1 AND (i.word = 'exper' OR i.word = 'programming') AND i.type = 'ld'  GROUP BY i.type, i.sid HAVING COUNT(*) >= 1;
+------+-----+-------------------+---------+
| type | sid | relevance         | matches |
+------+-----+-------------------+---------+
| ld   | 149 |  0.42615083605051 |       1 |
| ld   | 157 | 0.423251487314701 |       1 |
+------+-----+-------------------+---------+
2 rows in set (0.04 sec)

So, the question becomes...how does that 2 get into the $arguments array variable?

$arguments =Array
arguments->exper
arguments->programming
arguments->ld
arguments->2k

Can I influence it?
Is it a built in part of drupal?

on to grokking the arguments array....

prgrcmpny’s picture

Ok,

$arguments array in search.module->do_search function is defined initially, by this code.

 $arguments = array_merge($arguments1, $query[3], array($type, $query[4]));

the array_merge function http://www.php.net/manual/en/function.array-merge.php
"usually" appends the elements of arrays to each other and returns the
resulting array, looking at the arguments above I can surmise that $arguments is built from...

$arguments=array_merge(array(),('exper','programming'),('ld',2))

which does in fact look like the output from my dump code:

$arguments =Array
arguments->exper
arguments->programming
arguments->ld
arguments->2

the culprit is therefore that $query[4] value.

and it got there via the very first line in the search_module.do_search function:

$query = search_parse_query($keywords);

therefore. onto search_parse_query($keywords);

but tomorrow! I am beat.

prgrcmpny’s picture

First, lets I make sure the $keywords argument to the call is valid..

  print '<br>Keywords are:'.$keywords;
  $query = search_parse_query($keywords);

and they are good

Keywords are:exper programming

Next I look at the search.module->search_parse_query(....) function and it is a BIG HONKING function.

So, I cut to the chase and look at the return statement:

 return array($query, $arguments, $query2, $arguments2, $matches);

and then I add print statements for each of the arguments to it, before the return...

        print '<br>query-> '. $query;
	print '<br>arguments-> '.$arguments;
	print '<br>query2-> '.$query2;
	print '<br>arguments2-> '.$arguments2;
	print '<br>matches-> '.$matches;
  return array($query, $arguments, $query2, $arguments2, $matches);

and I get:

query-> d.data LIKE '%% %s %%' AND d.data LIKE '%% %s %%'
arguments-> Array
query2-> i.word = '%s' OR i.word = '%s'
arguments2-> Array
matches-> 2

so, it looks like $matches is the culprit, but first lets check arguments1 and 2 for completeness in next comment...

prgrcmpny’s picture

First, I modify my print code as follows:

	print '<br>$arguments =' . $arguments;
	foreach($arguments as $arg)
  	print '<br>arguments->'.$arg;

	print '<br>$arguments2 =' . $arguments2;
	foreach($arguments2 as $arg2)
  	print '<br>arguments2->'.$arg2;

and the output is:

$arguments =Array
arguments->exper
arguments->programming
$arguments2 =Array
arguments2->exper
arguments2->programming

hmmm, based on past drupalness in functions, I had expected arguments2 to be null, but both arguments and arguments2 contain the same values.

Maybe that has an influence on $matches.

However, $matches is the culprit.

so lets grok that...

prgrcmpny’s picture

First, I modify my print code as follows:

	print '<br>$arguments =' . $arguments;
	foreach($arguments as $arg)
  	print '<br>arguments->'.$arg;

	print '<br>$arguments2 =' . $arguments2;
	foreach($arguments2 as $arg2)
  	print '<br>arguments2->'.$arg2;

and the output is:

$arguments =Array
arguments->exper
arguments->programming
$arguments2 =Array
arguments2->exper
arguments2->programming

hmmm, based on past drupalness in functions, I had expected arguments2 to be null, but both arguments and arguments2 contain the same values.

Maybe that has an influence on $matches.

However, $matches is the culprit.

so lets grok that...

prgrcmpny’s picture

$matches is in the heart of the function search_parse_query($text) {

function and I am beat.

I will pick it up in the a.m.

later.

prgrcmpny’s picture

The search.module->do_search function has developer comments that state:

// First pass: select all possible matching sids, doing a simple index-based OR matching on the keywords.
// 'matches' is used to reject those items that cannot possibly match the query.

so I am probably dealing with a design decision that states that only those terms with more than one occurance will show up....

I add the below code to manually hack this and set $matches to 1


  // First pass: select all possible matching sids, doing a simple index-based OR .....
  // 'matches' is used to reject those items that cannot possibly match the query.
  $conditions = $where1 .' AND ('. $query[2] .") AND i.type = '%s'";

  /////////////////////////////PICK IT UP HERE IN THE A.M. $match is 2 killing the having count.....
	print '<br> type is '.$type;
	
	if('ld' == trim($type))  //blatant hack...(:
			$query[4] = 1;
	print '<br> query4 is '.$query[4];
	
  $arguments = array_merge($arguments1, $query[3], array($type, $query[4]));
  $result = db_query("INSERT INTO ld_hack.....));

  $normalize = db_result(db_query('SELECT MAX(relevance) FROM ld_hack'));
  ...
  print '<br>$normalize =' . $normalize;
	
  if (!$normalize) {
		 print '<br>things are not normal';
    return array();
  }
  	

and the result is...

type is ld
query4 is 1
$normalize =0.42615082859993
if
keys are:exper programming

Ok, the HAVING COUNT>=2 only influenced putting data into the temp table and my ld_hack table, so I check that...

mysql> select  * from ld_hack;
+------+------+-----------+---------+
| type | sid  | relevance | matches |
+------+------+-----------+---------+
| ld   |  149 |  0.426151 |       1 |
| ld   |  157 |  0.423251 |       1 |
+------+------+-----------+---------+
2 rows in set (0.04 sec)

mysql> select max(relevance) from ld_hack;
+------------------+
| max(relevance)   |
+------------------+
| 0.42615082859993 |
+------------------+
1 row in set (0.04 sec)


which is good.

EXCEPT the search returns no results, so the problem is lower in the code.

hmmm.

prgrcmpny’s picture

Looking lower in the code, I am revisiting my ld_hack_2 stuff from early in this adventure...

Here is the next area of code:

 $select2 = str_replace('i.relevance', '('. (1.0 / $normalize) .' * i.relevance)', $select2);

  $conditions = '('. $query[0] .')';
  $arguments = array_merge($arguments2, $query[1]);
  $result = db_query("INSERT INTO ld_hack_2 SELECT i.type, i.sid, $select2 FROM temp_search_sids       $result = db_query_temporary("SELECT i.type, i.sid, $select2 FROM temp_search_sids ........
  if (($count = db_result(db_query('SELECT COUNT(*) FROM temp_search_results'))) == 0) {
    return array();
  }

I run it and check the ld_hack_2 table.

mysql> select * from ld_hack_2;
Empty set (0.05 sec)
mysql> 

lets verify that temp_search_results is empty as well and returning an empty array..

if (($count = db_result(db_query('SELECT COUNT(*) FROM temp_search_results'))) == 0) {
	print '<br> $count is:' . $count;
	return array();
  }
type is ld
query4 is 1
$normalize =0.42615082859993
$count is:0
if
keys are:exper programming

ok, so it is pointing to the next sql

prgrcmpny’s picture

Ok,

here is the insert sql

 INSERT INTO ld_hack_2 
     SELECT i.type
           ,i.sid, 
           $select2 
 FROM temp_search_sids i 
INNER JOIN search_dataset d ON i.sid = d.sid 
        AND i.type = d.type 
$join2 
WHERE $conditions $sort_parameters", $arguments;

ok, we see that temp_search_sids (equivalent to my ld_hack table) is joined against the search_dataset table on sids...lets check that first.

mysql> select sid,type from search_dataset where sid in (select sid from ld_hack);
+-----+------+
| sid | type |
+-----+------+
| 149 | ld   |
| 157 | ld   |
+-----+------+
2 rows in set (0.04 sec)

so the join is not the issue.

Lets check the variables using print statements...

 $result = db_query_temporary("SELECT i.type, i.sid, $......, $arguments, 'temp_search_results');

print '<br> $select2->' . $select2;
print '<br> $join2->'. $join2;
print '<br> $conditions->'. $conditions;
print '<br> $sort_parameters->'. $sort_parameters;
print '<br> $$arguments->'. $arguments;
	foreach($arguments as $item)
 	print '<br>arguments->'.$item;

if (($count = db_result(db_query('SELECT COUNT(*) FROM temp_search_results'))) == 0) {
			return array();

and the result is.

type is ld
query4 is 1
$normalize =0.42615082859993
$select2->(2.34658701307 * i.relevance) AS score
$join2->
$conditions->(d.data LIKE '%% %s %%' AND d.data LIKE '%% %s %%')
$sort_parameters->ORDER BY score DESC
$$arguments->Array
arguments->exper
arguments->programming
if
keys are:exper programming

hoo-boy....ok, let me analyze this.

mapping out the arguments above, I get the following db_query...

 INSERT INTO ld_hack_2 
     SELECT i.type
           ,i.sid 
           ,(2.34658701307 * i.relevance) AS score
      FROM temp_search_sids i 
INNER JOIN search_dataset d ON i.sid = d.sid 
        AND i.type = d.type 
WHERE (d.data LIKE '%% %s %%' AND d.data LIKE '%% %s %%') ORDER BY score DESC", ('exper','programming');

which translated into a mysql call is

INSERT INTO ld_hack_2 
     SELECT i.type
           ,i.sid
           ,(2.34658701307 * i.relevance) AS score
      FROM ld_hack i 
INNER JOIN search_dataset d ON i.sid = d.sid 
        AND i.type = d.type 
WHERE (d.data LIKE '%exper%' AND d.data LIKE '%programming%') ORDER BY score DESC;

where I replace the temp_ table with my ld_hack.

I will run this in the next comment and take it apart

prgrcmpny’s picture

While I was waiting for the last comment to post, I ran the following sql...

mysql>      SELECT i.type
    ->            ,i.sid
    ->            ,(2.34658701307 * i.relevance) AS score
    ->       FROM ld_hack i 
    -> INNER JOIN search_dataset d ON i.sid = d.sid 
    ->         AND i.type = d.type 
    -> WHERE (d.data LIKE '%exper%' AND d.data LIKE '%programming%') ORDER BY score DESC;
Empty set (0.05 sec)

I cut to the chase and looked at the d.data like clause and ran

mysql> select sid,type from search_dataset d WHERE (d.data LIKE '%exper%' AND d.data LIKE '%programming%');
Empty set (0.12 sec)

ok, the problem is in the search_dataset.data field not containing the terms.

I verify the sids are there., first

mysql> select * from ld_hack;
+------+------+-----------+---------+
| type | sid  | relevance | matches |
+------+------+-----------+---------+
| ld   |  149 |  0.426151 |       1 |
| ld   |  157 |  0.423251 |       1 |
+------+------+-----------+---------+
2 rows in set (0.04 sec)

mysql> select sid,type from search_dataset where sid in (149,157);
+-----+------+
| sid | type |
+-----+------+
| 149 | ld   |
| 157 | ld   |
+-----+------+
2 rows in set (0.05 sec)

ok, definitely the in clause..

I look in the data fields for these nodes...

mysql> select sid,mysql> select sid,type,data from search_dataset where sid in (149,157)\G

and the terms are there....

*************************** 1. row ***************************
 sid: 149
type: ld
data:  database programming drupal configuration........ 
*************************** 2. row ***************************
 sid: 157
type: ld
data:  exp exper 0 .................. 
2 rows in set (0.05 sec)

crap.

I try the following...

mysql> select sid,type from search_dataset d WHERE (d.data LIKE 'exper%' AND d.data LIKE 'programming%');
Empty set (0.04 sec)

mysql> select sid,type from search_dataset d WHERE (d.data LIKE '%exper' AND d.data LIKE '%programming');
Empty set (0.06 sec)

mysql> select sid,type from search_dataset d WHERE (d.data LIKE '%%exper%%' AND d.data LIKE '%%programming%%');
Empty set (0.05 sec)

mysql> select sid,type from search_dataset d WHERE (d.data LIKE '%% exper %%' AND d.data LIKE '%% programming %%');
Empty set (0.06 sec)


double crap.

ok, I think I am really going to have to hack the search.module

but first, I am going to give this a bit of distance....

I will get back after I calm down.

t.

prgrcmpny’s picture

The search is working until trying to join using like against the search_dataset.data field.

That field is populated by my module in the ld_update_index hook.

Where I will look next is in how I am populating it--maybe I need to look at things differently there.

back tomorrow.

t.

prgrcmpny’s picture

single search terms work and I am assuming it uses the same join.....

I will trace the sql tomorrow using a single search term to maybe get a clue.

t.

prgrcmpny’s picture

Chatted with a co-worker and he suggested I change the AND to an OR

select sid,type from search_dataset d WHERE (d.data LIKE '%exper%' OR d.data LIKE '%programming%');
+-----+------+
| sid | type |
+-----+------+
|  57 | node |
|  82 | node |
|  87 | node |
|  32 | node |
| 116 | node |
| 103 | node |
| 118 | node |
| 134 | ld   |
+-----+------+


Well, that rules out me tagging the data incorrectly in the ld_update_index() function. If it was mis-tagged, it would not show up with the OR flag either.

I now have a valid reason for "letting go" of this topic for now. That AND is part of the core search routine and I will not be hacking it for several reasons.

I will summarize in my last ? post on this subject for a while.

prgrcmpny’s picture

A coworker of mine, hit on the solution to the indexing problem.

Simply move the code from hook_update_index to hook_nodeapi in the 'update index' case.
That solved the indexing problem and now the search works correctly (:

I don't have time to delve into the 'why' of it as I have a deadline, but the fix is there and I am confident enough to use custom searches going forward.

here is the code from the nodeapi 'update index' call:

			if( $node->type == "ld" ) {
				$text = "";
				$query = "Select ld.field1, ld.field2,....ld.fieldN " .
					"FROM node n LEFT JOIN ld_table ld ON n.nid=ld.nid Where n.nid=%d";
				$result = db_query( $query, $node->nid );
				if( $rs = db_fetch_object( $result ) ) {
					$text = $rs->field1 . " " . $rs->field2;
				}
				return $text;
			}

and it worked.

all the best,

t.

kris.dover’s picture

Hello there fellow,
stumbled across your posting while looking for some answers to that question myself. But damn is your posting long. I will go through it and see what you have come up with but i just wanted to first quickly share how i approached that problem which i doubt was the cleanest solution.

i had a custom vacancy node and i wanted an extrinsic search much as you did. I created a modification of drupal v4.7's do_search() which does not produce a paged result. instead it returns an array of all the matching node ids strictly for my custom node and simply include those as part of an sql 'nodeid in (, , ...)' predicate in my custom query. after that it's just a matter of putting the results into a table on my custom page.

this solution works fine but i would prefer one that does not involve re-implementing an existing drupal function. to be honest i haven't really search for other user contributions prior to reading the drupal source and coming up with that solution so there might very well be a better way. anyhow hope this helps.

Kris Dover
Unemployed Drupal User

prgrcmpny’s picture

Hi Kris,

My preference is/was the same as yours--leverage existing modules.

What kicks me now, is that I can clearly see indexed words for my custom modules in the search_index table and in the search_dataset table, but I don't get results for my search.

At the end of the day, I hope to have a coder's knowledge of the do's and don'ts for drupal searching.

btw, my technique of long posts is just a running commentary, so I can ramp up to the problem again when I get pulled away by other tasks. It really helps me!

Best of luck.

t.

designwork’s picture

Hi prgrcmpny,

I have a simelar problem with the search.module. But i´m trying now the tripsearch.module for it, it looks better for me to make a SQL search.

http://drupal.org/project/trip_search

Maybe that will help you.

Dirk

prgrcmpny’s picture

Thanks for the link Dirk,

I will definitely consider it in the future.

Cordially,

p.

designwork’s picture

Hi P,

at the moment i´m doing a advanced profileplus module for better user search. If you are intrested i can give you th code tomorrow.
(I hope i will finish it untill Tomorrow)

Dirk

prgrcmpny’s picture

Thanks for the offer, Dirk, but I am on a project deadline.

regards,

p.

designwork’s picture

Hi p.

me to and i´m becoming crazy of Drupal...........

Nothing works without changing the modules. But maybe my clients want´s stuff that nobodyhas until now. The usersearch is working now But i still have some problems with multiple keywords.

Dirk

prgrcmpny’s picture

our solution for the multiple keyword search problem was moving the indexing from the hook_update_index to the hook_nodeapi('update index') case.

bonniebantam’s picture

What version of Drupal are you developing it for?

douggreen’s picture

Looks like you had a long running conversation with yourself, that I admin, I didn't get completely through. I wanted to point you to views and the views_fastsearch modules. You can create a view with filters for both content types and exposed search terms. The search filter that comes with views handles the search.module query buiding. The views_fastsearch filter adds an alternative filter that creates faster SQL (requires mysql 4.1).

It seems that you've played with search.module extensibly, and I'd like to invite you to the discussion in the newly formed search working group. I'm trying to optimize the search.module queries to not use temporary tables. Have you had any success with that?

Doug Green

prgrcmpny’s picture

Hi,

Thanks for the invite and the pointers to views_fastsearch.

I have subscribed to the group and will be a fly on the wall for a bit.

hmmmm, optimizing the search.module queries to not use temp tables....I haven't done that but let me think about it.

cordially,

t.

xaler’s picture

As I am new to Drupal I hope I posted in the right section.

I use a search to gather text ads.

This is info on the search module:

; $Id: search.info,v 1.3 2006/11/21 20:55:35 dries Exp $
name = Search
description = Enables site-wide keyword searching.
package = Core - optional
version = VERSION

; Information added by drupal.org packaging script on 2007-01-30
version = "5.1"
project = "drupal"
________________________________

And next to this I use the ad module:
; $Id: ad.info,v 1.1.2.1 2007/02/24 13:03:56 morbus Exp $
name = Ad
package = Ad
description = An advertising system for Drupal powered websites.

; Information added by drupal.org packaging script on 2007-06-06
version = "5.x-1.1-rc3"
project = "ad"
datestamp = "1181154302"
_____________________________________

The problem I have is that when I search for keywords, to much details are exposed.

Search results are presented like this:

[ Link to external website ]
[ Aanhangwagens | Betanco ] Betanco trailers - voor al uw trailers en Aanhangwagens Links to http://www.betanco.nl/ . Status ...

Advertisement - fabian - 05/30/2007 - 14:02 - 0 reacties - 0 bijlagen

I want to hide / remove all info from 'Links to http:// .....' to the end 'bijlagen = attachements'

When I click a link on a search page, it jumps to the 'expanded ad' (no shorter version) but it also shows the 'Link to and STATUS info'.

Please would anyone help me remove / hide these details?

Thank you very much!!