The way I read the cron hook of this module, it seems that you delete the entire search index every cron run, select the entire search index from each of the multi sites, and insert them back into the multisite search. Is this true?

Have you tried this with many multi sites and reasonably large indexes? I can't imagine that this will scale (feel free to argue otherwise!)

Also, the search index is not supposed to have distinct words in it so I think it is quite possible that this line is a bug:

  	// ??????  how to proceed with this in a better way 
  	// insert into search total -- need to have done without cron job 
  	$res3 = db_query("SELECT DISTINCT (word) FROM multisite_drupal_search_index");   

But I'll let you judge since I'm not wholly familiar with the way the module is supposed to work.

Comments

phdhiren’s picture

How should a deleted node's content be handled in the search results? Maybe that is the reason the whole index is being deleted.

techrobo’s picture

I assume this module works on the single-database multisite concept and aggregates the search tables of all the other sites on the base site.

In that case, if a node is deleted on one site, the deletion probably does not update the multisite module's search tables. That could be the reason the whole table is truncated and rebuilt from the search tables of the other sites.

A workaround could be to delete the matching rows from the multisite module's search tables whenever a node is deleted from the source site. Then you probably would not need to truncate and rebuild the search tables every time.
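A minimal sketch of that workaround, assuming Drupal 6's hook_nodeapi() and the multisite_drupal_search_* table and column names used by the module's cron hook. The module name "example" and the helper example_multisite_delete_queries() are hypothetical, and the handling of $db_prefix as a plain string is an assumption. This has not been tested against the module.

```php
<?php
/**
 * Builds the DELETE statements for one deleted node (hypothetical helper).
 *
 * Returns an array of array(query, args) pairs using db_query() placeholders.
 */
function example_multisite_delete_queries($nid, $tblpf) {
  $queries = array();
  $tables = array('multisite_drupal_search_dataset', 'multisite_drupal_search_index');
  foreach ($tables as $table) {
    $queries[] = array(
      "DELETE FROM " . $table . " WHERE sid = %d AND type = 'node' AND subdmn_id = '%s'",
      array((int) $nid, $tblpf),
    );
  }
  return $queries;
}

/**
 * Implementation of hook_nodeapi() (Drupal 6). Only runs inside Drupal.
 */
function example_nodeapi(&$node, $op, $a3 = NULL, $a4 = NULL) {
  if ($op == 'delete') {
    // Assumption: this site's $db_prefix is a plain string, and it matches
    // the subdmn_id value that the cron hook stored for this site.
    global $db_prefix;
    $tblpf = is_string($db_prefix) ? $db_prefix : '';
    foreach (example_multisite_delete_queries($node->nid, $tblpf) as $q) {
      db_query($q[0], $q[1]);
    }
  }
}
```

Keeping the query building in a separate function makes it possible to check the statements without a running Drupal site; only the hook itself touches the database.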

grawat’s picture

I'm trying to use this module on a multisite installation with about 220 other sites (they are all new and have very little content in them) and when cron runs, this module causes cron to exceed the time limit and then abort. I don't know if the module is being maintained but in its current form it's not going to work for large sites.

robertdouglass’s picture

grawat: you may be better off using ApacheSolr which also has a multisite search capability.

grawat’s picture

thanks.

jeff.cote’s picture

Category: support » feature

To reduce the time it takes to rebuild the tables, a number of changes can be made. First, you can copy over only the published nodes. Second, you can use a single query to copy over all of a site's entries at once, instead of a PHP loop that copies them over one entry at a time.

Also, there is a snippet that removes the custom '404 page not found' from the search results.

file: multisite_search.module
function: multisite_search_cron

original lines are:

  	// insert into search dataset 
  	$res1 = db_query("SELECT * FROM ".$tblpf."search_dataset");
  	while($result1 = db_fetch_array($res1)){ 
  		// insert into new multisite table 
  		db_query("INSERT INTO multisite_drupal_search_dataset (sid, type, data, subdmn_id) VALUES (%d, '%s', '%s', '%s')", $result1['sid'], $result1['type'], $result1['data'], $tblpf);
  	}
  	// insert into search index 
  	$res2 = db_query("SELECT * FROM ".$tblpf."search_index");
  	while($result2 = db_fetch_array($res2)){ 
  		// insert into new multisite table 
  		db_query("INSERT INTO multisite_drupal_search_index (word, sid, type, subdmn_id, fromsid, fromtype, fromsubdmn_id, score) VALUES ('%s', %d, '%s', '%s', '%s', %d, '%s', %f)", $result2['word'], $result2['sid'], $result2['type'], $tblpf, $result2['fromsid'], $result2['fromtype'], $tblpf, $result2['score']);
  	}

change lines to:

  // Filter to published nodes only.
  $sql = " INNER JOIN ".$tblpf."node n ON s.sid = n.nid AND n.status=1";
  // Exclude the site's custom 404 page.
  $result0_nids = array();
  $res0 = db_query("(SELECT value FROM ".$tblpf."variable WHERE name = 'site_404') UNION (SELECT value FROM ".$tblpf."i18n_variable WHERE name = 'site_404')");
  while ($result0 = db_fetch_array($res0)) {
    $result0_value = unserialize($result0['value']);
    if (substr($result0_value, 0, 5) == 'node/') {
      // Cast to int so that only a numeric nid reaches the SQL below.
      $result0_nids[] = (int) substr($result0_value, 5);
    }
  }
  if (count($result0_nids) > 0) {
    $sql .= " AND n.nid NOT IN (".implode(",", $result0_nids).")";
  }
  // insert into search dataset
  db_query("INSERT INTO multisite_drupal_search_dataset (sid, type, data, subdmn_id) SELECT s.sid, n.type, s.data, '%s' FROM ".$tblpf."search_dataset s".$sql, $tblpf);
  // insert into search index 
  db_query("INSERT INTO multisite_drupal_search_index (word, sid, type, subdmn_id, fromsid, fromtype, fromsubdmn_id, score) SELECT s.word, s.sid, s.type, '%s', 0, 0, '%s', s.score FROM ".$tblpf."search_index s".$sql, $tblpf, $tblpf);

earthday47’s picture

Version: 6.x-1.1 » 6.x-2.x-dev

Interesting snippet... I'll look into it further and test.

I have to look closer at the code, but from my initial run-throughs, each site maintains its own search index, which is then aggregated when the search runs. This is, of course, very inefficient, but the first question that came to my mind is: where should you run cron? On only one site? On any site?

However, you can share the 4 database tables among all the sites, which will prevent 200 indexes from appearing:

$db_prefix = array(
/* ..snip.. */
  'multisite_search_dataset' => 'shared_',
  'multisite_search_index'   => 'shared_',
  'multisite_search_sites'   => 'shared_',
  'multisite_search_total'   => 'shared_',
);

I don't know if it would solve the problem for the 200+ site multisite installation, but it's a start...

earthday47’s picture

Status: Active » Fixed

New version (6.x-2.0) has been committed!

I looked closely at jeff's code, and at the way core search.module works, and I don't think it's necessary to pull data from the node table. All the hook_cron() call is doing is copying the search_dataset table to multisite_search_dataset, and the published permissions, etc., are all handled by core Search.

I did take inspiration from #6 and removed the while() loops in favor of an INSERT INTO ... SELECT statement:

// Get all values from core search's search_dataset table and insert them into multisite dataset
db_query("INSERT INTO {multisite_search_dataset} SELECT sid, type, data, '" . $tblpf . "' AS subdmn_id FROM " . $tblpf . "search_dataset");
// Get all values from core search's search_index table and insert them into multisite index
db_query("INSERT INTO {multisite_search_index} (word, sid, type, subdmn_id, score) SELECT word, sid, type, '" . $tblpf . "' as subdmn_id, score FROM " . $tblpf . "search_index");

One query's better than 500!

Also, for comment #3, there is a new variable, "TTL", that you can set on each site, so a site won't reindex on every cron run. Good practice might be to set the master site's TTL to 0 and set the others to some high number, such as 10000. I haven't tested this extensively, so we may want to revisit it.
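The TTL idea could be sketched roughly as follows. This is not the committed code: the function name example_needs_reindex() is hypothetical, and it assumes the TTL is measured in seconds and the last reindex time is stored as a Unix timestamp.

```php
<?php
/**
 * Decides whether a site should rebuild its multisite index on this cron run.
 *
 * Hypothetical helper: assumes $ttl is in seconds and $last_run and $now are
 * Unix timestamps.
 */
function example_needs_reindex($last_run, $ttl, $now) {
  // A TTL of 0 always passes this check, so that site reindexes on every run.
  return ($now - $last_run) >= $ttl;
}
```

With a check like this, a master site with a TTL of 0 would reindex on every cron run, while a site with a TTL of 10000 would skip cron runs until that much time has passed since its last reindex.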

What about a checkbox: "Re-index on cron"?

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.