Add a site field to the schema, and allow users to search across more than one site, or just filter by a single site: XXX

Comments

hansBKK@drupal.org

I just want my site visitors to be able to do a simple search over my entire domain, including site1.example.com, site2.example.com as well as example.com itself of course.

Is this possible now?

What if one or more of the subdomains aren't running Drupal?

I don't want to wait for/depend on Google. Any and all suggestions welcome.

JacobSingh

Can you elaborate on how you see this working for multiple Drupal sites, that is, from a UI point of view? The Solr integration provided will *only* work with Drupal. Solr itself could be building its DB from multiple sources; however, the schema provided is pretty Drupal-centric. Check out a project like Nutch, which might be what you are looking for; otherwise, pony up the dough for a Google Mini and use my google_appliance module.

Best,
J

aaron1234nz

If you had several Drupal sites, you could set them all to point to the same Solr instance, and all of the sites would then end up in one great big index. You could add the site: search functionality by creating a small module which adds an extra field to the Solr index:

e.g.

function apachesolr_multisite_apachesolr_update_index(&$document, $node) {
  global $base_url;
  $document->site = $base_url;
}

You would then also need to add an extra field to Solr's schema.xml file (at about line 260):

<field name="site" type="string" indexed="true" stored="true"/>

However, I think there is still a problem: the nid is the unique key in the Solr index, so with several sites pointing to the same index, the nids from different sites will collide and documents will overwrite each other.

To get around this you would need to create a new unique id. To do this modify the above code to be something like this:

function apachesolr_multisite_apachesolr_update_index(&$document, $node) {
  global $base_url;
  $document->site = $base_url;
  $document->site_nid = $base_url . $node->nid;
}

and then add the following lines to the schema.xml file:

<field name="site" type="string" indexed="true" stored="true"/>
<field name="site_nid" type="string" indexed="true" stored="true"/>

and change the line in the schema.xml that reads:

<uniqueKey>nid</uniqueKey>

to this:

<uniqueKey>site_nid</uniqueKey>

After making a change to the schema.xml file, you need to restart Solr for it to take effect.

NB: these are just my thoughts on the problem. I have not tested this code, but in theory it should work.
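
For the site: filtering itself, one option is a Solr filter query (fq) on the new site field. The following is a minimal sketch, assuming the field is named site as above. The function name and the example URL are hypothetical, and the snippet only builds the request parameters; it does not call any Drupal or Solr client API.

```php
<?php
// Hypothetical helper (not part of apachesolr): builds the parameters for a
// Solr select request, optionally restricted to one site via a filter query
// (fq) on the "site" field.
function multisite_build_solr_params($keywords, $site = NULL) {
  $params = array('q' => $keywords);
  if ($site !== NULL) {
    // Quote the value so the colon and slashes in the URL are treated as
    // literal text; escape any backslashes and double quotes inside it.
    $escaped = addcslashes($site, "\"\\");
    $params['fq'] = 'site:"' . $escaped . '"';
  }
  return $params;
}

// Example: search for "views" on one site only (URL is an assumption).
$params = multisite_build_solr_params('views', 'http://site1.example.com');
// http_build_query() URL-encodes the parameters for the request.
$query_string = http_build_query($params);
```

Leaving out the second argument produces a query across every site in the shared index, which covers the "search over more than one site" case.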

hansBKK@drupal.org

I'm afraid I haven't investigated the module enough to make suggestions at that level. I'm a relative noob starting to build out a domain with multiple sites and trying to figure out how to allow for domain-wide searching without depending on a third-party service. I will check out Nutch and Google mini, thanks for the suggestions.

chrisyates

Attachment: status new, size 1.23 KB

Aaron,

I was able to get multisite searching to work using your code, with some additional tweaks in order to display links and purge deleted nodes.

Now that it's working, I'm going to see if I can make this cleaner, but here's the quick-and-dirty:

In the update_index function of apachesolr.module I added the site and site_nid fields to the $fields array:

$fields = array('title', 'body', 'type', 'uid', 'changed', 'nid', 'comment_count', 'name', 'site', 'site_nid');

And I changed the apachesolr_nodeapi function delete case to reflect that the unique_id is now the base_url + nid:

global $base_url;
$solr->deleteById($base_url . $node->nid);

And to return the correct url in the search results, I changed the results array of the apachesolr_search_search function like so:

$results[] = array(
  'link' => url($doc->site . '/node/' . $doc->nid, NULL, NULL, TRUE),
  'type' => node_get_types('name', $doc),
  'title' => $doc->title,
  'user' => theme('username', $doc),
  'date' => $doc->changed,
  'node' => $doc,
  'extra' => $extra,
  'score' => $doc->score,
  'snippet' => $snippet,
);

Prepending the $doc->site field to the node path directs each link to the correct source site.
-christian
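
The index and delete steps above both build the unique ID from the site URL and the nid, so one way to keep them in step is a single shared helper. This is only a sketch: the function name is hypothetical, and the '/node/' separator is an assumption that differs from the plain concatenation used above. The separator is there because plain concatenation can produce the same ID for different documents when one base URL ends in digits.

```php
<?php
// Hypothetical helper: builds a site-qualified document ID. The same function
// would have to be used when indexing and when calling deleteById().
function multisite_document_id($base_url, $nid) {
  // Drop any trailing slash so "http://a/" and "http://a" give the same ID.
  return rtrim($base_url, '/') . '/node/' . (int) $nid;
}

// Plain concatenation is ambiguous for these two (contrived) installs:
$plain_a = 'http://example.com/1' . '23';
$plain_b = 'http://example.com/12' . '3';
$collides = ($plain_a === $plain_b);

// With the separator the two IDs stay distinct.
$id_a = multisite_document_id('http://example.com/1', 23);
$id_b = multisite_document_id('http://example.com/12', 3);
```

Because the resulting ID has the same form as the node URL, it could in principle also serve as the search result link, though that has not been tested against the module.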

JacobSingh

Hey guys!

This is a pretty cool patch, and I'm sure it will get some usage even if it never makes it into the main branch. Can you take a look at this issue:
http://drupal.org/node/296198 and give your comments on how we might better support federated search, taking input from your business experience in this area?

@RobertDouglass: This should also be added to an FAQ IMO

Thanks,
Jacob

robertdouglass

Category: task » feature

I love this feature. At the *least* it needs to be documented.

robertdouglass

Status: Active » Fixed
Attachment: status new, size 46.68 KB

I've made a huge commit of my own version of multisite code. Here is the patch I committed, for reference. Needs backporting to D5.

robertdouglass

Status: Fixed » Patch (to be ported)

robertdouglass

Status: Patch (to be ported) » Fixed

Has been ported.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.