Solr Multisite search

rj.seward - March 23, 2009 - 18:22
Project: Apache Solr Search Integration
Version: 6.x-2.x-dev
Component: Multisite
Category: feature request
Priority: normal
Assigned: Unassigned
Status: needs work
Description

I have a rather interesting situation. On a Drupal multisite setup, a Solr search on the main_site shows results from a sub_site. If, for example, a result is listed as node/18 on the results page, selecting this link takes me to main_site/node/18 even though the actual search result is at sub_site/node/18.

Just wondering if anyone else has come across this behavior, and how to tell Solr not to search subsites (such as anything in the sites/ directory).

Thanks.

#1

JacobSingh - March 24, 2009 - 03:13
Status: active » won't fix

Multisite is not working.

I'm not sure how you managed to enable it, but it is not functional. If you want to work on it, be my guest, but please don't post any more support requests about it.

Thanks!
Jacob

#2

rj.seward - March 24, 2009 - 14:07

Jacob:

Thank you for your reply. Perhaps I should have phrased my request differently. I do not have the Apache Solr multisite search module enabled and I am not attempting to get it working. I only have the Apache Solr framework and Apache Solr search modules enabled.

What I have is a problem where Solr appears to be searching through different sites on a multisite setup but I DO NOT want it to do this. I only wish to search the one site on which I have enabled the Apache Solr module and configured Search to use Solr.

As it is, I am getting results back with links to another site in the multisite install, but when followed they take me to the corresponding node in the first site. Here is a link to a search page for "apache" so you can see it firsthand: http://scis.wju.edu/drupal6/search/apachesolr_search/apache . The first link, for example, points to http://scis.wju.edu/drupal6/node/19 (which has nothing to do with Apache), whereas the teaser indicates that it should point to http://scis.wju.edu/ralph/node/19, which is a page on a sub-site in the multisite install about setting up SVN on an Apache server. Or, more correctly, these links shouldn't be showing up at all.

It looks like a bug, but I was not sure and wanted to post it to see if anyone else had a similar experience.

Thanks!
Ralph

#3

JacobSingh - March 25, 2009 - 06:34

Hi rj

When ApacheSolr indexes (on cron), it grabs nodes from the database of the site which is running. So there is no way it would include nodes from your other sites in your multisite setup unless you run cron against those sites and they have the same AS index settings.

Which, it appears, you are doing:
http://scis.wju.edu/ralph/search/apachesolr_search/apache

Hope that helps,
Jacob

#4

rj.seward - March 25, 2009 - 17:12

Thanks again Jacob.

I disabled the Solr module on the subsite, then deleted and rebuilt the index on the main site. Now the problem appears to have been corrected.

So now, a question: You wrote, "So there is no way it would include nodes from your other sites in your multisite setup unless you run cron against those sites and they have the same AS index settings." Could I theoretically have the AS module enabled on more than one site in a Drupal install on one server? If so, which AS index settings would need to be changed? And where would I change them?

Thanks for your help.

RJ

#5

JacobSingh - March 25, 2009 - 18:12

Are you using Acquia Search?

If so, you're out of luck until we implement Solr Multi-site. Or, you can have two subscriptions (one for each site).

If you have your own setup, look into Solr multicore. You'll need to configure the settings at admin/settings/apachesolr.

hth,
jacob

#6

kdes - May 3, 2009 - 19:04

Does this module now work with a multisite setup? I installed it (assuming it worked for a multisite setup, based on http://drupal.org/node/322048#comment-1523900) and I experienced the same problem as described in the first post. Also, as per post #3 by JacobSingh, the sites should not have the same "AS index settings" if results from another site are not to be displayed on the current site. How do you go about doing this? Where do I change the "AS index settings"? The settings on admin/settings/apachesolr would be the same for all sites, so there's nothing I can change there. What else needs to be changed?

In fact, I actually want the results from other sites to be displayed, which does happen right now, but when I click on a result it doesn't take me to the correct page.

E.g., if I search from mydomain.com and the page is actually at subdomain.mydomain.com/node/13, instead of directing me to "subdomain.mydomain.com/node/13" it directs me to "mydomain.com/node/13".

#7

kdes - May 3, 2009 - 20:43
Status: won't fix » active

#8

robertDouglass - May 3, 2009 - 21:53
Status: active » fixed

Hi kdes - it's currently broken, as you've described. The framework for multisite to work is there, but a lot of parts have shifted around it and it needs repairing. If you have any resources that could be used to help fix it, I can help direct you. Otherwise you can wait until we get around to fixing it - which we will - but there's no guarantee about when.

#9

kdes - May 4, 2009 - 05:24
Status: fixed » active

I don't have any resources in terms of money or programming skills. But I will try to look into this if you could point me in the right direction. I guess the problem is that the URLs are currently being stored as relative URLs, and for this to work with a multisite setup they need to be absolute URLs.

#10

JacobSingh - May 4, 2009 - 06:16

@kdes: Unfortunately, it's more than that. Most of the work around multisite is making sure it will work with facets which are single site only (such as userids, etc), or at least differentiating between which are multisite safe, and which are not. But there are other issues as well.

#11

kdes - May 4, 2009 - 07:18

OK, but faceted search is optional, isn't it? And user IDs need not necessarily be different across multisites (I'm sharing the users tables, so user IDs are the same for all my sites). Anyway, if we forget about the other issues for now, how can the current problem with URLs not being stored correctly be resolved?

I guess at the moment the nodes are being indexed using only the nid, and the base URL of the current site is being used to display results, but the index needs to store the base URL for each node as well.

#12

robertDouglass - May 4, 2009 - 13:22

@kdes - yes, you're right. It would be a step forward to get just the multisite searching working again with or without facets. I have to review what's changed since I wrote the multisite search code, but I think your analysis is correct. With the base URL and a node ID we should be able to fix this up.

One thing to look out for: if you run cron with www.example.com one time and example.com another, the base URL variable in Drupal might change. Therefore I suggest creating a new Solr-specific variable, set either automatically or by the admin, that can override the base URL. This might be handy for other needs too, like when you're building on dev.example.com and want to move to example.com later. Just a thought - probably needs refinement, but thought I'd point it out since it seems like you're interested in working on this issue.
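
A minimal sketch of what such an override could look like. The helper name and the 'mymodule_solr_base_url' variable are invented for illustration and are not part of the apachesolr module; only variable_get() and the $base_url global are standard Drupal 6.

```php
<?php
/**
 * Hypothetical helper: the base URL to record in the Solr index.
 *
 * 'mymodule_solr_base_url' is an invented variable name, not something
 * the apachesolr module defines.
 */
function mymodule_solr_base_url() {
  global $base_url;
  // Use the admin-supplied override if one is set; otherwise fall back to
  // whatever base URL Drupal detected for this request.
  $override = variable_get('mymodule_solr_base_url', '');
  return rtrim($override !== '' ? $override : $base_url, '/');
}
```

With an override in place, cron runs against www.example.com and example.com would both record the same base URL.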

#13

pwolanin - May 4, 2009 - 14:01

@robertDouglass - I thought I added to the README the suggestion to set the $base_url. That's the basic solution to the problem, rather than creating an additional variable.

We are currently storing and retrieving the absolute URL for each node, so I think a multi-site search without any facet support could be done just by writing a module that implements the right _alter hook for search results and swaps the relative URL for the absolute one.
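
For reference, the $base_url suggestion amounts to one line in each site's settings.php. The host name and path in this fragment are placeholders:

```php
<?php
// sites/www.example.com/settings.php (placeholder path and host).
// Pin the base URL so every cron run indexes the same absolute URLs,
// regardless of which host name cron was invoked with.
// No trailing slash.
$base_url = 'http://www.example.com';
```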

#14

kdes - May 6, 2009 - 06:58

@ pwolanin

Could you please post a patch for this?

#15

robertDouglass - May 6, 2009 - 10:11

@kdes - unfortunately it doesn't work like that. This is an issue that pwolanin will most likely work on, and most likely relatively soon. You can know this because Acquia is paying him to work on the module, and multisite is one of the features that we'd like to support. However, pwolanin and I largely have to work on Acquia's schedule. If you need to speed things up (i.e. you can't wait for pwolanin or someone else to get to this issue), you need to submit the patch yourself or hire someone who can help. Sorry to disappoint.

#16

ronn abueg - May 21, 2009 - 23:15

I ran into the same problem with wrong node URLs too. The issue is essentially that the search result uses the relative path instead of the absolute one, even though the absolute one is also available.

The username link is wrong too, as the uid may not exist on the site being searched. To fix it, you just have to take the base path from the node url/path and build the link manually instead of using theme('username').

See the code below from apachesolr_search.module with the fix:

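      // Strip the site-relative path from the absolute URL to get the base
      // URL of the site that indexed this document.
      $base_path = substr($doc->url, 0, strlen($doc->url) - strlen($doc->path));
      $results[] = array(
        // Link to the absolute URL so the result opens on the right site.
        'link' => $doc->url,
        'type' => apachesolr_search_get_type($doc->type),
        'title' => $doc->title,
        // Build the author link against the indexing site's base URL.
        'user' => $doc->uid ? l($doc->name, $base_path . 'user/' . $doc->uid) : theme('username', $doc),
        'date' => $doc->created,
        'node' => $doc,
        'extra' => $extra,
        'score' => $doc->score,
        'snippet' => $snippet,
      );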

I've attached a patch to alter apachesolr_search.module to fix the node and user link issue. Hope this helps.
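
The substr() arithmetic in the snippet above can be tried in isolation with the standalone sketch below. The function name and the example URLs are made up for illustration. The suffix check is an added safeguard, on the assumption that an indexed URL might not always end with the stored path (for instance, if it was built from an alias).

```php
<?php
/**
 * Hypothetical helper: derive a site's base URL from a document's absolute
 * URL and its site-relative path.
 */
function example_solr_base_url($url, $path) {
  // Only strip the path if the URL really ends with it.
  if ($path !== '' && substr($url, -strlen($path)) === $path) {
    return substr($url, 0, strlen($url) - strlen($path));
  }
  // Otherwise leave the URL untouched rather than cutting it arbitrarily.
  return $url;
}

$base = example_solr_base_url('http://sub.example.com/node/13', 'node/13');
// $base is 'http://sub.example.com/'
$user_link = $base . 'user/7';
// $user_link is 'http://sub.example.com/user/7'
```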

Attachment: apachesolr_search.patch (876 bytes)

#17

Scott Reynolds - May 21, 2009 - 20:51

Why aren't we just using url('user/ID', array('absolute' => TRUE)); ?? That gives you absolute paths.

#18

ronn abueg - May 22, 2009 - 06:27

@Scott Reynolds - true, but that would require changing apachesolr.index.inc and the schema, clearing all indexes, etc. It's just a more complicated solution for fixing simple URL issues. The bigger issue still persists, though, with faceted search and so on. Getting all of that working would require a lot of changes to the apachesolr modules.

#19

kdes - May 23, 2009 - 19:24

Thanks ronn. Applied your patch and it works.

#20

pwolanin - May 24, 2009 - 23:23
Version: 6.x-1.0-beta5 » 6.x-1.x-dev

Good multi-site search is probably going to be a while yet in coming.

By design we don't use the $doc->url for single site search, since that is more fragile (e.g. if the indexing base url is not the same as the search base url).

#21

xarbot - June 2, 2009 - 14:12

i subscribed

#22

xarbot - June 4, 2009 - 14:00

This patch works for me, but line 1303 of apachesolr.module also needs to be changed.

It has

$links[] = l($result->title, $result->path);

and I changed it to

$links[] = l($result->title, $result->url);

This is needed for the More Like This block to work with absolute URLs.

Xarbot

#23

xarbot - June 12, 2009 - 06:25

Hello. Now, what if I need a search restricted to just one of these subsites (for example, imagine we have both a general search and a local search)? How can I do this kind of search? Maybe with a hidden value that says to filter by URL, or something similar? I think it would have to be passed as an argument to the Java search engine, wouldn't it?

Thanks in advance

Xarbot

#24

pwolanin - June 12, 2009 - 17:45

See the 'hash' field in the schema.

In the alter hook, add a filter to the query:

$query->add_filter('hash', apachesolr_site_hash());

#25

xarbot - June 12, 2009 - 18:21

Sorry, but I can't find the hook for the filter. Which file is it in?

Thanks

Xarbot

#26

pwolanin - June 12, 2009 - 18:22

hook_apachesolr_modify_query(&$query, &$params)
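
Putting the hints from #24 and #26 together, a minimal sketch of the hook implementation could look like this. "mymodule" is a placeholder module name; the hook signature, add_filter() call and apachesolr_site_hash() are the ones quoted in this thread.

```php
<?php
/**
 * Implementation of hook_apachesolr_modify_query().
 *
 * "mymodule" is a placeholder; use your own module's name.
 */
function mymodule_apachesolr_modify_query(&$query, &$params) {
  // Restrict results to documents indexed by the current site by
  // filtering on the 'hash' field from the schema.
  $query->add_filter('hash', apachesolr_site_hash());
}
```

Leaving the filter out gives the global (all sites) search; adding it gives the local one.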

#27

xarbot - June 12, 2009 - 18:47

Ok, it works!

If I have a bit of time this weekend, I'll try to modify the form and the module to search either the global site or the local site. Then, with the patch in this thread, the module could work in a multisite environment and be searched in both modes (local and global).

Thanks in advance!

Xarbot

#28

Onopoc - June 28, 2009 - 04:39

+1 for the Solr multisite search feature. Subscribing

#29

robertDouglass - July 17, 2009 - 10:41
Version: 6.x-1.x-dev » 6.x-2.x-dev
Component: SolrPHP Client » Multisite
Category: support request » feature request
Status: active » needs work

Some good stuff here, including a patch. Moving to 6.2.

 
 
