Port Apache Solr Multisite Search to Drupal 7

Project: Apache Solr Multisite Search
Version: 6.x-1.x-dev
Component: Code
Category: task
Priority: normal
Assigned: Unassigned
Status: closed (fixed)
Issue tags: d7 ports, Issue summary initiative

Issue Summary

To support Drupal 6.x-3.x and Drupal 7.x-1.x

What is needed:

  • Use the site and hash information in Solr to facet on, but also to filter queries (deletions, selects if needed)
  • Add metadata for D6 and D7
    • Content types
    • Bias information
    • Site Hash (already included from the module)
    • Site Url (already included from the module)
    • Vocabulary names
  • Add a facet on the hash that outputs the name of the site
  • Modify the bias pages to include bias for content from the other sites.

I propose that we open up a new branch for D7 and for D6 and we start developing.
Let's do this in the same line as apachesolr. 7.x-1.x for the Drupal 7 version and 6.x-3.x for the Drupal 6 version.

Comments

#1

Title: Upgrade to Drupal 7 » Upgrade Apache Solr Multisite Search to Drupal 7

Actually, this title will make my community initiatives page make more sense. ;)

#2

Title: Upgrade Apache Solr Multisite Search to Drupal 7 » Port Apache Solr Multisite Search to Drupal 7

One more. I'm really done now. :P

#3

subscribe

#4

Cool, subscribe.
Also, drupal.org already uses it in d7core; how come?

#5

Upgrade path will be 6.x-1.x -> 6.x-2.x -> 7.x-1.x

#6

Category: feature request » task

#7

Noting that apachesolr 7.x now has significant schema changes.

#8

Hi,
Is anyone working on this? Any progress?

#9

sub

#10

sub

#11

Subscribe

#12

Sub

#13

subscribe

#14

I'm taking this on, expect an initial patch soon.

#15

Status: active » needs review

Here's the first basic version. Everything works except for the blocks/facets. Any help getting this last bit fixed with the new Facet API would be very much appreciated.

(Go to admin/config/search/settings and make sure the checkbox for Multisite is checked. Possibly clear cache afterwards.)

Attachment: apachesolr_multisitesearch_d7-1006994.patch (35.11 KB)

#16

I'm actually curious if we need to add any other filter than "Filter by site". Filters provided by apachesolr (like "Filter by content type") also work in this D7 multisite version.

#17

Here's a new patch. All it needs now is the "Filter by site" facet, and probably cleaning up some legacy code afterwards.

Attachment: apachesolr_multisitesearch_d7-1006994.patch (31.75 KB)

#18

Thanks for the patches.

Re: $document->entity_id = 1; it seems like this should be the hash instead?

Or maybe we should make that a non-required field?

#19

We should discuss the architecture. I had thought that we might actually merge this into the main module, depending on what's left after we remove the facet code.

#20

Making entity_id a non-required field seems like a good idea. Also, entity_id is type long, so it can't be the hash.

Making the multisite functionality part of the main module makes sense to me, we're still using a lot of semi-duplicate code anyway. What's the best way to discuss this?

#21

adding d7 ports tag

#22

@wmostrey - I hope to have a better handle on this architecture by late next week. Maybe we can have a call on Sept 23? I'll look at changing the schema in advance of that.

#23

Let's do that, great!

#24

An updated version, with all d6 facet code removed and a clean settings page. Tested with both Drupal sites using the apachesolr module and non-Drupal sites crawled with Nutch.

Attachment: apachesolr_multisitesearch_d7-1006994.patch (40.3 KB)

#25

#26

Here are the instructions: http://drupal.org/node/666606/git-instructions/6.x-1.x

In short:

1. Setting up repository for the first time

git clone --branch 6.x-1.x http://git.drupal.org/project/apachesolr_multisitesearch.git
cd apachesolr_multisitesearch

2. Applying a patch
Download the patch to your working directory. Apply the patch with the following command:
git apply -v [patchname.patch]

#27

#28

Can you create the 7 branch with this patch?
That way it is easier to get the module.

#29

The patch will most likely need to pass review first before Peter Wolanin creates the branch. So if you want to help move this forward: review the patch and get the status to RTBC. Thanks!

#30

I have now also added a site/hash facet, so you can once again filter the search results per site.

Attachment: apachesolr_multisitesearch_d7-1006994.patch (40.9 KB)

#31

Looks like a good start, especially if it's moving toward Facet API integration.

We'll need to figure out how to expose the appropriate multi-site facets there, however.

#32

I have an issue with the site metadata. When I go to the "Multisite settings" section, under "Delete data from sites using this index" I can only see two sites (there are more than 10 subsites in total). How do I get the correct information for all subsites to show up in the list?

Thanks a lot!

#33

Since the schema has changed a bit to include an entity_id (instead of entity), you need to index each individual subsite again using this module. That should fix your problem.

#34

Note that we don't yet have a 6.x version of apachesolr compatible with the 7.x version. That will be the 6.x-3.x branch.

#35

Subscribe. As always, reminding folks to use http://drupal.org/project/coder to look over patches and help review them before release.

#36

Subscribe

#37

Maybe I will create the branch if this is a generally working basis for progress.

#38

For 7.x I would like to figure out how to meld the multisite search functionality with the search environments concept we added in 7.x apachesolr module, as well as with the custom search pages.

I feel like we should be able to make this module even smaller, since I always work to have the support for multi-site search pretty well baked into the main module.

#39

I agree. I'll see what I can do to integrate the concepts of apachesolr_multisite into the apachesolr modules.

#40

Ok, I may take a crack at it this weekend myself.

#41

Status: needs review » active

Note: I committed the patch above to a new branch, so I'm setting this back to active.

#42

#43

Also, I think we should potentially remove the use of the core search hooks (especially for the search page), and just leverage the user defined search pages.

#44

#45

Status: active » needs work

Starting to reduce this down to the essence.

Attachment: 1006994-45.patch (13.81 KB)

#46

Hi Guys,

I have a question and would really appreciate your help. I have a Drupal 7 multisite setup and use the Solr multisite search module to search across the Drupal sites. I also have a non-Drupal (simple HTML) site running on another web server. How can I search across the Drupal sites and the non-Drupal site and get results from ALL sites?

Thanks a lot!

#47

@synbaxp - off topic. This issue is about the code update for 7.x.

Open a separate support request or try IRC.

#48

I've been talking through this with Nick Veenhof. I did some testing with the latest Apache Solr dev module, and since it now also takes the hash into account, every page is actually ready to support multisites. I believe we need the following functions in Apache Solr to get it working:

  • apachesolr_multisitesearch_facetapi_facet_info() to create the site/hash facet; we could add the metadata settings to the facet configuration.
  • apachesolr_multisitesearch_apachesolr_query_alter() with an option per Search Page to enable or disable the addFilter

We might need to work out the details as to what configuration goes where, but this will bring us a long way.

Your patch is good to go, except that the function should be apachesolr_multisitesearch_facetapi_facet_info() and not apachesolr_multisitesearch_facet_info().
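
The per-search-page option from the second bullet above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `multisite` settings key and the helper name `example_multisite_site_filter()` are hypothetical, and a real hook_apachesolr_query_alter() implementation would pass the result on to the query object's addFilter call rather than return an array.

```php
<?php
/**
 * Hypothetical helper: decide which filter restricts a search to one site.
 *
 * @param array $page_settings
 *   Settings of the search page. The 'multisite' key is an assumption made
 *   for this sketch, not a setting confirmed by the module.
 * @param string $site_hash
 *   The hash of the current site.
 *
 * @return array|null
 *   NULL when the page is multisite-enabled (no filter needed), otherwise
 *   the field name and value to filter on.
 */
function example_multisite_site_filter(array $page_settings, $site_hash) {
  if (!empty($page_settings['multisite'])) {
    // Multisite pages show results from every site sharing the index.
    return NULL;
  }
  // Single-site pages only show documents indexed by this site.
  return array('name' => 'hash', 'value' => $site_hash);
}

$single = example_multisite_site_filter(array(), 'abc123');
$multi = example_multisite_site_filter(array('multisite' => TRUE), 'abc123');
```

Keeping the decision in a small pure function makes the per-page behaviour easy to test without a running Solr server.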

#49

Status: needs work » needs review

If the complete module is replaced with this code it is already working between different Drupal 6 and 7 sites. What is left is to make node access integration work between Drupal 6 and 7.

<?php
/**
 * @file
 * Extends Apache Solr Search module to provide multisite support.
 * This includes:
 * 1) A facet that allows filtering per site.
 * 2) Changes to the result links so they point to the appropriate site.
 */

/**
 * Implements hook_facetapi_facet_info().
 *
 * @param array $searcher_info
 *   The definition of the searcher that facets are being collected for.
 *
 * @return array
 *   Facet definitions keyed by facet name.
 */
function apachesolr_multisitesearch_facetapi_facet_info($searcher_info) {
  $facets = array();
  $facets['site'] = array(
    'field' => 'site',
    'label' => t('Site Name'),
    'description' => t('Filter by Site Name'),
  );
  return $facets;
}

/**
 * Implements hook_apachesolr_process_results().
 *
 * Makes sure that the links in our search results point to the website of
 * origin.
 */
function apachesolr_multisitesearch_apachesolr_process_results(&$results, DrupalSolrQueryInterface $query) {
  foreach ($results as $id => $result) {
    $results[$id]['link'] = $results[$id]['fields']['url'];
  }
}
?>

#50

I propose a much bigger change that makes it easier for all of us to build it up from scratch again.

Attachment: 1006994-50.patch (23.46 KB)

#51

Fixed the package so the module now shows up under Search Toolkit.

Attachment: 1006994-51.patch (23.9 KB)

#52

Some namespace issues. I think this one should be good to go in and let's follow up with other functionality later on? What do you think?

Attachment: 1006994-52.patch (23.9 KB)

#53

The patch in #52 is good to go. I would already prefer to see a dev release based on this patch to continue working on.

#54

I pinged pwolanin to take a look at this issue. Afaik he will do that asap.

#55

I reckon this might benefit from a good issue summary too...

#56

Status: needs review » needs work

Trying to figure out all the deletions

function apachesolr_multisitesearch_map_hash() becomes a no-op? You removed hook_facetapi_facet_info()?

We certainly still need the hook_apachesolr_query_alter(), but it should be looking at a per-environment setting.

Also, all the metadata functionality seems to be removed. I'm not sure what's going on - is this the right patch?

#57

(Added the issue summary text to the opening post.)

#58

Status: needs work » needs review

These patches should include the metadata + the corrected hash to sitename mapping that comes from the metadata.
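
The hash to site name mapping could look roughly like the sketch below. The metadata shape (hash => array with a `site_name` key) and the function name `example_multisite_map_hash()` are assumptions made for illustration; they are not taken from the patches.

```php
<?php
/**
 * Hypothetical helper: map site hashes to human-readable site names.
 *
 * @param array $hashes
 *   Facet values (site hashes) as returned by Solr.
 * @param array $metadata
 *   Assumed shape: hash => array('site_name' => 'Label').
 *
 * @return array
 *   hash => label. A hash without metadata falls back to the raw hash.
 */
function example_multisite_map_hash(array $hashes, array $metadata) {
  $map = array();
  foreach ($hashes as $hash) {
    $map[$hash] = isset($metadata[$hash]['site_name'])
      ? $metadata[$hash]['site_name']
      : $hash;
  }
  return $map;
}

$metadata = array('a1b2c3' => array('site_name' => 'Main site'));
$labels = example_multisite_map_hash(array('a1b2c3', 'ffffff'), $metadata);
// 'a1b2c3' gets its site name; 'ffffff' has no metadata and keeps its raw value.
```

Falling back to the raw hash keeps the facet usable for sites whose metadata has not been indexed yet.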

As I mentioned before, I would prefer that the 7.x patch be added to the 7.x-1.x branch and the 6.x patch be added to a new 6.x-3.x branch.

I've tested these patches on a 6.x site and on a 7.x site and multisite between 6.x and 7.x is working perfectly. The regular module takes care of indexing fields with their machine name so a D6 and a D7 site can easily create a facet that is using content from both.

I also added the content types/bundles to the meta information but I'd like to have some more input how we could handle bias information for content types/bundles that are not part of the site where the search was executed

Attachments:
1006994-58-drupal6.patch (36.58 KB)
1006994-58-drupal7.patch (26.8 KB)

#59

  1. Merged in the suggestions of pwolanin from #45
  2. You have to explicitly mark a specific environment as multisite capable. If you don't, you won't see multisite search results. This setting was added to the environment configuration.

The patch for 6 was diffed with the current 6, the patch for 7 was diffed with the current 7

Would this be a good starting point for all?

Attachments:
1006994-59-drupal6.patch (39.51 KB)
1006994-59-drupal7.patch (28.24 KB)

#60

Forgot to remove a dsm...

Attachment: 1006994-60-drupal7.patch (28.22 KB)

#61

Let's change this:

$document->entity_type = 'multisite_meta';

and use a string that cannot be a valid Drupal entity type.

e.g. 'multisite.meta' or 'multisite/meta' or 'multisite-meta'
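
To illustrate why those candidates cannot collide with a real entity type, here is a small sketch. It assumes entity types follow the usual Drupal machine-name convention (lowercase letters, digits, and underscores); Drupal does not necessarily enforce this for entity types, and `example_is_machine_name()` is a hypothetical helper.

```php
<?php
/**
 * Hypothetical helper: check a string against the machine-name convention.
 *
 * Assumption: entity types use only lowercase letters, digits and
 * underscores, as conventional Drupal machine names do.
 */
function example_is_machine_name($name) {
  return (bool) preg_match('/^[a-z0-9_]+$/', $name);
}

$candidates = array('multisite_meta', 'multisite.meta', 'multisite/meta', 'multisite-meta');
$collision_possible = array();
foreach ($candidates as $candidate) {
  // TRUE means a module could legitimately define an entity type by this name.
  $collision_possible[$candidate] = example_is_machine_name($candidate);
}
```

Under that convention only 'multisite_meta' could clash with an entity type defined by another module.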

You moved a bunch of functionality like apachesolr_multisitesearch_generate_metadata() into the .module instead of leaving it in the admin.inc. If it's not used on most page loads, I think it's better to keep it in the .inc file?

#62

How should we go about testing this? I ran into several issues so I might be doing something wrong.

I tried this with apachesolr 7.x HEAD and 3 sites sharing the same solr core. I enabled multisite support for the solr server and cleared the index and reindexed every site. I ran into these issues:

  • All search results share the same website facet. The facet is always equal to the site you're currently on.
  • $results[$id]['fields']['hash'] doesn't exist (line 77 in apachesolr_multisitesearch.module).
  • When using the same facets for entities that don't exist on the current site, the raw value of the facet is displayed. For example, if a node from another site has a term with tid 6 attached that is only available on the other site, there is a facet "6".

#63

@pwolanin : I moved it because I felt some of this code did not belong in an admin.inc. The code that is used is not only for the admin pages but could be used as an API (crud of the metadata) for those that need it.
I only included functions in the admin file that are directly related to the admin configuration. Which one do you want to move to the admin.inc?
Patch attached with multisite.meta as entity type

@klaasvw
This is still very much a work in progress. I suggest that you try to find the broken part, correct it and upload the patch. Does that work for you?

Attachments:
1006994-63-drupal7.patch (28.22 KB)
1006994-63-drupal6.patch (39.42 KB)

#64

Status: needs review » needs work

Although I'm not that familiar with the module, here's a quick Dreditor scan! This is off drupal7.patch. I really like the LOC I/D ratio :-) .

+++ b/apachesolr_multisitesearch.info
@@ -2,5 +2,5 @@ name = Apache Solr Multisite Search
-package = Apache Solr
+package = Search Toolkit

Should it be "Apache Solr Search Toolkit" instead? It's usually good to namespace module names.

+++ b/apachesolr_multisitesearch.module
@@ -23,133 +23,89 @@ function apachesolr_multisitesearch_menu() {
+  return $data;
+}
+
+function apachesolr_multisitesearch_apachesolr_process_results(&$results, DrupalSolrQueryInterface $query) {
+  $env_id = $query->solr('getId');

Might like some doc block for apachesolr_multisitesearch_apachesolr_process_results().

+++ b/apachesolr_multisitesearch.module
@@ -23,133 +23,89 @@ function apachesolr_multisitesearch_menu() {
+ *
+ * @param string $query
+ *   Defaults to *:*
  */
-function apachesolr_multisitesearch_cron() {
-  apachesolr_multisitesearch_refresh_metadata();
+function hook_apachesolr_delete_by_query_alter($query) {
+  // use the site hash so that you only delete this site's content
+  if ($query == '*:*') {
+    $query = 'hash:' . apachesolr_site_hash();

Is this supposed to be "hook_apachesolr_delete_by_query_alter()"? Maybe we should move that to apachesolr_multisitesearch.api.php instead?
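
For comparison, an actual implementation (as opposed to hook documentation) would carry the module's own prefix. The sketch below uses a hypothetical module named `example` and assumes the query string is passed by reference, as alter hooks usually require. apachesolr_site_hash() is stubbed so the snippet runs outside Drupal, and only the `*:*` case visible in the quoted diff is handled.

```php
<?php
// Stub for running outside Drupal; the real apachesolr_site_hash() comes
// from the apachesolr module. The return value is a made-up example hash.
if (!function_exists('apachesolr_site_hash')) {
  function apachesolr_site_hash() {
    return 'abc123';
  }
}

/**
 * Implements hook_apachesolr_delete_by_query_alter() for a module "example".
 *
 * Assumption: $query is taken by reference so the change reaches the caller.
 */
function example_apachesolr_delete_by_query_alter(&$query) {
  // Use the site hash so that only this site's content is deleted.
  if ($query == '*:*') {
    $query = 'hash:' . apachesolr_site_hash();
  }
}

$delete_all = '*:*';
example_apachesolr_delete_by_query_alter($delete_all);

$other = 'entity_type:node';
example_apachesolr_delete_by_query_alter($other);
// $delete_all is now restricted to this site; $other is left unchanged by this sketch.
```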

#65

@Nick - I was using admin.inc as a generic include file, despite the name.

#66

@rob loach - That hook is clearly wrong, should be fixed indeed. The search toolkit is a general package name so this module will appear in the same list as apachesolr and its derivatives.

@pwolanin, Are you ok with moving them to an apachesolr_multisite.index.inc (similar to apachesolr?). We could even call it meta.inc or something similar.

#67

@Nick - having an index.inc file is fine if you like it. I was just lazy when I wrote it and found it easier to have just one .inc file to look in.

#68

Status: needs work » needs review

This patch should have an index.inc + the fix with the delete hook. Tested out most of the functionality with a D6 and D7 site. Also, the D6 and the D7 module are now very similar when compared to each other, so I'll include a small diff of that as well.

ignore this one

Attachments:
1006994-68-drupal7.patch (31.68 KB)
1006994-68-drupal6.patch (36.85 KB)

#69

This patch should have an apachesolr_multisitesearch.index.inc + the fix with the delete hook. Tested out most of the functionality with a D6 and D7 site. Also the D6 and the D7 module are now very similar when compared to each other so I'll include a small diff of that to show the differences.

Attachments:
1006994-69-drupal6.patch (41.13 KB)
1006994-69-drupal7.patch (31.11 KB)
1006994-69-diff_drupal7-drupal6.patch (3.58 KB)

#70

Looks better. I think we still need to e.g. alter the author facet for a multisite environment, but that can be a follow-up.

@Nick - I added your commit access if you want to get these patches into git.

#71

Status: needs review » fixed

Committed to 7.x-1.x

#72

Created a branch 6.x-3.x and applied the patch for the 6.x-3.x branch

#73

Oh, rock!!! Thanks so much guys! :D

#74

The cool thing is that you can now do a multisite between D6 and D7 sites ;-) Still a work in progress though!

#75

Status: fixed » closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.