yielding one search result page from two distinct datasets [#631836]

SUMMARY:
we have two data sets which we would like to index and search from the same search bar, and have results from each be returned in the final result set. it feels like i should be able to do this, but i'm not sure where the best place to hack in is, if it's even possible.

DETAIL:
- first data set is, of course, Drupal data (using nid as index reference).
- second set is data from an external database, not at all tied to Drupal's nid's. it is a catalog of files for which we want to create links to description/download pages.

so, if we search for "trike" in drupal's search bar, we might get back a couple nodes of content with the word "trike" in them, as well as a result linking to a file whose description contains "trike".

WHERE I'M AT SO FAR:
I've successfully used DataImportHandler to bring in data from the external database. Solr itself seems to allow strings for ids, so I have used DIH/schema.xml to prepend "a" to each id that comes in from the external data set. this ensures external ids won't overwrite Drupal's nids in the main index. i also map the external data into title, content, teaser, etc. so that the records otherwise fit Drupal's existing schema. from the Java DIH webapp, i'm able to successfully search for "trike" and get results both from Drupal and from the external data set. but when I go to the apachesolr_search box and search for "trike", only the Drupal results are printed to the screen.
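A minimal sketch of the id-prefixing idea described above, written in Python rather than as DIH/schema.xml configuration. A plain dict stands in for the Solr index; `namespaced_id`, `merge_into_index`, and the sample documents are hypothetical names for illustration, not part of Solr or the apachesolr module.

```python
def namespaced_id(raw_id, source_prefix=""):
    """Return a string id; a non-empty prefix keeps sources from colliding."""
    return f"{source_prefix}{raw_id}"


def merge_into_index(index, docs, source_prefix=""):
    """Add docs to the stand-in index, keyed by their namespaced id."""
    for doc in docs:
        key = namespaced_id(doc["id"], source_prefix)
        index[key] = {**doc, "id": key}


# Hypothetical sample data: both sources use 1 as their native id.
drupal_docs = [{"id": 1, "title": "Trike review"}]
catalog_docs = [{"id": 1, "title": "trike-manual.pdf"}]

index = {}  # stand-in for the Solr index (id -> document)
merge_into_index(index, drupal_docs)                      # Drupal nids stay bare
merge_into_index(index, catalog_docs, source_prefix="a")  # external ids get "a"

print(sorted(index))  # ['1', 'a1']
```

Without the prefix, the second call would overwrite the Drupal document stored under id 1, which is the collision the prefix is meant to avoid.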

IDEAS:
i think at some point in either the Drupal apachesolr_search module or the Drupal Solr Query Interface (or somewhere else?), the schema changes from Solr's "an id is a string" to Drupal's "an id is an int", and thus ids with alpha characters in them (for instance, id == "a101") are discarded. perhaps there's a way to hack in and override this without breaking existing functionality? my hope is that if i can identify where Drupal reads the id, i can hack in and change those ids to a single, designated nid that i use as a dummy node. i can't do this at the Solr indexing level, obviously, because each new record would just keep overwriting that one id as the index was built.

any ideas? thanks in advance!

Comments

Comment #1

pwolanin commented 14 November 2009 at 03:15

You might try looking at http://drupal.org/project/distributed_search or maybe http://drupal.org/project/adjustisearch

Comment #2

mausolos commented 16 November 2009 at 21:44

I took a look at these, thank you. The distributed search requires openid stuff, which is something we're explicitly avoiding for the time being. Adjusti-search would be great if I had an entire secondary engine returning results of its own, but what I actually have is a distinct, secondary index which I'm essentially trying to interleave with the primary index. That said, I could probably figure out something to handle search requests anyway, but if anyone has any other ideas, I'd love to hear them.

Thank you!

Comment #3

mausolos commented 2 December 2009 at 07:32

http://drupal.org/node/635556 is related, I think...

Comment #4

Grayside commented 19 March 2010 at 06:41

Is there any further guidance on this concept? I was hoping to use Solr to index several different (not all Drupal) sites and create a master search tool in Drupal.

Comment #5

jpmckinney commented 7 May 2010 at 18:38

Status: Active » Postponed (maintainer needs more info)

Instead of prepending "a", why not prepend "999999"? That would satisfy this module's expectation that id's are int's.
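A rough sketch of what a numeric prefix does to the ids, assuming (the thread does not confirm this) that ids may pass through a signed 32-bit integer somewhere along the way. `numeric_external_id` and `fits_int32` are hypothetical helpers, not functions from the module.

```python
INT32_MAX = 2**31 - 1  # 2147483647, an assumed upper limit for ids


def numeric_external_id(raw_id, prefix="999999"):
    """Concatenate the digit prefix and the native id, then parse as an int."""
    return int(f"{prefix}{raw_id}")


def fits_int32(value):
    """True if the value fits in a signed 32-bit integer."""
    return 0 <= value <= INT32_MAX


small = numeric_external_id(101)    # three-digit native id
large = numeric_external_id(10000)  # five-digit native id

print(small, fits_int32(small))
print(large, fits_int32(large))
```

Under that assumption, a six-digit prefix only leaves room for native ids below 1000; a shorter prefix would leave more room but sits closer to the range of real nids.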

Comment #6

jpmckinney commented 7 May 2010 at 18:49

Marking http://drupal.org/node/610892 duplicate.

Comment #7

scott.whittaker commented 23 November 2010 at 23:36

Any success with this method?

Comment #8

pwolanin commented 19 December 2010 at 23:53

Status: Postponed (maintainer needs more info) » Fixed

The concept of a site hash used by the apachesolr_multisitesearch module is perfectly applicable to non-Drupal sites. I suggest you look at that code.
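A loose sketch of the general idea behind a per-source hash: every document carries a short identifier for the source it came from, and that identifier is also part of the document id. The field name `hash`, the id layout, and all helper names here are assumptions for illustration; the apachesolr_multisitesearch code remains the authority on what the module actually does.

```python
import hashlib


def source_hash(source_name, length=12):
    """Derive a short, stable identifier for a source (hypothetical scheme)."""
    return hashlib.sha256(source_name.encode("utf-8")).hexdigest()[:length]


def make_document(source_name, entity_type, raw_id, title):
    """Build a document whose id embeds the source hash."""
    h = source_hash(source_name)
    return {"id": f"{h}/{entity_type}/{raw_id}", "hash": h, "title": title}


def search(docs, term, hashes=None):
    """Naive title match, optionally restricted to a set of source hashes."""
    return [
        d for d in docs
        if term.lower() in d["title"].lower()
        and (hashes is None or d["hash"] in hashes)
    ]


# Hypothetical sources: one Drupal site and one external file catalog.
docs = [
    make_document("drupal-site", "node", 1, "Trike review"),
    make_document("file-catalog", "file", 1, "trike-manual.pdf"),
]

both = search(docs, "trike")
catalog_only = search(docs, "trike", hashes={source_hash("file-catalog")})
print(len(both), len(catalog_only))
```

Because the source identifier is part of the id, native id 1 from each source can coexist, and the same field can be used to include or exclude a source at query time.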

Comment #9

3 January 2011 at 00:00

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Comment #10

dpalmer commented 2 December 2011 at 17:12

mausolos, were you successful in your objective? I am trying to do the same thing now...

Comment #11

mausolos commented 5 December 2011 at 21:37

not so much.. i believe i got as far as discovering that i could do it on the solr side of things by prepending each result id from the foreign dataset with, say, an alpha character; this would allow id 1 native to the foreign database to coexist as id a1 next to drupal nid 1. i was looking at search_api for D7 just last night, however, and it looks like it might have some ways of doing this that are much better (though I haven't carefully examined them yet, since that project is long behind me).

unfortunately, given these tools, we concluded that the most realistic ways of doing this without rewriting the solr search module were one of:
a) to simply designate an "address range" for drupal nid's. so our sql server, which is where the separate catalog existed, would get sql id + 1000000 or something, and we'd just pray that we never got a million nodes in drupal (or wherever our cutoff was).
b) give up and just import the catalog data into drupal nodes, updated every-so-often via rotation of table data (so update the table data on a non-live database, then transpose it within mysql from the dummy db to the live db, then trigger the index update stuff)
c) write a module whose sole purpose is to somehow trick the indexer into thinking it's reading drupal nodes, when in fact it's reading an xml stream that's being fed via a stored procedure from the sql server, and masking the data as drupal nodes (obviously, this was just an idea, it's probably insane and might not even work).
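Option (a) above can be sketched roughly as follows, using the 1000000 cutoff mentioned there. The constant and function names are hypothetical, and the sketch assumes native catalog ids start at 1.

```python
EXTERNAL_ID_OFFSET = 1_000_000  # Drupal nids are assumed to stay at or below this


def to_index_id(external_id):
    """Shift a native catalog id into the range above the Drupal nids."""
    if external_id < 1:
        raise ValueError("external ids are assumed to start at 1")
    return external_id + EXTERNAL_ID_OFFSET


def is_external(index_id):
    """True if an index id falls in the range reserved for the catalog."""
    return index_id > EXTERNAL_ID_OFFSET


def from_index_id(index_id):
    """Recover the native catalog id from a shifted index id."""
    if not is_external(index_id):
        raise ValueError("not an external id")
    return index_id - EXTERNAL_ID_OFFSET


def nids_have_headroom(max_nid):
    """Check that the highest Drupal nid is still below the reserved range."""
    return max_nid <= EXTERNAL_ID_OFFSET


shifted = to_index_id(101)
print(shifted, is_external(shifted), from_index_id(shifted))
print(is_external(101), nids_have_headroom(250_000))
```

Running something like `nids_have_headroom` against the highest nid from time to time would turn the "pray" part into an early warning before the two ranges meet.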

best of luck, i'd love to hear what you end up doing! :)

Comment #12

Raul Cano commented 6 August 2015 at 07:52

Hi mausolos,
Maybe it's a bit too late but did you actually achieve something in D6? I am running the same case here (see my comment https://www.drupal.org/node/635556#comment-10190268), but it looks like there is not a simple solution for Drupal 6...
